@jeroos
Created October 30, 2024 04:09

Task 3: Text File Analyzer

Objective

Create a comprehensive text analysis tool that processes files, generates detailed statistics, and produces formatted reports.

Technical Requirements

1. Data Structures

using System;
using System.Collections.Generic;

// Aggregated statistics for one analyzed file.
class TextAnalysis {
    public int TotalCharacters { get; set; }
    public int AlphabeticCharacters { get; set; }
    public int NumericCharacters { get; set; }
    public int Paragraphs { get; set; }
    public int Sentences { get; set; }
    public int Words { get; set; }
    // Occurrences per word; initialized so callers never see null.
    public Dictionary<string, int> WordFrequency { get; set; } = new Dictionary<string, int>();
    public double AverageWordLength { get; set; }      // in characters
    public double AverageSentenceLength { get; set; }  // in words
}

// Metadata for one analysis run, wrapping its statistics.
class AnalysisReport {
    public string FileName { get; set; }
    public DateTime AnalysisDate { get; set; }
    public TextAnalysis Statistics { get; set; }
    public TimeSpan ProcessingTime { get; set; }
}

2. Required Functionality

a) File Operations

  • File selection:
    • Accept file path input
    • Support drag-and-drop
    • Verify file existence
    • Check file extension (.txt only)
    • Handle file access permissions
  • File reading:
    • Support large files (>1 MB)
    • Show progress for large files
    • Handle encoding issues
    • Support different line endings
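The file selection checks above could be sketched as follows. This is a minimal illustration, not a required design: `FileSelection` and `Validate` are hypothetical names, and the quote trimming assumes that dragging a file onto the console pastes its path, possibly wrapped in double quotes.

```csharp
using System;
using System.IO;

static class FileSelection
{
    // Returns null when the file is acceptable, otherwise a user-facing error message.
    public static string Validate(string rawPath)
    {
        if (string.IsNullOrWhiteSpace(rawPath))
            return "No file path entered.";

        // Assumption: a dragged file arrives as a pasted path, often in double quotes.
        string path = rawPath.Trim().Trim('"');

        if (!string.Equals(Path.GetExtension(path), ".txt", StringComparison.OrdinalIgnoreCase))
            return "Invalid format: only .txt files are supported.";

        if (!File.Exists(path))
            return "File not found: " + path;

        try
        {
            // Opening for read surfaces permission and sharing problems early.
            using (FileStream stream = File.OpenRead(path)) { }
        }
        catch (UnauthorizedAccessException)
        {
            return "Permission denied: " + path;
        }
        catch (IOException ex)
        {
            return "Cannot open file: " + ex.Message;
        }

        return null;
    }
}
```

Returning a message instead of throwing keeps the menu loop simple: the caller prints the message and asks for another path.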

b) Text Analysis

  • Character Analysis:
    • Total characters (with/without spaces)
    • Alphabetic characters
    • Numeric characters
    • Special characters
    • Case distribution (upper/lower)
  • Word Analysis:
    • Word count
    • Unique words
    • Word frequency
    • Average word length
    • Longest/shortest words
  • Sentence Analysis:
    • Sentence count
    • Average sentence length
    • Longest/shortest sentences
  • Paragraph Analysis:
    • Paragraph count
    • Average paragraph length
    • Blank line handling
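One way to approach the counts listed above is a single pass over the characters, plus regular expressions for words and sentences. This is a sketch under stated assumptions rather than a reference solution: `BasicCounts` and `TextCounter` are hypothetical names, and the word, sentence and paragraph rules are simplifications (for example, abbreviations such as "e.g." and decimals such as "3.14" are counted as sentence breaks).

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text.RegularExpressions;

// Hypothetical result holder, kept separate from TextAnalysis to stay self-contained.
class BasicCounts
{
    public int Letters, Digits, Special, Upper, Lower, Words, Sentences, Paragraphs;
    public Dictionary<string, int> WordFrequency =
        new Dictionary<string, int>(StringComparer.OrdinalIgnoreCase);
}

static class TextCounter
{
    public static BasicCounts Count(string text)
    {
        var counts = new BasicCounts();

        // Character classes: letters (with case), digits, and non-whitespace "special" characters.
        foreach (char c in text)
        {
            if (char.IsLetter(c))
            {
                counts.Letters++;
                if (char.IsUpper(c)) counts.Upper++;
                else if (char.IsLower(c)) counts.Lower++;
            }
            else if (char.IsDigit(c)) counts.Digits++;
            else if (!char.IsWhiteSpace(c)) counts.Special++;
        }

        // Assumption: a word is a run of letters or digits, optionally joined by apostrophes.
        foreach (Match m in Regex.Matches(text, @"[\p{L}\p{Nd}]+(?:'[\p{L}\p{Nd}]+)*"))
        {
            counts.Words++;
            counts.WordFrequency.TryGetValue(m.Value, out int seen);
            counts.WordFrequency[m.Value] = seen + 1;
        }

        // Assumption: sentences are separated by runs of '.', '!' or '?'.
        foreach (string piece in Regex.Split(text, @"[.!?]+"))
            if (piece.Any(c => char.IsLetterOrDigit(c))) counts.Sentences++;

        // A paragraph starts at each non-blank line that follows a blank line (or the file start).
        bool inParagraph = false;
        foreach (string line in text.Split(new[] { "\r\n", "\r", "\n" }, StringSplitOptions.None))
        {
            bool blank = string.IsNullOrWhiteSpace(line);
            if (!blank && !inParagraph) counts.Paragraphs++;
            inParagraph = !blank;
        }

        return counts;
    }
}
```

Splitting on `"\r\n"`, `"\r"` and `"\n"` handles Windows, classic Mac and Unix line endings in one call, and the case-insensitive dictionary makes "The" and "the" share one frequency entry.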

c) Report Generation

  • Create formatted report with:
    • File information
    • Basic statistics
    • Detailed analysis
    • Word frequency table
    • Timestamp
  • Support multiple formats:
    • Console display
    • Text file
    • Simple HTML
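The text and HTML formats can share one data source and differ only in markup. The sketch below assumes a hypothetical `ReportFormatter` that takes labelled integer statistics; the layout mirrors the sample report in this document but is otherwise an illustration. The console format can reuse `ToText` via `Console.Write`.

```csharp
using System;
using System.Collections.Generic;
using System.Globalization;
using System.Net;
using System.Text;

static class ReportFormatter
{
    // Invariant culture keeps "45,678" and the date layout stable across machines.
    static readonly CultureInfo Invariant = CultureInfo.InvariantCulture;

    public static string ToText(string fileName, DateTime date,
                                IEnumerable<KeyValuePair<string, int>> stats)
    {
        var sb = new StringBuilder();
        sb.Append("=== Text Analysis Report ===\n");
        sb.Append("File: ").Append(fileName).Append('\n');
        sb.Append("Date: ").Append(date.ToString("MM/dd/yyyy HH:mm:ss", Invariant)).Append('\n');
        foreach (var stat in stats)
            sb.Append("- ").Append(stat.Key).Append(": ")
              .Append(stat.Value.ToString("N0", Invariant)).Append('\n');
        return sb.ToString();
    }

    public static string ToHtml(string fileName, IEnumerable<KeyValuePair<string, int>> stats)
    {
        var sb = new StringBuilder();
        sb.Append("<html><body><h1>Text Analysis Report</h1>");
        // File names and labels are encoded so characters like '<' cannot break the markup.
        sb.Append("<p>File: ").Append(WebUtility.HtmlEncode(fileName)).Append("</p>");
        sb.Append("<table>");
        foreach (var stat in stats)
            sb.Append("<tr><td>").Append(WebUtility.HtmlEncode(stat.Key)).Append("</td><td>")
              .Append(stat.Value.ToString("N0", Invariant)).Append("</td></tr>");
        sb.Append("</table></body></html>");
        return sb.ToString();
    }
}
```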

d) Performance Features

  • Progress indication
  • Cancel operation option
  • Processing time tracking
  • Memory usage optimization
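The four performance features above can be combined in one streaming loop. The following is a sketch, assuming a hypothetical `StreamingReader.ProcessLines` helper: it reads line by line so the whole file never sits in memory, reports percentage progress from the stream position, honours a `CancellationToken`, and times the run with `Stopwatch`.

```csharp
using System;
using System.Diagnostics;
using System.IO;
using System.Threading;

static class StreamingReader
{
    // Calls onLine for every line and onPercent whenever the percentage changes.
    // Returns the elapsed processing time; throws OperationCanceledException on cancel.
    public static TimeSpan ProcessLines(Stream stream, Action<string> onLine,
                                        Action<int> onPercent, CancellationToken token)
    {
        var watch = Stopwatch.StartNew();
        long total = Math.Max(stream.Length, 1);   // avoid dividing by zero for empty files
        int lastPercent = -1;

        using (var reader = new StreamReader(stream))
        {
            string line;
            while ((line = reader.ReadLine()) != null)
            {
                token.ThrowIfCancellationRequested();
                onLine(line);

                // Position advances in buffer-sized steps, which is enough for a progress bar.
                int percent = (int)(stream.Position * 100 / total);
                if (percent != lastPercent)
                {
                    lastPercent = percent;
                    onPercent(percent);
                }
            }
        }

        if (lastPercent != 100) onPercent(100);   // also covers the empty-file case
        watch.Stop();
        return watch.Elapsed;
    }
}
```

A caller would typically pass `File.OpenRead(path)` as the stream and cancel through a `CancellationTokenSource` wired to a key press; how cancellation is triggered is left open here.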

3. User Interface Requirements

=== Text File Analyzer ===
1. Select File
2. Analyze File
3. Generate Report
4. View Previous Analysis
5. Exit

Enter your choice (1-5): 1

Enter file path or drag file here: C:\sample.txt
File selected: sample.txt (size: 45.3 KB)

1. Start Analysis
2. Select Different File
3. Return to Main Menu

Choice: 1

Analyzing file...
[====================] 100%
Analysis complete in 1.23 seconds

Basic Statistics:
- Characters: 45,678
- Words: 7,890
- Sentences: 423
- Paragraphs: 89

Generate detailed report? (Y/N):
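Two small pieces of the session above lend themselves to helper methods: the progress bar and the menu choice validation. The sketch below uses hypothetical names (`ConsoleUi`, `RenderBar`, `TryParseChoice`) and assumes the 20-character bar width shown in the transcript.

```csharp
using System;

static class ConsoleUi
{
    // Builds a bar such as "[==========          ] 50%" for percent = 50 and width = 20.
    public static string RenderBar(int percent, int width = 20)
    {
        percent = Math.Max(0, Math.Min(100, percent));
        int filled = percent * width / 100;
        return "[" + new string('=', filled) + new string(' ', width - filled) + "] " + percent + "%";
    }

    // Accepts only whole numbers between min and max, as the menu prompt requires.
    public static bool TryParseChoice(string input, int min, int max, out int choice)
    {
        return int.TryParse((input ?? "").Trim(), out choice) && choice >= min && choice <= max;
    }
}
```

Writing `"\r" + ConsoleUi.RenderBar(p)` with `Console.Write` redraws the bar on the same console line as progress advances.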

4. Error Handling

  • File errors:
    • File not found
    • Permission denied
    • File too large
    • Invalid format
  • Processing errors:
    • Out of memory
    • Encoding issues
    • Corrupted content
  • Report generation errors
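The error categories above map fairly directly onto .NET exception types. The sketch below shows one possible mapping in a hypothetical `ErrorMessages.Describe` helper; the message wording is an assumption, and `DecoderFallbackException` only occurs when the file is read with an encoding configured to throw on invalid bytes, such as `new UTF8Encoding(false, true)`.

```csharp
using System;
using System.IO;
using System.Text;

static class ErrorMessages
{
    // Translates an exception into a short message for the user.
    // More specific types come first because they derive from IOException.
    public static string Describe(Exception ex)
    {
        switch (ex)
        {
            case FileNotFoundException _:
            case DirectoryNotFoundException _:
                return "File not found.";
            case UnauthorizedAccessException _:
                return "Permission denied.";
            case OutOfMemoryException _:
                return "Out of memory: the file is too large to process.";
            case DecoderFallbackException _:
                return "Encoding issue: the file contains bytes that cannot be decoded.";
            case OperationCanceledException _:
                return "Analysis cancelled.";
            case IOException io:
                return "I/O error: " + io.Message;
            default:
                return "Unexpected error: " + ex.Message;
        }
    }
}
```

Catching the exception once at the menu level and printing `Describe(ex)` lets the program return to the main menu instead of terminating.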

Sample Report Output

=== Text Analysis Report ===
File: sample.txt
Date: 10/29/2024 14:30:45
Size: 45.3 KB
Processing Time: 1.23 seconds

1. Character Statistics
   - Total Characters: 45,678
   - Alphabetic: 38,901 (85.2%)
   - Numeric: 3,456 (7.6%)
   - Special: 3,321 (7.3%)
   - Uppercase: 4,567 (10.0%)
   - Lowercase: 34,334 (75.2%)

2. Word Statistics
   - Total Words: 7,890
   - Unique Words: 1,234
   - Average Length: 5.2 characters
   - Longest Word: "extraordinary" (13 chars)
   - Shortest Word: "a" (1 char)

3. Sentence Statistics
   - Total Sentences: 423
   - Average Length: 18.7 words
   - Longest: 35 words
   - Shortest: 3 words

4. Top 10 Words (excluding articles)
   1. "sample" (145 times)
   2. "text" (89 times)
   [...]
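The percentages and the "Top 10 Words" table in the report above can be derived from the raw counts with two short helpers. This is a sketch with hypothetical names (`WordRanking`, `Top`, `Percent`); it assumes "articles" means the English words "a", "an" and "the", and it breaks frequency ties alphabetically, which the specification does not prescribe.

```csharp
using System;
using System.Collections.Generic;
using System.Globalization;
using System.Linq;

static class WordRanking
{
    // Assumption: only English articles are excluded from the ranking.
    static readonly HashSet<string> Articles =
        new HashSet<string>(StringComparer.OrdinalIgnoreCase) { "a", "an", "the" };

    // Returns the most frequent non-article words, highest count first.
    public static List<KeyValuePair<string, int>> Top(IDictionary<string, int> frequency, int count)
    {
        return frequency
            .Where(pair => !Articles.Contains(pair.Key))
            .OrderByDescending(pair => pair.Value)
            .ThenBy(pair => pair.Key, StringComparer.OrdinalIgnoreCase)
            .Take(count)
            .ToList();
    }

    // Formats part/total as a percentage with one decimal place, e.g. 1 of 4 gives "25.0%".
    public static string Percent(int part, int total)
    {
        if (total == 0) return "0.0%";
        return (part * 100.0 / total).ToString("0.0", CultureInfo.InvariantCulture) + "%";
    }
}
```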

Evaluation Criteria

1. Code Architecture (30%)

  • Class design and organization
  • Method modularity
  • Algorithm efficiency
  • Memory management
  • Code documentation

2. Analysis Implementation (25%)

  • Accuracy of calculations
  • Processing efficiency
  • Memory usage
  • Large file handling
  • Algorithm correctness

3. Error Management (25%)

  • File handling errors
  • Processing errors
  • Resource management
  • User input validation
  • Recovery procedures

4. Output Quality (20%)

  • Report formatting
  • Data presentation
  • Statistical accuracy
  • Performance metrics
  • User feedback

Test Scenarios

  1. Basic Functionality

    • Small text file (<10 KB)
    • Normal English text
    • Standard formatting
  2. Edge Cases

    • Empty file
    • Single character file
    • Very large file (>10 MB)
    • No paragraphs/sentences
  3. Error Conditions

    • File doesn't exist
    • Permission denied
    • Invalid format
    • Corrupted content
  4. Special Content

    • Mixed languages
    • Special characters
    • Different encodings
    • Various line endings