@krisgesling
Created January 24, 2022 01:57
Simple script to count word frequency and line length (words per line) for each .txt file in a given directory.
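#!/bin/bash
# Usage: pass the directory containing the .txt files as the first argument.
cd "$1" || exit 1
# CONFIG VARIABLES
ignoreChars=',.?!'
# Note: reports are written next to the inputs with a .txt extension,
# so a later run in the same directory will process them as well.
for filename in ./*.txt; do
    # Skip the unexpanded pattern when the directory has no .txt files
    [ -e "$filename" ] || continue
    outputFile="$(basename "$filename" .txt)_$(date '+%Y-%m-%d-%H:%M:%S').txt"
    touch "$outputFile"
    echo '****************************' >> "$outputFile"
    echo '*** Gloss word frequency ***' >> "$outputFile"
    echo '****************************' >> "$outputFile"
    echo ' ' >> "$outputFile"
    # Treat '/' as a word separator, strip ignored punctuation, put one word
    # per line, drop empty lines, then count case-insensitively.
    # sort -f keeps words that differ only in case adjacent for uniq -i.
    tr '/' ' ' < "$filename" \
        | sed "s|[$ignoreChars]||g" \
        | tr -s '[:space:]' '\n' \
        | sed '/^$/d' \
        | sort -f | uniq -ic | sort -nr >> "$outputFile"
    echo ' ' >> "$outputFile"
    echo ' ' >> "$outputFile"
    echo '**********************' >> "$outputFile"
    echo '*** Words per line ***' >> "$outputFile"
    echo '**********************' >> "$outputFile"
    echo ' ' >> "$outputFile"
    lineNum=1
    # read -r keeps backslashes literal; the -n test also catches a final
    # line that has no trailing newline.
    while IFS= read -r line || [ -n "$line" ]; do
        printf '%s\n' "$line" | wc -w >> "$outputFile"
        ((lineNum++))
    done < "$filename"
done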
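
The two counting steps can also be tried on their own, without touching any files. The sketch below is not part of the original gist: `word_freq` and `words_per_line` are hypothetical helper names, the sample sentences are made up, and `words_per_line` swaps the per-line `wc -w` loop for awk's built-in `NR` (line number) and `NF` (field count) variables, which also prints the line number next to each count.

```shell
#!/bin/bash
# Hypothetical helpers for illustration; both read text on stdin.

# Print "count word" lines, most frequent word first.
word_freq() {
    local ignore_chars=',.?!'          # same punctuation set as the script
    tr '/' ' ' \
        | sed "s|[$ignore_chars]||g" \
        | tr -s '[:space:]' '\n' \
        | sed '/^$/d' \
        | sort -f | uniq -ic | sort -nr
}

# Print "line number: word count" for every input line.
# Assumption: whitespace-separated fields (awk's NF) are the same notion
# of "word" that wc -w uses.
words_per_line() {
    awk '{ printf "%d: %d\n", NR, NF }'
}

# Made-up sample input
printf '%s\n' 'The cat sat.' 'the cat/dog ran!' | word_freq
printf '%s\n' 'The cat sat.' 'the cat/dog ran!' | words_per_line
```

Because awk handles the whole file in one process, this variant avoids starting `wc` once per line, which may matter for long files.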