Last active
March 27, 2019 20:18
Scrapism
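# flatten the book into one long line: drop carriage returns and curly quotes, join lines, squeeze spaces
cat moby_dick.txt | tr -d '\r“’‘”' | tr "\n" " " | tr -s ' ' | less
# same, but replace every "e" with "aaaaa"
cat moby_dick.txt | tr -d '\r“’‘”' | tr "\n" " " | tr -s ' ' | sed 's/e/aaaaa/g' | less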
# one sentence per line, then count repeated sentences (most frequent last)
cat moby_dick.txt | tr -d '\r“’‘”_' | tr "\n" " " | tr "—" " " | tr "-" " " | tr -s ' ' | sed 's/\([.!?]\) */\1\
/g' | sort | uniq -c | sort -n | less
# one word per line, then count words (most frequent first)
cat Women-Who-Run-with-the-Wolves.txt | tr -d '\r“’‘”"_' | tr "\n" " " | tr "—" " " | tr "-" " " | tr -s ' ' '\n' | sort | uniq -c | sort -nr | less
# to pick five lines at random, end a pipeline with this instead of less
sort --random-sort | head -n 5
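
The pieces above can be combined into one reusable pipeline. What follows is a minimal sketch rather than part of the original gist: the `sentences` function name and the inline `sample` text are hypothetical, the quote and dash stripping is left out to keep the example ASCII-only, and `sort --random-sort` is assumed to be available (it is in GNU coreutils, and other `sort` implementations may lack it).

```shell
# Hypothetical helper: read text on stdin, write one sentence per line.
sentences() {
  # drop carriage returns, join lines, squeeze spaces,
  # then break after each ., ! or ? (eating the spaces that follow)
  tr -d '\r' | tr '\n' ' ' | tr -s ' ' | sed 's/\([.!?]\) */\1\
/g'
}

# Small stand-in for a real book file.
sample='Call me Ishmael. Some years ago!
Never mind how long? Precisely.'

# Pick two sentences at random.
printf '%s\n' "$sample" | sentences | sort --random-sort | head -n 2
```

For a real book, replace the `printf` line with `cat moby_dick.txt` and raise the `head -n` count as needed.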