@ottlngr
Created July 5, 2016 19:44
Make a wordcloud straight from a .tex file using R
library(stringr)
library(tm)
library(wordcloud)
library(SnowballC)
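# Path to the LaTeX source
file <- "./file.tex"
# Read the whole file into a single string
txt <- readChar(file, file.info(file)$size)
# Flatten line breaks so the patterns below can match across lines
txt <- str_replace_all(txt, "\\n", " ")
# Drop the preamble, up to and including \begin{document}
txt <- str_replace_all(txt, "\\\\documentclass.*\\}.*\\\\begin\\{document\\}", "")
# Drop commands that are followed by whitespace, e.g. \item
txt <- str_replace_all(txt, "\\\\[a-zA-Z]*\\s", "")
# Drop commands together with their braced argument, e.g. \label{...}
txt <- str_replace_all(txt, "\\\\[a-zA-Z]*\\{.*?\\}", "")
# Drop any remaining brace groups
txt <- str_replace_all(txt, "\\{.*?\\}", "")
# Collapse whitespace runs and stand-alone punctuation to one space so neighboring words stay separate
txt <- str_replace_all(txt, "\\s[^a-zA-Z0-9]*\\s", " ")
# Keep only letters and whitespace
txt <- str_replace_all(txt, "[^a-zA-Z\\s]", "")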
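cor <- Corpus(VectorSource(txt))
cor <- tm_map(cor, stripWhitespace)
# Lowercase before removing stopwords so capitalized ones ("The") are caught too;
# content_transformer() lets tolower run without breaking the corpus structure
cor <- tm_map(cor, content_transformer(tolower))
cor <- tm_map(cor, removeWords, stopwords("english"))
cor <- tm_map(cor, stemDocument)
wordcloud(cor)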
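
To check the cleaning patterns without a real .tex file, the steps can be sketched as two small functions and run on an inline sample. This is a minimal sketch, not the gist's own method: `strip_latex` and `word_freq` are hypothetical helper names, the patterns are adapted to base R's `gsub()` so the sample runs without stringr or tm, and removed markup is replaced by a space rather than an empty string. The final `wordcloud()` call is skipped if the wordcloud package is not installed.

```r
# Hypothetical helper (not part of the gist): strip LaTeX markup using
# base R gsub() with patterns modeled on the script above.
strip_latex <- function(txt) {
  # Flatten line breaks so patterns can match across lines
  txt <- gsub("[\r\n]+", " ", txt)
  # Drop the preamble, up to and including \begin{document}
  txt <- gsub("\\\\documentclass.*\\\\begin\\{document\\}", "", txt, perl = TRUE)
  # Drop commands together with their braced argument, e.g. \section{Intro}
  txt <- gsub("\\\\[a-zA-Z]+\\{.*?\\}", " ", txt, perl = TRUE)
  # Drop bare commands, e.g. \item
  txt <- gsub("\\\\[a-zA-Z]+", " ", txt, perl = TRUE)
  # Drop any remaining brace groups
  txt <- gsub("\\{.*?\\}", " ", txt, perl = TRUE)
  # Keep only letters and whitespace
  txt <- gsub("[^a-zA-Z\\s]", " ", txt, perl = TRUE)
  # Collapse whitespace runs and trim the ends
  trimws(gsub("\\s+", " ", txt, perl = TRUE))
}

# Hypothetical helper: lowercase, split on whitespace, drop stop words,
# and count how often each remaining word occurs.
word_freq <- function(txt, stop_words = character(0)) {
  words <- strsplit(tolower(txt), "\\s+")[[1]]
  words <- words[nzchar(words) & !(words %in% stop_words)]
  sort(table(words), decreasing = TRUE)
}

sample_tex <- paste(
  "\\documentclass{article}",
  "\\begin{document}",
  "\\section{Intro}",
  "Word clouds are fun. \\textbf{Bold} text and \\emph{more} text.",
  "\\end{document}",
  sep = "\n"
)

cleaned <- strip_latex(sample_tex)
print(cleaned)

freq <- word_freq(cleaned, stop_words = c("are", "and"))
print(freq)

# Draw the cloud from explicit words and counts, if wordcloud is available
if (requireNamespace("wordcloud", quietly = TRUE)) {
  wordcloud::wordcloud(names(freq), as.numeric(freq), min.freq = 1)
}
```

As in the script above, a command with a braced argument is removed together with its argument, so "Bold" and "more" do not survive the cleaning. Passing words and counts explicitly makes it easy to inspect the frequency table before plotting.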