@avyfain
Created June 1, 2018 03:59
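library(XML)      # readHTMLTable()
library(ggplot2)

# readHTMLTable() returns a list with one data frame per <table>; keep the first.
df <- readHTMLTable("http://projects.dailycal.org/paychecker")[[1]]

# Column 4 holds pay as text with "$" and "," in it: rename it, strip those
# characters, and convert to a number.
colnames(df)[4] <- "Salary"
df$Salary <- as.numeric(gsub('[$,]', '', df$Salary))

# coord_flip() puts departments on the vertical axis so their names stay readable.
p <- ggplot(df, aes(x=Department, y=Salary)) + coord_flip()

# One box per Department x Rank pair (color=Rank), departments ordered by their
# highest salary.
p + geom_boxplot(aes(color=Rank,
                     x=reorder(Department, Salary, FUN=max))) +
  scale_y_continuous(labels = scales::dollar) +
  labs(title="Salaries by Department",
       subtitle="University of California System",
       y="Annual Salary (2015)",
       x="Department",
       caption="Source: http://projects.dailycal.org/paychecker/\n by @avyfain, inspired by @johnjhorton") +
  theme(plot.caption = element_text(size=7.5))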
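For anyone porting this to Python, the two data steps that do the real work in the script are the salary cleanup (`gsub('[$,]', '', ...)`) and the department ordering (`reorder(Department, Salary, FUN=max)`). Below is a minimal standard-library sketch of both. The helper names `parse_salary` and `order_by_max` and the sample rows are hypothetical, not part of the original script.

```python
import re
from collections import defaultdict


def parse_salary(text: str) -> float:
    """Strip '$' and ',' (like gsub('[$,]', '', x)) and convert to a number."""
    return float(re.sub(r"[$,]", "", text))


def order_by_max(rows: list[tuple[str, float]]) -> list[str]:
    """Department names sorted ascending by their maximum salary,
    mirroring the factor levels produced by reorder(Department, Salary, FUN=max)."""
    best: dict[str, float] = defaultdict(lambda: float("-inf"))
    for dept, salary in rows:
        best[dept] = max(best[dept], salary)
    return sorted(best, key=best.__getitem__)


if __name__ == "__main__":
    # Hypothetical rows; the real table came from the Daily Cal page.
    raw = [("Physics", "$120,000.00"), ("History", "$95,500"), ("Physics", "$88,000")]
    rows = [(dept, parse_salary(s)) for dept, s in raw]
    print(order_by_max(rows))
```

Ascending order matches R's factor-level order; under `coord_flip()` the first level sits at the bottom, so the department with the highest top salary ends up at the top of the chart.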
@avyfain

avyfain commented Jul 20, 2020

That's great @hohonuuli! I didn't know about plotnine, so thanks for testing it out in Python. That's actually my usual language, but I couldn't figure out how to do the grouped boxplot properly in pandas, so I went back to R 🙃
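The "grouped" part is what makes this awkward outside ggplot2: `geom_boxplot(aes(color=Rank))` draws one box per (Department, Rank) pair. As a rough standard-library sketch of what each box summarizes: ggplot2's unweighted `stat_boxplot` takes the hinges from R's default `quantile()` (type 7, the same linear interpolation as Python's `statistics.quantiles(..., method="inclusive")`) and ends each whisker at the most extreme point within 1.5 * IQR of its hinge. The function names and sample rows are hypothetical.

```python
from collections import defaultdict
from statistics import quantiles


def box_stats(values: list[float]) -> dict[str, float]:
    """Summarize one group roughly the way ggplot2's stat_boxplot does (unweighted)."""
    data = sorted(values)
    if len(data) == 1:
        # Older Pythons' statistics.quantiles() needs at least two points.
        q1 = med = q3 = float(data[0])
    else:
        # method="inclusive" interpolates linearly, like R's quantile(type = 7).
        q1, med, q3 = quantiles(data, n=4, method="inclusive")
    iqr = q3 - q1
    lo_fence, hi_fence = q1 - 1.5 * iqr, q3 + 1.5 * iqr
    # Points outside the fences are outliers; whiskers end at the most extreme
    # remaining point, but never inside the box (hence q1/q3 in the candidates).
    inside = [v for v in data if lo_fence <= v <= hi_fence]
    return {
        "lower": min([q1, *inside]),
        "q1": q1,
        "median": med,
        "q3": q3,
        "upper": max([q3, *inside]),
    }


def grouped_box_stats(rows):
    """rows: iterable of (department, rank, salary); one box per (department, rank)."""
    groups: dict[tuple[str, str], list[float]] = defaultdict(list)
    for dept, rank, salary in rows:
        groups[(dept, rank)].append(salary)
    return {key: box_stats(vals) for key, vals in groups.items()}
```

In pandas the same grouping step is `df.groupby(["Department", "Rank"])["Salary"]`; the key point is that both columns form the group key before summarizing.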

FYI, in newer versions of pandas you can read a URL directly into a DataFrame: https://stackoverflow.com/questions/32400867/pandas-read-csv-from-url/41880513#41880513

@hohonuuli

@avyfain Hey, glad it's actually useful for you. BTW, I tried reading the URL directly with df = pd.read_html("http://projects.dailycal.org/paychecker"), but it returned a 403 error. I didn't dig into why; I just fell back to using requests instead.
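One plausible cause of the 403, not verified for this site, is that some servers reject the default `Python-urllib/3.x` User-Agent, which is presumably what pandas sends when it fetches a plain URL itself. The thread used `requests`; this sketch swaps in the standard library's `urllib.request` so it has no third-party dependency. The `USER_AGENT` string and the helper names are hypothetical.

```python
import urllib.request

# Hypothetical browser-like User-Agent; any non-default value may be enough.
USER_AGENT = "Mozilla/5.0 (compatible; paychecker-scraper/0.1)"


def build_request(url: str, user_agent: str = USER_AGENT) -> urllib.request.Request:
    """GET request with an explicit User-Agent instead of urllib's default."""
    return urllib.request.Request(url, headers={"User-Agent": user_agent})


def fetch_html(url: str, timeout: float = 30.0) -> str:
    """Download a page as text; raises urllib.error.HTTPError on a 403 or similar."""
    with urllib.request.urlopen(build_request(url), timeout=timeout) as resp:
        charset = resp.headers.get_content_charset() or "utf-8"
        return resp.read().decode(charset, errors="replace")
```

The returned string can then go to `pandas.read_html(io.StringIO(html))`, which returns a list of DataFrames; `[0]` picks the first table, the counterpart of `[[1]]` in the R script.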
