Last active
December 3, 2018 04:36
library(ggplot2)   # ggplot(), geom_density(), facet_wrap()
library(reshape2)  # melt(); the original does not name the package, reshape2 is assumed

# figure out which columns are numeric (and hence we can look at the distribution)
numeric_cols <- sapply(df, is.numeric)
# turn the data into long format (key->value-esque); is_bad must itself be numeric to survive the subset
df.lng <- melt(df[, numeric_cols], id="is_bad")
head(df.lng)
# plot the distribution for bads and goods for each variable
p <- ggplot(aes(x=value, group=is_bad, colour=factor(is_bad)), data=df.lng)
# quick and dirty way to figure out if you have any good variables
p + geom_density() +
  facet_wrap(~variable, scales="free")
# NOTES:
# - be careful of using variables that get created AFTER a loan is issued (principal/interest related)
# - any ID variables that are numeric will be plotted as well; be sure to ignore those too.
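
The two notes above can be handled before reshaping by dropping the offending columns up front. The following is only a sketch on a toy data frame: the column names (`id`, `total_rec_int`, `annual_inc`, `purpose`) and the `exclude_cols` vector are hypothetical and should be replaced with whatever the real data contains.

```r
# Hypothetical toy data; every column name here is an assumption for illustration.
df <- data.frame(
  id            = 1:6,                                   # numeric ID, not a real feature
  is_bad        = c(0, 1, 0, 0, 1, 0),                   # target flag
  annual_inc    = c(52000, 31000, 78000, 64000, 28000, 90000),
  total_rec_int = c(410.5, 120.0, 980.2, 655.1, 75.3, 1200.8),  # known only after issuance
  purpose       = c("car", "debt", "home", "debt", "car", "home"),
  stringsAsFactors = FALSE
)

# Columns to leave out: numeric IDs and anything created after the loan was issued.
exclude_cols <- c("id", "total_rec_int")

# Keep only numeric columns that are not in the exclusion list.
numeric_cols <- sapply(df, is.numeric)
keep_cols <- names(df)[numeric_cols & !(names(df) %in% exclude_cols)]
df.num <- df[, keep_cols, drop = FALSE]

print(keep_cols)
```

`df.num` can then be passed to `melt(..., id="is_bad")` in place of `df[, numeric_cols]`, so the facets show only candidate predictors.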
+1 the whole blog post has been one of the most approachable and accessible tutorials on R and modeling that I've read.
glad you like it!