@gu-mi
Forked from glamp/randomforest_example.R
Created February 18, 2013 21:10

Revisions

  1. @glamp glamp revised this gist Feb 10, 2013. 1 changed file with 1 addition and 10 deletions.
    11 changes: 1 addition & 10 deletions randomforest_example.R
    @@ -3,16 +3,7 @@ library(randomForest)
     # download Titanic Survivors data
     data <- read.table("http://math.ucdenver.edu/RTutorial/titanic.txt", h=T, sep="\t")
     # make survived into a yes/no
    -data$Survived <- as.factor(ifelse(data$Survived==1, "yes", "no"))
    -summary(data)
    -#               Name                 PClass        Age          Sex      Survived
    -# Carlsson, Mr Frans Olof     :   2  1st:322  Min.   : 0.17  female:462  no :863
    -# Connolly, Miss Kate         :   2  2nd:280  1st Qu.:21.00  male  :851  yes:450
    -# Kelly, Mr James             :   2  3rd:711  Median :28.00
    -# Abbing, Mr Anthony          :   1           Mean   :30.40
    -# Abbott, Master Eugene Joseph:   1           3rd Qu.:39.00
    -# Abbott, Mr Rossmore Edward  :   1           Max.   :71.00
    -# (Other)                     :1304           NA's   :557
    +data$Survived <- as.factor(ifelse(data$Survived==1, "yes", "no"))

     # split into a training and test set
     idx <- runif(nrow(data)) <= .75
  2. @glamp glamp created this gist Feb 10, 2013.
    40 changes: 40 additions & 0 deletions randomforest_example.R
    @@ -0,0 +1,40 @@
    library(randomForest)

    # download Titanic Survivors data
    data <- read.table("http://math.ucdenver.edu/RTutorial/titanic.txt", h=T, sep="\t")
    # make survived into a yes/no
    data$Survived <- as.factor(ifelse(data$Survived==1, "yes", "no"))
    summary(data)
    #               Name                 PClass        Age          Sex      Survived
    # Carlsson, Mr Frans Olof     :   2  1st:322  Min.   : 0.17  female:462  no :863
    # Connolly, Miss Kate         :   2  2nd:280  1st Qu.:21.00  male  :851  yes:450
    # Kelly, Mr James             :   2  3rd:711  Median :28.00
    # Abbing, Mr Anthony          :   1           Mean   :30.40
    # Abbott, Master Eugene Joseph:   1           3rd Qu.:39.00
    # Abbott, Mr Rossmore Edward  :   1           Max.   :71.00
    # (Other)                     :1304           NA's   :557

    # split into a training and test set
    idx <- runif(nrow(data)) <= .75
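    data.train <- data[idx,]
    # idx is a logical mask, so the held-out rows are data[!idx,];
    # data[-idx,] coerces TRUE/FALSE to -1/0 and drops only row 1
    # (the confusion matrix counts below total 755 rows, consistent with that original behaviour)
    data.test <- data[!idx,]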

    # train a random forest
    rf <- randomForest(Survived ~ PClass + Age + Sex,
                       data=data.train, importance=TRUE, na.action=na.omit)

    # how important is each variable in the model
    imp <- importance(rf)
    # sort by MeanDecreaseAccuracy (column 3 of the importance matrix)
    o <- order(imp[,"MeanDecreaseAccuracy"], decreasing=TRUE)
    imp[o,]
    #             no      yes MeanDecreaseAccuracy MeanDecreaseGini
    #Sex    51.49855 53.30255             55.13458         63.46861
    #PClass 25.48715 24.12522             28.43298         22.31789
    #Age    20.08571 14.07954             24.64607         19.57423

    # confusion matrix [[True Neg, False Pos], [False Neg, True Pos]]
    table(data.test$Survived, predict(rf, data.test), dnn=list("actual", "predicted"))
    #      predicted
    #actual  no yes
    #   no  427  16
    #   yes 117 195
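
The split step relies on `idx` being a logical mask, and the way that mask is negated decides which rows end up in the test set. The sketch below is a minimal illustration on a made-up data frame; `toy`, its columns, and the fixed seed are assumptions for the example and are not part of the gist. It compares `!idx`, which selects the complement of the training rows, with `-idx`, which R coerces to a vector of -1 and 0 and therefore treats as "drop row 1".

```r
# Toy stand-in for the Titanic data frame (hypothetical values, not the real data)
set.seed(42)  # fixed seed so the split is reproducible; the gist does not set one
toy <- data.frame(id = 1:20, x = rnorm(20))

# Same idea as the gist: a logical mask that is TRUE for roughly 75% of rows
idx <- runif(nrow(toy)) <= .75

toy.train <- toy[idx, ]

# -idx turns TRUE/FALSE into -1/0, so this drops row 1 and keeps everything else
# (as long as idx contains at least one TRUE)
toy.test.neg <- toy[-idx, ]

# !idx flips the mask, giving exactly the rows that are not in the training set
toy.test <- toy[!idx, ]

nrow(toy.train) + nrow(toy.test)              # equals nrow(toy): a true partition
length(intersect(toy.train$id, toy.test$id))  # 0: train and test do not overlap
nrow(toy.test.neg)                            # nrow(toy) - 1 when idx has any TRUE
```

An alternative, if integer positions are preferred, is to draw `train.rows <- sample(nrow(toy), size = floor(.75 * nrow(toy)))` and then use `toy[-train.rows, ]` for the test set; negative indexing is the right tool there because the index holds row numbers rather than TRUE/FALSE values.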