Created
October 9, 2014 01:26
{ | |
"metadata": { | |
"name": "", | |
"signature": "sha256:b4c0cf87b1fb34e620fb64741ce396cc0bc635b55249eea30c776eb06badccc4" | |
}, | |
"nbformat": 3, | |
"nbformat_minor": 0, | |
"worksheets": [ | |
{ | |
"cells": [ | |
{ | |
"cell_type": "markdown", | |
"metadata": {}, | |
"source": [ | |
"# Invoking R from IPython Notebook\n", | |
"\n", | |
"This notebook shows how to use R from IPython. It demonstrates how to run R scripts, as well as how to invoke R code interactively using **`rpy2`** and [IPython magic integration](http://rpy.sourceforge.net/rpy2/doc-2.4/html/interactive.html#module-rpy2.ipython.rmagic).\n", | |
"\n", | |
"* Invoke R commands using IPython cell magics\n", | |
"* Invoke R directly using `rpy2` and R magics\n", | |
"* Install R packages\n", | |
"* Interactive analysis in R\n", | |
"* Pull R objects into Python\n", | |
"* Download as IPython notebook\n", | |
"\n", | |
"To use this notebook, you must have R installed (we used [these instructions](http://cran.r-project.org/bin/linux/ubuntu/README) for Ubuntu)." | |
] | |
}, | |
{ | |
"cell_type": "markdown", | |
"metadata": {}, | |
"source": [ | |
"## Hello, World!\n", | |
"\n", | |
"Using the IPython `%%bash` cell magic, we can run any command that we might run in a bash shell. For example, we can create a simple \"Hello, World!\" R script and `cat` its contents."
] | |
}, | |
{ | |
"cell_type": "code", | |
"collapsed": false, | |
"input": [ | |
"%%bash\n", | |
"echo 'print ( \"Hello, World!\" )' > hello.R\n", | |
"cat hello.R" | |
], | |
"language": "python", | |
"metadata": {}, | |
"outputs": [] | |
}, | |
{ | |
"cell_type": "markdown", | |
"metadata": {}, | |
"source": [ | |
"We can also execute our new script using the [`Rscript`](http://stat.ethz.ch/R-manual/R-devel/library/utils/html/Rscript.html) command line utility for R." | |
] | |
}, | |
{ | |
"cell_type": "code", | |
"collapsed": false, | |
"input": [ | |
"%%bash\n", | |
"Rscript hello.R" | |
], | |
"language": "python", | |
"metadata": {}, | |
"outputs": [] | |
}, | |
{ | |
"cell_type": "markdown", | |
"metadata": {}, | |
"source": [ | |
"## Invoke R Directly using `rpy2` and IPython R magics\n", | |
"\n", | |
"Invoking R from a bash shell is great, but many people use R interactively, in a read-eval-print loop (REPL). Enter [`rpy2`](http://rpy.sourceforge.net/rpy2/doc-2.4/html/index.html), a Python package that provides interfaces for invoking R code from Python.\n",
"\n", | |
"To install `rpy2`, we use `pip`, the Python package manager." | |
] | |
}, | |
{ | |
"cell_type": "code", | |
"collapsed": false, | |
"input": [ | |
"%%bash\n", | |
"pip install rpy2" | |
], | |
"language": "python", | |
"metadata": {}, | |
"outputs": [] | |
}, | |
{ | |
"cell_type": "code", | |
"collapsed": false, | |
"input": [ | |
"%%bash\n", | |
"pip freeze | grep rpy2" | |
], | |
"language": "python", | |
"metadata": {}, | |
"outputs": [] | |
}, | |
{ | |
"cell_type": "markdown", | |
"metadata": {}, | |
"source": [ | |
"`rpy2` provides magics that allow us to **invoke R code directly** from within a cell, similar to the way we used `%%bash` magics to execute shell commands above. To use R magics from within a notebook, you need to load the `rpy2.ipython` extension." | |
] | |
}, | |
{ | |
"cell_type": "code", | |
"collapsed": false, | |
"input": [ | |
"%load_ext rpy2.ipython" | |
], | |
"language": "python", | |
"metadata": {}, | |
"outputs": [] | |
}, | |
{ | |
"cell_type": "markdown", | |
"metadata": {}, | |
"source": [ | |
"### `%R` Line Magic" | |
] | |
}, | |
{ | |
"cell_type": "markdown", | |
"metadata": {}, | |
"source": [ | |
"Now we can invoke R commands. Using the line magic (`%R`), we can even store the result in a Python variable."
] | |
}, | |
{ | |
"cell_type": "code", | |
"collapsed": false, | |
"input": [ | |
"X_mean = %R X=c(1,3,5,7,9); mean(X)\n", | |
"X_mean" | |
], | |
"language": "python", | |
"metadata": {}, | |
"outputs": [] | |
}, | |
{ | |
"cell_type": "markdown", | |
"metadata": {}, | |
"source": [ | |
"We can also pass objects back and forth between Python and R. Use the `-i` flag to specify input to R:" | |
] | |
}, | |
{ | |
"cell_type": "code", | |
"collapsed": false, | |
"input": [ | |
"X = [2,4,6,8,10]\n", | |
"X_median = %R -i X median(X)\n", | |
"X_median" | |
], | |
"language": "python", | |
"metadata": {}, | |
"outputs": [] | |
}, | |
{ | |
"cell_type": "markdown", | |
"metadata": {}, | |
"source": [ | |
"and the `-o` flag to specify a Python variable in which to store output:" | |
] | |
}, | |
{ | |
"cell_type": "code", | |
"collapsed": false, | |
"input": [ | |
"%R -o X_squared X_squared=X*X\n", | |
"X_squared" | |
], | |
"language": "python", | |
"metadata": {}, | |
"outputs": [] | |
}, | |
{ | |
"cell_type": "markdown", | |
"metadata": {}, | |
"source": [ | |
"### `%%R` Cell Magic" | |
] | |
}, | |
{ | |
"cell_type": "markdown", | |
"metadata": {}, | |
"source": [ | |
"The `%%R` cell magic allows us to run a block of R code, the output of which is published to the output of the cell:" | |
] | |
}, | |
{ | |
"cell_type": "code", | |
"collapsed": false, | |
"input": [ | |
"%%R\n", | |
"X=c(1,3,5,7,9)\n", | |
"Y=c(2,4,6,8,10)\n", | |
"X*Y" | |
], | |
"language": "python", | |
"metadata": {}, | |
"outputs": [] | |
}, | |
{ | |
"cell_type": "markdown", | |
"metadata": {}, | |
"source": [ | |
"Output of plots is also supported ([example source](http://www.statmethods.net/graphs/scatterplot.html)):" | |
] | |
}, | |
{ | |
"cell_type": "code", | |
"collapsed": false, | |
"input": [ | |
"%%R\n", | |
"attach(mtcars)\n", | |
"plot(wt, mpg, main=\"Scatterplot Example\", \n", | |
" \txlab=\"Car Weight \", ylab=\"Miles Per Gallon \", pch=19)\n", | |
"abline(lm(mpg~wt), col=\"red\") # regression line (y~x) \n", | |
"lines(lowess(wt,mpg), col=\"blue\") # lowess line (x,y)" | |
], | |
"language": "python", | |
"metadata": {}, | |
"outputs": [] | |
}, | |
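{
"cell_type": "markdown",
"metadata": {},
"source": [
"The size of the rendered plot can be adjusted as well. The cell below is a minimal sketch, assuming the `-w` and `-h` options of the `%%R` magic (width and height, taken here to be in pixels). It redraws the same scatterplot with `with(mtcars, ...)` rather than `attach`, purely to avoid modifying the R search path; the title text is illustrative."
]
},
{
"cell_type": "code",
"collapsed": false,
"input": [
"%%R -w 600 -h 400\n",
"# Redraw the scatterplot at an explicit width and height\n",
"with(mtcars, {\n",
"    plot(wt, mpg, main=\"Scatterplot Example (resized)\",\n",
"         xlab=\"Car Weight\", ylab=\"Miles Per Gallon\", pch=19)\n",
"    abline(lm(mpg ~ wt), col=\"red\")  # regression line (y~x)\n",
"})"
],
"language": "python",
"metadata": {},
"outputs": []
},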
{ | |
"cell_type": "markdown", | |
"metadata": {}, | |
"source": [ | |
"## Install R Packages\n", | |
"\n", | |
"R installs some packages by default, but we often want to install others. For example, suppose we wanted to [find frequent sequences of items within a set](http://en.wikibooks.org/wiki/Data_Mining_Algorithms_In_R/Sequence_Mining/SPADE). To do this, we can leverage the **`arules`** and **`arulesSequences`** packages."
] | |
}, | |
{ | |
"cell_type": "code", | |
"collapsed": false, | |
"input": [ | |
"%%R\n", | |
"install.packages('arules', repos=\"http://watson.nci.nih.gov/cran_mirror/\")\n", | |
"install.packages('arulesSequences', repos=\"http://watson.nci.nih.gov/cran_mirror/\")" | |
], | |
"language": "python", | |
"metadata": {}, | |
"outputs": [] | |
}, | |
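{
"cell_type": "markdown",
"metadata": {},
"source": [
"Re-running the cell above reinstalls both packages every time. As a hedged alternative, the sketch below installs a package only when it is not already available. It reuses the mirror URL from the cell above; the loop variable `pkg` is just an illustrative name."
]
},
{
"cell_type": "code",
"collapsed": false,
"input": [
"%%R\n",
"# Install each package only if it cannot already be loaded\n",
"for (pkg in c(\"arules\", \"arulesSequences\")) {\n",
"    if (!requireNamespace(pkg, quietly = TRUE)) {\n",
"        install.packages(pkg, repos=\"http://watson.nci.nih.gov/cran_mirror/\")\n",
"    }\n",
"}"
],
"language": "python",
"metadata": {},
"outputs": []
},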
{ | |
"cell_type": "code", | |
"collapsed": false, | |
"input": [ | |
"%%R\n", | |
"packageDescription(\"arulesSequences\")" | |
], | |
"language": "python", | |
"metadata": {}, | |
"outputs": [] | |
}, | |
{ | |
"cell_type": "markdown", | |
"metadata": {}, | |
"source": [ | |
"The `arulesSequences` package comes with the `zaki.txt` sample data (named after the SPADE creator), which is located in the package's `misc` directory." | |
] | |
}, | |
{ | |
"cell_type": "code", | |
"collapsed": false, | |
"input": [ | |
"%%bash\n", | |
"cat /home/notebook/R/library/arulesSequences/misc/zaki.txt" | |
], | |
"language": "python", | |
"metadata": {}, | |
"outputs": [] | |
}, | |
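{
"cell_type": "markdown",
"metadata": {},
"source": [
"The path in the previous cell is specific to the environment this notebook was written in. A more portable sketch, assuming `arulesSequences` is installed, asks R where the file lives using `system.file`. The variable name `zaki_path` is only illustrative."
]
},
{
"cell_type": "code",
"collapsed": false,
"input": [
"%%R\n",
"# Locate zaki.txt inside the installed package instead of hard-coding the library path\n",
"zaki_path <- system.file(\"misc\", \"zaki.txt\", package = \"arulesSequences\")\n",
"# Print the file contents line by line\n",
"cat(readLines(zaki_path), sep = \"\\n\")"
],
"language": "python",
"metadata": {},
"outputs": []
},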
{ | |
"cell_type": "markdown", | |
"metadata": {}, | |
"source": [ | |
"## Interactive Analysis in R" | |
] | |
}, | |
{ | |
"cell_type": "markdown", | |
"metadata": {}, | |
"source": [ | |
"We can load the packages and mine the sample data for frequent sequences of items." | |
] | |
}, | |
{ | |
"cell_type": "code", | |
"collapsed": false, | |
"input": [ | |
"%%R\n", | |
"library(Matrix)\n", | |
"library(arules)\n", | |
"library(arulesSequences)\n", | |
"\n", | |
"# read the sample data as a transactions object, then display it as a data frame\n",
"x <- read_baskets(con = system.file(\"misc\", \"zaki.txt\", package = \"arulesSequences\"), info = c(\"sequenceID\",\"eventID\",\"SIZE\"))\n", | |
"as(x, \"data.frame\")\n", | |
"\n", | |
"# run the CSPADE algorithm to mine frequent sequences\n",
"s1 <- cspade(x, parameter = list(support = 0.4), control = list(verbose = TRUE))" | |
], | |
"language": "python", | |
"metadata": {}, | |
"outputs": [] | |
}, | |
{ | |
"cell_type": "markdown", | |
"metadata": {}, | |
"source": [ | |
"Note that we can access R session state across multiple notebook cells. Here we print a summary of the results of the analysis performed in the previous cell. " | |
] | |
}, | |
{ | |
"cell_type": "code", | |
"collapsed": false, | |
"input": [ | |
"%%R\n", | |
"# output the results\n", | |
"summary(s1)" | |
], | |
"language": "python", | |
"metadata": {}, | |
"outputs": [] | |
}, | |
{ | |
"cell_type": "markdown", | |
"metadata": {}, | |
"source": [ | |
"## `%Rpull` the results\n",
"\n",
"IPython users may be more accustomed to manipulating data in Python, and `rpy2` can convert data between R objects and Python objects. For example, we may want to pull the results of our R analysis into a [pandas](http://pandas.pydata.org/) DataFrame for further analysis."
] | |
}, | |
{ | |
"cell_type": "code", | |
"collapsed": false, | |
"input": [ | |
"%%R\n", | |
"df <- as(s1, \"data.frame\")\n", | |
"df" | |
], | |
"language": "python", | |
"metadata": {}, | |
"outputs": [] | |
}, | |
{ | |
"cell_type": "markdown", | |
"metadata": {}, | |
"source": [ | |
"Using `%Rpull`, the R `data.frame` is automatically converted to a pandas `DataFrame`."
] | |
}, | |
{ | |
"cell_type": "code", | |
"collapsed": false, | |
"input": [ | |
"%Rpull df\n", | |
"type(df)" | |
], | |
"language": "python", | |
"metadata": {}, | |
"outputs": [] | |
}, | |
{ | |
"cell_type": "markdown", | |
"metadata": {}, | |
"source": [ | |
"Now we can manipulate the pandas DataFrame." | |
] | |
}, | |
{ | |
"cell_type": "code", | |
"collapsed": false, | |
"input": [ | |
"%matplotlib inline\n", | |
"print(df.support.describe())\n",
"df.support.hist()" | |
], | |
"language": "python", | |
"metadata": {}, | |
"outputs": [] | |
}, | |
{ | |
"cell_type": "code", | |
"collapsed": false, | |
"input": [ | |
"df[df.support > 0.5].sort_values('support', ascending=False)"
], | |
"language": "python", | |
"metadata": {}, | |
"outputs": [] | |
}, | |
{ | |
"cell_type": "markdown", | |
"metadata": {}, | |
"source": [ | |
"## Debugging\n", | |
"\n", | |
"Did something go wrong? If R throws an error, try one of these commands."
] | |
}, | |
{ | |
"cell_type": "code", | |
"collapsed": false, | |
"input": [ | |
"%R warnings()\n", | |
"%R traceback()" | |
], | |
"language": "python", | |
"metadata": {}, | |
"outputs": [] | |
}, | |
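{
"cell_type": "markdown",
"metadata": {},
"source": [
"Errors can also be handled inside the R code itself. The cell below is a small sketch using base R's `tryCatch`; the call to `stop` stands in for whatever code might fail, and the name `result` is illustrative."
]
},
{
"cell_type": "code",
"collapsed": false,
"input": [
"%%R\n",
"# Catch an R error and report its message instead of aborting the cell\n",
"result <- tryCatch(\n",
"    stop(\"something went wrong\"),  # placeholder for code that may fail\n",
"    error = function(e) paste(\"Caught:\", conditionMessage(e))\n",
")\n",
"print(result)"
],
"language": "python",
"metadata": {},
"outputs": []
},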
{ | |
"cell_type": "markdown", | |
"metadata": {}, | |
"source": [ | |
"## References\n", | |
"\n", | |
"Zaki, M. J. (2001). SPADE: An Efficient Algorithm for Mining Frequent Sequences. Machine Learning, 42, 31-60.\n",
"[paper](http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.113.6042&rep=rep1&type=pdf)"
] | |
} | |
], | |
"metadata": {} | |
} | |
] | |
} |