jtyberg · October 9, 2014 01:26
diff --git a/Invoking_R.ipynb b/Invoking_R.ipynb
 {
 "metadata": {
  "name": "",
  "signature": "sha256:b4c0cf87b1fb34e620fb64741ce396cc0bc635b55249eea30c776eb06badccc4"
 },
 "nbformat": 3,
 "nbformat_minor": 0,
 "worksheets": [
  {
   "cells": [
    {
     "cell_type": "markdown",
     "metadata": {},
     "source": [
      "# Invoking R from IPython Notebook\n",
      "\n",
      "This notebook shows how to use R from IPython.  It demonstrates how to run R scripts, as well as how to invoke R code interactively using **`rpy2`** and [IPython magic integration](http://rpy.sourceforge.net/rpy2/doc-2.4/html/interactive.html#module-rpy2.ipython.rmagic).\n",
      "\n",
      "* Invoke R commands using IPython cell magics\n",
      "* Invoke R directly using `rpy2` and R magics\n",
      "* Install R packages\n",
      "* Interactive analysis in R\n",
      "* Pull R objects into Python\n",
      "* Download as IPython notebook\n",
      "\n",
      "To use this notebook, you must have R installed (we used [these instructions](http://cran.r-project.org/bin/linux/ubuntu/README) for Ubuntu)."
     ]
    },
    {
     "cell_type": "markdown",
     "metadata": {},
     "source": [
      "## Hello, World!\n",
      "\n",
      "Using the IPython `%%bash` cell magic, we can run any command that we might run in a bash shell.  For example, we can create a simple, \"Hello, World!\" R script, and `cat` the result."
     ]
    },
    {
     "cell_type": "code",
     "collapsed": false,
     "input": [
      "%%bash\n",
      "echo 'print ( \"Hello, World!\" )' > hello.R\n",
      "cat hello.R"
     ],
     "language": "python",
     "metadata": {},
     "outputs": []
    },
    {
     "cell_type": "markdown",
     "metadata": {},
     "source": [
      "We can also execute our new script using the [`Rscript`](http://stat.ethz.ch/R-manual/R-devel/library/utils/html/Rscript.html) command line utility for R."
     ]
    },
    {
     "cell_type": "code",
     "collapsed": false,
     "input": [
      "%%bash\n",
      "Rscript hello.R"
     ],
     "language": "python",
     "metadata": {},
     "outputs": []
    },
    {
     "cell_type": "markdown",
     "metadata": {},
     "source": [
      "## Invoke R Directly using `rpy2` and IPython R magics\n",
      "\n",
      "Invoking R from a bash shell is great, but many people use R in an interactive fashion (R is often used in a read-eval-print (REPL) loop.  Enter [`rpy2`](http://rpy.sourceforge.net/rpy2/doc-2.4/html/index.html), a Python package that provides interfaces to facilitate invoking R code from Python.  \n",
      "\n",
      "To install `rpy2`, we use `pip`, the Python package manager."
     ]
    },
    {
     "cell_type": "code",
     "collapsed": false,
     "input": [
      "%%bash\n",
      "pip install rpy2"
     ],
     "language": "python",
     "metadata": {},
     "outputs": []
    },
    {
     "cell_type": "code",
     "collapsed": false,
     "input": [
      "%%bash\n",
      "pip freeze | grep rpy2"
     ],
     "language": "python",
     "metadata": {},
     "outputs": []
    },
    {
     "cell_type": "markdown",
     "metadata": {},
     "source": [
      "`rpy2` provides magics that allow us to **invoke R code directly** from within a cell, similar to the way we used `%%bash` magics to execute shell commands above.  To use R magics from within a notebook, you need to load the `rpy2.ipython` extension."
     ]
    },
    {
     "cell_type": "code",
     "collapsed": false,
     "input": [
      "%load_ext rpy2.ipython"
     ],
     "language": "python",
     "metadata": {},
     "outputs": []
    },
    {
     "cell_type": "markdown",
     "metadata": {},
     "source": [
      "### `%R` Line Magic"
     ]
    },
    {
     "cell_type": "markdown",
     "metadata": {},
     "source": [
      "Now we can invoke R commands.  Using the in-line magic (`%R`), we can even store the result."
     ]
    },
    {
     "cell_type": "code",
     "collapsed": false,
     "input": [
      "X_mean = %R X=c(1,3,5,7,9); mean(X)\n",
      "X_mean"
     ],
     "language": "python",
     "metadata": {},
     "outputs": []
    },
    {
     "cell_type": "markdown",
     "metadata": {},
     "source": [
      "We can also pass objects back and forth between Python and R.  Use the `-i` flag to specify input to R:"
     ]
    },
    {
     "cell_type": "code",
     "collapsed": false,
     "input": [
      "X = [2,4,6,8,10]\n",
      "X_median = %R -i X median(X)\n",
      "X_median"
     ],
     "language": "python",
     "metadata": {},
     "outputs": []
    },
    {
     "cell_type": "markdown",
     "metadata": {},
     "source": [
      "and the `-o` flag to specify a Python variable in which to store output:"
     ]
    },
    {
     "cell_type": "code",
     "collapsed": false,
     "input": [
      "%R -o X_squared X_squared=X*X\n",
      "X_squared"
     ],
     "language": "python",
     "metadata": {},
     "outputs": []
    },
    {
     "cell_type": "markdown",
     "metadata": {},
     "source": [
      "### `%%R` Cell Magic"
     ]
    },
    {
     "cell_type": "markdown",
     "metadata": {},
     "source": [
      "The `%%R` cell magic allows us to run a block of R code, the output of which is published to the output of the cell:"
     ]
    },
    {
     "cell_type": "code",
     "collapsed": false,
     "input": [
      "%%R\n",
      "X=c(1,3,5,7,9)\n",
      "Y=c(2,4,6,8,10)\n",
      "X*Y"
     ],
     "language": "python",
     "metadata": {},
     "outputs": []
    },
    {
     "cell_type": "markdown",
     "metadata": {},
     "source": [
      "Output of plots is also supported ([example source](http://www.statmethods.net/graphs/scatterplot.html)):"
     ]
    },
    {
     "cell_type": "code",
     "collapsed": false,
     "input": [
      "%%R\n",
      "attach(mtcars)\n",
      "plot(wt, mpg, main=\"Scatterplot Example\", \n",
      "  \txlab=\"Car Weight \", ylab=\"Miles Per Gallon \", pch=19)\n",
      "abline(lm(mpg~wt), col=\"red\") # regression line (y~x) \n",
      "lines(lowess(wt,mpg), col=\"blue\") # lowess line (x,y)"
     ],
     "language": "python",
     "metadata": {},
     "outputs": []
    },
    {
     "cell_type": "markdown",
     "metadata": {},
     "source": [
      "## Install R Packages\n",
      "\n",
      "R installs some packages by default, but oftentimes, we want to install others.  For example, suppose we wanted to [find frequent sequences of items within a set](http://en.wikibooks.org/wiki/Data_Mining_Algorithms_In_R/Sequence_Mining/SPADE).  To do this, we can leverage the **`arules`** and **`arulesSequence`** packages."
     ]
    },
    {
     "cell_type": "code",
     "collapsed": false,
     "input": [
      "%%R\n",
      "install.packages('arules', repos=\"http://watson.nci.nih.gov/cran_mirror/\")\n",
      "install.packages('arulesSequences', repos=\"http://watson.nci.nih.gov/cran_mirror/\")"
     ],
     "language": "python",
     "metadata": {},
     "outputs": []
    },
    {
     "cell_type": "code",
     "collapsed": false,
     "input": [
      "%%R\n",
      "packageDescription(\"arulesSequences\")"
     ],
     "language": "python",
     "metadata": {},
     "outputs": []
    },
    {
     "cell_type": "markdown",
     "metadata": {},
     "source": [
      "The `arulesSequences` package comes with the `zaki.txt` sample data (named after the SPADE creator), which is located in the package's `misc` directory."
     ]
    },
    {
     "cell_type": "code",
     "collapsed": false,
     "input": [
      "%%bash\n",
      "cat /home/notebook/R/library/arulesSequences/misc/zaki.txt"
     ],
     "language": "python",
     "metadata": {},
     "outputs": []
    },
    {
     "cell_type": "markdown",
     "metadata": {},
     "source": [
      "## Interactive Analysis in R"
     ]
    },
    {
     "cell_type": "markdown",
     "metadata": {},
     "source": [
      "We can load the packages and mine the sample data for frequent sequences of items."
     ]
    },
    {
     "cell_type": "code",
     "collapsed": false,
     "input": [
      "%%R\n",
      "library(Matrix)\n",
      "library(arules)\n",
      "library(arulesSequences)\n",
      "\n",
      "# load the data set into a data frame\n",
      "x <- read_baskets(con = system.file(\"misc\", \"zaki.txt\", package = \"arulesSequences\"), info = c(\"sequenceID\",\"eventID\",\"SIZE\"))\n",
      "as(x, \"data.frame\")\n",
      "\n",
      "# run the CSPADE algorithm to mine frequent items\n",
      "s1 <- cspade(x, parameter = list(support = 0.4), control = list(verbose = TRUE))"
     ],
     "language": "python",
     "metadata": {},
     "outputs": []
    },
    {
     "cell_type": "markdown",
     "metadata": {},
     "source": [
      "Note that we can access R session state across multiple notebook cells.  Here we print a summary of the results of the analysis performed in the previous cell. "
     ]
    },
    {
     "cell_type": "code",
     "collapsed": false,
     "input": [
      "%%R\n",
      "# output the results\n",
      "summary(s1)"
     ],
     "language": "python",
     "metadata": {},
     "outputs": []
    },
    {
     "cell_type": "markdown",
     "metadata": {},
     "source": [
      "## `%RPull` the results\n",
      "\n",
      "IPython users may be more accustomed to manipulating data in Python.  `rpy2` makes it easy to convert data between R objects and Python objects.  For example, we may want to pull the results of our R analysis into a [Pandas](http://pandas.pydata.org/) DataFrame for further analysis.  `rpy2`  makes this trivial."
     ]
    },
    {
     "cell_type": "code",
     "collapsed": false,
     "input": [
      "%%R\n",
      "df <- as(s1, \"data.frame\")\n",
      "df"
     ],
     "language": "python",
     "metadata": {},
     "outputs": []
    },
    {
     "cell_type": "markdown",
     "metadata": {},
     "source": [
      "Using `%RPull`, the R `data.frame` is automatically converted to a Pandas `DataFrame`."
     ]
    },
    {
     "cell_type": "code",
     "collapsed": false,
     "input": [
      "%Rpull df\n",
      "type(df)"
     ],
     "language": "python",
     "metadata": {},
     "outputs": []
    },
    {
     "cell_type": "markdown",
     "metadata": {},
     "source": [
      "Now we can manipulate the pandas DataFrame."
     ]
    },
    {
     "cell_type": "code",
     "collapsed": false,
     "input": [
      "%matplotlib inline\n",
      "print df.support.describe()\n",
      "df.support.hist()"
     ],
     "language": "python",
     "metadata": {},
     "outputs": []
    },
    {
     "cell_type": "code",
     "collapsed": false,
     "input": [
      "df[df.support > 0.5].sort('support', ascending=False)"
     ],
     "language": "python",
     "metadata": {},
     "outputs": []
    },
    {
     "cell_type": "markdown",
     "metadata": {},
     "source": [
      "## Debugging\n",
      "\n",
      "Did something go wrong?  If so, try one of these should R throw an error."
     ]
    },
    {
     "cell_type": "code",
     "collapsed": false,
     "input": [
      "%R warnings()\n",
      "%R traceback()"
     ],
     "language": "python",
     "metadata": {},
     "outputs": []
    },
    {
     "cell_type": "markdown",
     "metadata": {},
     "source": [
      "## References\n",
      "\n",
      "M. J. Zaki. (2001). SPADE: An Efficient Algorithm for Mining Frequent Sequences. Machine Learning Journal, 42, 31--60.\n",
      "[paper](http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.113.6042&rep=rep1&type=pdf)"
     ]
    }
   ],
   "metadata": {}
  }
 ]
 }