lluang · August 15, 2019 20:25
diff --git a/MOREplotplotnine.ipynb b/MOREplotplotnine.ipynb
 {
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# MORE Plot for hospital reception simulation using plotnine\n",
    "Louis Luangkesorn [email protected]\n",
    "8/15/2019\n",
    "\n",
    "This introduces the Measure of Risk and Error (MORE) plot [Nelson08] of simulation output which simultaneously shows the analyst a sense of the risk, expressed as a histogram along with the mean, the 0.05 quantile, and the 0.95 quantile), and a sense of the error of the simulation output (expressed as 95% C.I. for the mean, the 0.05 quantile, and the 0.95 quantile).\n",
    "\n",
    "This example uses a hospital receptionist simulation orginally found in [Nelson13]. The implementation of the simulation uses the Simpy library in Python [simmer19]\n",
    "\n",
    "Nelson08: Nelson, B., (2008), THE MORE PLOT: DISPLAYING MEASURES OF RISK & ERROR FROM SIMULATION OUTPUT, 2008 Winter Simulation Conference, https://www.informs-sim.org/wsc08papers/048.pdf\n",
    "\n",
    "Nelson13: Nelson, B., (2013), Foundations and Methods of Stochastic Simulation, Springer, http://users.iems.northwestern.edu/~nelsonb/IEMS435/\n",
    "\n",
    "Simpy19: Stefan Scherfke and Ontje Lünsdorf (2019). Simpy: Discrete Event Simulation for Python, https://simpy.readthedocs.io/en/latest/contents.html"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# Hospital reception discrete event simulation with \n",
    "# MORE Plot\n",
    "import matplotlib.pyplot as plt\n",
    "import scipy as sp\n",
    "from scipy.stats import expon, erlang\n",
    "import pandas as pd\n",
    "import simpy\n",
    "import plotnine as p9\n",
    "\n",
    "def source(env, interval, receptionist):\n",
    "    i = 0\n",
    "    while True:\n",
    "        i += 1\n",
    "        c = patient(env, 'patient%02d' % i, receptionist, time_service=12.0)\n",
    "        env.process(c)\n",
    "        t = expon.rvs(0, 1.0/G.ARRint, size=1)\n",
    "        yield env.timeout(t)\n",
    "\n",
    "\n",
    "def patient(env, name, receptionist, time_service):\n",
    "    arrive = env.now\n",
    "    with receptionist.request() as req:\n",
    "        yield req\n",
    "        timenow = env.now\n",
    "        wait = timenow - arrive\n",
    "        env.arrival.append(timenow)\n",
    "        env.waittime.append(wait)\n",
    "        tis = erlang.rvs(G.phases,\n",
    "                         scale=float(G.timeReceptionist)/G.phases, size=1)\n",
    "        yield env.timeout(tis)\n",
    "    return\n",
    "\n",
    "\n",
    "#  Experiment data -------------------------\n",
    "class G:\n",
    "    RANDOM_SEED = 42\n",
    "    INTERVAL_patientS = 10.0  # Generate new patients roughly every x seconds\n",
    "    maxTime = 8 * 60         # minutes\n",
    "    timeReceptionist = 0.8  # mean, minutes\n",
    "    phases = 3\n",
    "    ARRint = 1.0            # mean, minutes\n",
    "    theseed = 99999\n",
    "\n",
    "\n",
    "class HospitalSim(object):\n",
    "    def run(self, aseed):\n",
    "        print('Hospital simulation')\n",
    "        sp.random.seed(aseed)\n",
    "        env = simpy.Environment()\n",
    "        env.arrival = []\n",
    "        env.waittime = []\n",
    "        receptionist = simpy.Resource(env, capacity=1)\n",
    "        env.process(source(env, G.INTERVAL_patientS, receptionist))\n",
    "        env.run(until=G.maxTime)\n",
    "        return env\n",
    "\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "reps = 10\n",
    "hospital = HospitalSim()\n",
    "arrivallist = []\n",
    "waittimelist = []\n",
    "for i in range(reps):\n",
    "    results = hospital.run(G.RANDOM_SEED+i)\n",
    "    arrivallist.append([float(x) for x in results.arrival])\n",
    "    waittimelist.append([float(x) for x in results.waittime])\n",
    "\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# Combine all observations into a single list for the overall histogram\n",
    "# Identify performance measure to make this generic\n",
    "observationlist = waittimelist\n",
    "observationname = 'Flow time'\n",
    "allwaits = [item for sublist in observationlist for item in sublist]\n",
    "allwaitsdf = pd.DataFrame({'observation':allwaits})"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "flowplot = (p9.ggplot(allwaitsdf) + \n",
    "    p9.geom_histogram(p9.aes(x='observation', y='stat(density)')))"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# Generate means and quantiles for each replication\n",
    "meanwaits = [sp.mean(observation) for observation in observationlist]\n",
    "q05waits = [sp.quantile(observation, q=0.05) for observation in observationlist]\n",
    "q95waits = [sp.quantile(observation, q=0.95) for observation in observationlist]\n",
    "meanstats = pd.DataFrame({'mean':[sp.mean(meanwaits)],\n",
    "                           'lowerci': [sp.stats.t.ppf(0.05,df=reps) * sp.std(meanwaits)/len(meanwaits) + \n",
    "                                      sp.mean(meanwaits)],\n",
    "                           'upperci': [sp.stats.t.ppf(0.95,df=reps) * sp.std(meanwaits)/len(meanwaits) + \n",
    "                                      sp.mean(meanwaits)]})\n",
    "q05stats = pd.DataFrame({'mean':[sp.mean(q05waits)],\n",
    "                           'lowerci': [sp.stats.t.ppf(0.05,df=reps) * sp.std(q05waits)/len(q05waits) + \n",
    "                                      sp.mean(q05waits)],\n",
    "                           'upperci': [sp.stats.t.ppf(0.95,df=reps) * sp.std(q05waits)/len(q05waits) + \n",
    "                                      sp.mean(q05waits)]})\n",
    "q95stats = pd.DataFrame({'mean':[sp.mean(q95waits)],\n",
    "                           'lowerci': [sp.stats.t.ppf(0.05,df=reps) * sp.std(q95waits)/len(q95waits) + \n",
    "                                      sp.mean(q95waits)],\n",
    "                           'upperci': [sp.stats.t.ppf(0.95,df=reps) * sp.std(q95waits)/len(q95waits) + \n",
    "                                      sp.mean(q95waits)]})"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "print(meanstats)\n",
    "print(q05stats)\n",
    "print(q95stats)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# Measures of Risk and Error\n",
    "\n",
    "## Risk vs. Error\n",
    "    - Measures of risk\n",
    "        -  Directly support decision making.\n",
    "        -  What is the range of outcomes that can occur and their probabilities if the system is designed in a specific way.\n",
    "        -  If $F_Y(5)$ (probability of completing a project in 5 days) is too small, we may decide not to undertake the project as designed.\n",
    "    -  Measures of error\n",
    "    -  Directly support experiment design\n",
    "    -  Did we expend enough simulation effort (e.g. replications) to be confident in our estimates of system performance\n",
    "\n",
    "## Measure of Risk and Error (MORE) Plot\n",
    "\n",
    "- Measures of error. The C.I. on the average or the quantiles are measure of error.\n",
    "    - As a simulation analyst we can reduce error (how?)\n",
    "    - How to choose which performance measure's C.I. we should use in specifying error.\n",
    "    -  What does the C.I. on these statistics (average, quantile) tell us about our system and experimental design?\n",
    "- Measures of risk\n",
    "    -  How can we as engineers reduce risk?\n",
    "    -  How do we measure risk? What do these types of statistics actually tell us about our system and experimental design?\n",
    "    \n",
    "## Creating a MORE plot\n",
    "-  Start with a histogram.\n",
    "-  Add the sample average and 0.05 and 0.95 quantiles.\n",
    "    -  This range is considered likely, beyond that is considered unlikely.\n",
    "-  Measure of risk\n",
    "    -  Include 95% C.I. for the average and both quantiles.\n",
    "    -  Measure of error at the center and the extremes of the range of outcomes.\n",
    "-  Goal: to represent both risk and error in an intuitively understandable way."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "(flowplot + \n",
    "    p9.geom_errorbarh(data = meanstats, mapping=p9.aes(xmax = 'upperci', xmin = 'lowerci', \n",
    "                                         y = -0.02), height = 0.01) +\n",
    "    p9.geom_errorbarh(data = q05stats, mapping=p9.aes(xmax = 'upperci', xmin = 'lowerci', \n",
    "                                        y = -0.02), height = 0.01) +\n",
    "    p9.geom_errorbarh(data = q95stats, mapping=p9.aes(xmax = 'upperci', xmin = 'lowerci', \n",
    "                                        y = -0.02), height = 0.01) +\n",
    "    p9.annotate(\"text\", label = \"C.I.\\n mean\", x = (meanstats['mean']), y = -0.05, size = 9) +\n",
    "    p9.annotate(\"text\", label = \"C.I.\\n Q0.05\", x = q05stats['mean'], y = -0.05, size = 9) +\n",
    "    p9.annotate(\"text\", label = \"C.I.\\n Q0.95\", x = q95stats['mean'], y = -0.05, size = 9) +\n",
    "    p9.ggtitle(\"MORE plot for \" + observationname) +\n",
    "    p9.xlab(observationname)\n",
    "\n",
    ")"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "In this MORE plot, the error bars below the x-axis represent the 95% C.I. of the mean and 0.05 and 0.95 quantiles. The mean and quantiles provides a measure of the risk, what is the range of values that will generally be experienced by entities in this system (visitors to a hospital going to the information desk). The error bars provide measures of the error, which are the confidence intervals of the estimate of the mean, the 0.05 quantile, and the 0.95 quantile. Note that the measure of error is not the same for each of the three. Depending on what performance measure is most useful for the system, the analyst would then decide how many replications would be needed to reduce that particular performance measure's C.I. to an acceptable level."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": []
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "Python 3",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.6.8"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 2
 }
	{
	"cells": [
	{
	"cell_type": "markdown",
	"metadata": {},
	"source": [
	"# MORE Plot for hospital reception simulation using plotnine\n",
	"Louis Luangkesorn [email protected]\n",
	"8/15/2019\n",
	"\n",
	"This introduces the Measure of Risk and Error (MORE) plot [Nelson08] of simulation output which simultaneously shows the analyst a sense of the risk, expressed as a histogram along with the mean, the 0.05 quantile, and the 0.95 quantile), and a sense of the error of the simulation output (expressed as 95% C.I. for the mean, the 0.05 quantile, and the 0.95 quantile).\n",
	"\n",
	"This example uses a hospital receptionist simulation orginally found in [Nelson13]. The implementation of the simulation uses the Simpy library in Python [simmer19]\n",
	"\n",
	"Nelson08: Nelson, B., (2008), THE MORE PLOT: DISPLAYING MEASURES OF RISK & ERROR FROM SIMULATION OUTPUT, 2008 Winter Simulation Conference, https://www.informs-sim.org/wsc08papers/048.pdf\n",
	"\n",
	"Nelson13: Nelson, B., (2013), Foundations and Methods of Stochastic Simulation, Springer, http://users.iems.northwestern.edu/~nelsonb/IEMS435/\n",
	"\n",
	"Simpy19: Stefan Scherfke and Ontje Lünsdorf (2019). Simpy: Discrete Event Simulation for Python, https://simpy.readthedocs.io/en/latest/contents.html"
	]
	},
	{
	"cell_type": "code",
	"execution_count": null,
	"metadata": {},
	"outputs": [],
	"source": [
	"# Hospital reception discrete event simulation with \n",
	"# MORE Plot\n",
	"import matplotlib.pyplot as plt\n",
	"import scipy as sp\n",
	"from scipy.stats import expon, erlang\n",
	"import pandas as pd\n",
	"import simpy\n",
	"import plotnine as p9\n",
	"\n",
	"def source(env, interval, receptionist):\n",
	" i = 0\n",
	" while True:\n",
	" i += 1\n",
	" c = patient(env, 'patient%02d' % i, receptionist, time_service=12.0)\n",
	" env.process(c)\n",
	" t = expon.rvs(0, 1.0/G.ARRint, size=1)\n",
	" yield env.timeout(t)\n",
	"\n",
	"\n",
	"def patient(env, name, receptionist, time_service):\n",
	" arrive = env.now\n",
	" with receptionist.request() as req:\n",
	" yield req\n",
	" timenow = env.now\n",
	" wait = timenow - arrive\n",
	" env.arrival.append(timenow)\n",
	" env.waittime.append(wait)\n",
	" tis = erlang.rvs(G.phases,\n",
	" scale=float(G.timeReceptionist)/G.phases, size=1)\n",
	" yield env.timeout(tis)\n",
	" return\n",
	"\n",
	"\n",
	"# Experiment data -------------------------\n",
	"class G:\n",
	" RANDOM_SEED = 42\n",
	" INTERVAL_patientS = 10.0 # Generate new patients roughly every x seconds\n",
	" maxTime = 8 * 60 # minutes\n",
	" timeReceptionist = 0.8 # mean, minutes\n",
	" phases = 3\n",
	" ARRint = 1.0 # mean, minutes\n",
	" theseed = 99999\n",
	"\n",
	"\n",
	"class HospitalSim(object):\n",
	" def run(self, aseed):\n",
	" print('Hospital simulation')\n",
	" sp.random.seed(aseed)\n",
	" env = simpy.Environment()\n",
	" env.arrival = []\n",
	" env.waittime = []\n",
	" receptionist = simpy.Resource(env, capacity=1)\n",
	" env.process(source(env, G.INTERVAL_patientS, receptionist))\n",
	" env.run(until=G.maxTime)\n",
	" return env\n",
	"\n"
	]
	},
	{
	"cell_type": "code",
	"execution_count": null,
	"metadata": {},
	"outputs": [],
	"source": [
	"reps = 10\n",
	"hospital = HospitalSim()\n",
	"arrivallist = []\n",
	"waittimelist = []\n",
	"for i in range(reps):\n",
	" results = hospital.run(G.RANDOM_SEED+i)\n",
	" arrivallist.append([float(x) for x in results.arrival])\n",
	" waittimelist.append([float(x) for x in results.waittime])\n",
	"\n"
	]
	},
	{
	"cell_type": "code",
	"execution_count": null,
	"metadata": {},
	"outputs": [],
	"source": [
	"# Combine all observations into a single list for the overall histogram\n",
	"# Identify performance measure to make this generic\n",
	"observationlist = waittimelist\n",
	"observationname = 'Flow time'\n",
	"allwaits = [item for sublist in observationlist for item in sublist]\n",
	"allwaitsdf = pd.DataFrame({'observation':allwaits})"
	]
	},
	{
	"cell_type": "code",
	"execution_count": null,
	"metadata": {},
	"outputs": [],
	"source": [
	"flowplot = (p9.ggplot(allwaitsdf) + \n",
	" p9.geom_histogram(p9.aes(x='observation', y='stat(density)')))"
	]
	},
	{
	"cell_type": "code",
	"execution_count": null,
	"metadata": {},
	"outputs": [],
	"source": [
	"# Generate means and quantiles for each replication\n",
	"meanwaits = [sp.mean(observation) for observation in observationlist]\n",
	"q05waits = [sp.quantile(observation, q=0.05) for observation in observationlist]\n",
	"q95waits = [sp.quantile(observation, q=0.95) for observation in observationlist]\n",
	"meanstats = pd.DataFrame({'mean':[sp.mean(meanwaits)],\n",
	" 'lowerci': [sp.stats.t.ppf(0.05,df=reps) * sp.std(meanwaits)/len(meanwaits) + \n",
	" sp.mean(meanwaits)],\n",
	" 'upperci': [sp.stats.t.ppf(0.95,df=reps) * sp.std(meanwaits)/len(meanwaits) + \n",
	" sp.mean(meanwaits)]})\n",
	"q05stats = pd.DataFrame({'mean':[sp.mean(q05waits)],\n",
	" 'lowerci': [sp.stats.t.ppf(0.05,df=reps) * sp.std(q05waits)/len(q05waits) + \n",
	" sp.mean(q05waits)],\n",
	" 'upperci': [sp.stats.t.ppf(0.95,df=reps) * sp.std(q05waits)/len(q05waits) + \n",
	" sp.mean(q05waits)]})\n",
	"q95stats = pd.DataFrame({'mean':[sp.mean(q95waits)],\n",
	" 'lowerci': [sp.stats.t.ppf(0.05,df=reps) * sp.std(q95waits)/len(q95waits) + \n",
	" sp.mean(q95waits)],\n",
	" 'upperci': [sp.stats.t.ppf(0.95,df=reps) * sp.std(q95waits)/len(q95waits) + \n",
	" sp.mean(q95waits)]})"
	]
	},
	{
	"cell_type": "code",
	"execution_count": null,
	"metadata": {},
	"outputs": [],
	"source": [
	"print(meanstats)\n",
	"print(q05stats)\n",
	"print(q95stats)"
	]
	},
	{
	"cell_type": "markdown",
	"metadata": {},
	"source": [
	"# Measures of Risk and Error\n",
	"\n",
	"## Risk vs. Error\n",
	" - Measures of risk\n",
	" - Directly support decision making.\n",
	" - What is the range of outcomes that can occur and their probabilities if the system is designed in a specific way.\n",
	" - If $F_Y(5)$ (probability of completing a project in 5 days) is too small, we may decide not to undertake the project as designed.\n",
	" - Measures of error\n",
	" - Directly support experiment design\n",
	" - Did we expend enough simulation effort (e.g. replications) to be confident in our estimates of system performance\n",
	"\n",
	"## Measure of Risk and Error (MORE) Plot\n",
	"\n",
	"- Measures of error. The C.I. on the average or the quantiles are measure of error.\n",
	" - As a simulation analyst we can reduce error (how?)\n",
	" - How to choose which performance measure's C.I. we should use in specifying error.\n",
	" - What does the C.I. on these statistics (average, quantile) tell us about our system and experimental design?\n",
	"- Measures of risk\n",
	" - How can we as engineers reduce risk?\n",
	" - How do we measure risk? What do these types of statistics actually tell us about our system and experimental design?\n",
	" \n",
	"## Creating a MORE plot\n",
	"- Start with a histogram.\n",
	"- Add the sample average and 0.05 and 0.95 quantiles.\n",
	" - This range is considered likely, beyond that is considered unlikely.\n",
	"- Measure of risk\n",
	" - Include 95% C.I. for the average and both quantiles.\n",
	" - Measure of error at the center and the extremes of the range of outcomes.\n",
	"- Goal: to represent both risk and error in an intuitively understandable way."
	]
	},
	{
	"cell_type": "code",
	"execution_count": null,
	"metadata": {},
	"outputs": [],
	"source": [
	"(flowplot + \n",
	" p9.geom_errorbarh(data = meanstats, mapping=p9.aes(xmax = 'upperci', xmin = 'lowerci', \n",
	" y = -0.02), height = 0.01) +\n",
	" p9.geom_errorbarh(data = q05stats, mapping=p9.aes(xmax = 'upperci', xmin = 'lowerci', \n",
	" y = -0.02), height = 0.01) +\n",
	" p9.geom_errorbarh(data = q95stats, mapping=p9.aes(xmax = 'upperci', xmin = 'lowerci', \n",
	" y = -0.02), height = 0.01) +\n",
	" p9.annotate(\"text\", label = \"C.I.\\n mean\", x = (meanstats['mean']), y = -0.05, size = 9) +\n",
	" p9.annotate(\"text\", label = \"C.I.\\n Q0.05\", x = q05stats['mean'], y = -0.05, size = 9) +\n",
	" p9.annotate(\"text\", label = \"C.I.\\n Q0.95\", x = q95stats['mean'], y = -0.05, size = 9) +\n",
	" p9.ggtitle(\"MORE plot for \" + observationname) +\n",
	" p9.xlab(observationname)\n",
	"\n",
	")"
	]
	},
	{
	"cell_type": "markdown",
	"metadata": {},
	"source": [
	"In this MORE plot, the error bars below the x-axis represent the 95% C.I. of the mean and 0.05 and 0.95 quantiles. The mean and quantiles provides a measure of the risk, what is the range of values that will generally be experienced by entities in this system (visitors to a hospital going to the information desk). The error bars provide measures of the error, which are the confidence intervals of the estimate of the mean, the 0.05 quantile, and the 0.95 quantile. Note that the measure of error is not the same for each of the three. Depending on what performance measure is most useful for the system, the analyst would then decide how many replications would be needed to reduce that particular performance measure's C.I. to an acceptable level."
	]
	},
	{
	"cell_type": "code",
	"execution_count": null,
	"metadata": {},
	"outputs": [],
	"source": []
	}
	],
	"metadata": {
	"kernelspec": {
	"display_name": "Python 3",
	"language": "python",
	"name": "python3"
	},
	"language_info": {
	"codemirror_mode": {
	"name": "ipython",
	"version": 3
	},
	"file_extension": ".py",
	"mimetype": "text/x-python",
	"name": "python",
	"nbconvert_exporter": "python",
	"pygments_lexer": "ipython3",
	"version": "3.6.8"
	}
	},
	"nbformat": 4,
	"nbformat_minor": 2
	}