embiem · October 22, 2019 14:58
diff --git a/ml-intro-ml-in-gamedev-workshop.ipynb b/ml-intro-ml-in-gamedev-workshop.ipynb
 {
  "nbformat": 4,
  "nbformat_minor": 0,
  "metadata": {
    "colab": {
      "name": "ML Intro & ML in GameDev Workshop.ipynb",
      "provenance": [],
      "collapsed_sections": [],
      "toc_visible": true,
      "include_colab_link": true
    },
    "kernelspec": {
      "name": "python3",
      "display_name": "Python 3"
    }
  },
  "cells": [
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "view-in-github",
        "colab_type": "text"
      },
      "source": [
        "<a href=\"https://colab.research.google.com/gist/embiem/0434fe421b06ee13f92db9ff7991ca99/ml-intro-ml-in-gamedev-workshop.ipynb\" target=\"_parent\"><img src=\"https://colab.research.google.com/assets/colab-badge.svg\" alt=\"Open In Colab\"/></a>"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "YdrOqn0OBSpx",
        "colab_type": "text"
      },
      "source": [
        "# ML Intro & ML in GameDev\n",
        "\n"
      ]
    },
    {
      "cell_type": "code",
      "metadata": {
        "id": "RZARGEjBBAVk",
        "colab_type": "code",
        "colab": {}
      },
      "source": [
        "# Import necessary libraries\n",
        "import numpy as np\n",
        "import pandas as pd\n",
        "import io\n",
        "\n",
        "%matplotlib inline\n",
        "\n",
        "# Load the dataset\n",
        "from google.colab import files\n",
        "uploaded = files.upload()\n",
        "\n",
        "file_name = next(iter(uploaded.keys()))\n",
        "\n",
        "data = pd.read_csv(io.BytesIO(uploaded[file_name]))\n",
        "playerYs = data['playerY']\n",
        "features = data.drop('playerY', axis = 1)\n",
        "\n",
        "# Success\n",
        "print(\"The dataset has {} data points with {} variables.\".format(*data.shape))\n",
        "data.head()"
      ],
      "execution_count": 0,
      "outputs": []
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "f3SbmFE9E6_z",
        "colab_type": "text"
      },
      "source": [
        "## Data Exploration"
      ]
    },
    {
      "cell_type": "code",
      "metadata": {
        "id": "QwfgOccqFL60",
        "colab_type": "code",
        "colab": {}
      },
      "source": [
        "# TODO: Minimum playerYs of the data\n",
        "minimum_playerYs = \n",
        "\n",
        "# TODO: Maximum playerYs of the data\n",
        "maximum_playerYs = \n",
        "\n",
        "# TODO: Mean playerYs of the data\n",
        "mean_playerYs = \n",
        "\n",
        "# TODO: Median playerYs of the data\n",
        "median_playerYs = \n",
        "\n",
        "# TODO: Standard deviation of playerYs of the data\n",
        "std_playerYs = \n",
        "\n",
        "print(\"Min: {:,.4f}\".format(minimum_playerYs))\n",
        "print(\"Max: {:,.4f}\".format(maximum_playerYs))\n",
        "print(\"Mean: {:,.4f}\".format(mean_playerYs))\n",
        "print(\"Median: {:,.4f}\".format(median_playerYs))\n",
        "print(\"Standard deviation: {:,.4f}\".format(std_playerYs))"
      ],
      "execution_count": 0,
      "outputs": []
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "XT2J-8OQ7VG5",
        "colab_type": "text"
      },
      "source": [
        "### Measures of Center\n",
        "\n",
        "**Mean**: the sum of the values divided by the number of values.\n",
        "\n",
        "**Median**: sort the data and pick the value in the middle or, for an even count, the average of the two middle values. The median is a robust measure of center: outliers affect it far less than they affect the mean."
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "HKOxxPMz94E5",
        "colab_type": "text"
      },
      "source": [
        "### Measures of Spread\n",
        "\n",
        "**Standard Deviation**: measure of the amount of variation. A low standard deviation indicates that the values tend to be close to the mean (also called the expected value) of the set, while a high standard deviation indicates that the values are spread out over a wider range.\n",
        "\n",
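        "It is calculated as the square root of the average squared deviation from the mean.\n",
        "\n",
        "*Standard deviation is a useful way to flag potential outliers*. For roughly normally distributed data, about 68% of the values lie within one standard deviation of the mean and about 95% within two, so data points more than two or three standard deviations from the mean can be considered unusual.\n",
        "\n",
        "![Standard deviation diagram](https://upload.wikimedia.org/wikipedia/commons/8/8c/Standard_deviation_diagram.svg)"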
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "FXEbkvgY948F",
        "colab_type": "text"
      },
      "source": [
        "### Pairplot Graph\n",
        "\n",
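        "A pairplot draws a scatter plot for every pair of numeric columns in the dataset, with the distribution of each single column on the diagonal."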
      ]
    },
    {
      "cell_type": "code",
      "metadata": {
        "id": "qvB-iNebLNUO",
        "colab_type": "code",
        "colab": {}
      },
      "source": [
        "import seaborn as sns; sns.set()\n",
        "sns.pairplot(data);"
      ],
      "execution_count": 0,
      "outputs": []
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "lYFy1yLzIEtD",
        "colab_type": "text"
      },
      "source": [
        "## Shuffle and Split Data"
      ]
    },
    {
      "cell_type": "code",
      "metadata": {
        "id": "Kl3BK35dINHm",
        "colab_type": "code",
        "colab": {}
      },
      "source": [
        "from sklearn.model_selection import train_test_split\n",
        "\n",
        "# TODO Shuffle and split the data into training and testing subsets\n",
        "X_train, X_test, y_train, y_test = "
      ],
      "execution_count": 0,
      "outputs": []
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "HvIgH_3_QeGk",
        "colab_type": "text"
      },
      "source": [
        "### Download the shuffled & split data\n"
      ]
    },
    {
      "cell_type": "code",
      "metadata": {
        "id": "8wzyp_MmQkVe",
        "colab_type": "code",
        "colab": {}
      },
      "source": [
        "# header=True writes the Series name (playerY) as the column header\n",
        "X_train.to_csv(\"train-features.csv\", index=False)\n",
        "y_train.to_csv(\"train-target.csv\", index=False, header=True)\n",
        "X_test.to_csv(\"test-features.csv\", index=False)\n",
        "y_test.to_csv(\"test-target.csv\", index=False, header=True)\n",
        "\n",
        "files.download(\"train-features.csv\")\n",
        "files.download(\"train-target.csv\")\n",
        "files.download(\"test-features.csv\")\n",
        "files.download(\"test-target.csv\")"
      ],
      "execution_count": 0,
      "outputs": []
    }
  ]
 }
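
The Data Exploration cell leaves its five statistics as TODOs. One possible way to fill them in, as a minimal sketch: the tiny `data` frame below is a made-up stand-in for the uploaded CSV (only the `playerY` column name comes from the notebook; `obstacleX` and all values are hypothetical), and the population standard deviation (`ddof=0`) is an assumption chosen to match the "average squared deviation" wording in the notebook.

```python
import pandas as pd

# Hypothetical stand-in for the uploaded CSV. Only the 'playerY' column
# name is taken from the notebook; the values and 'obstacleX' are made up.
data = pd.DataFrame({
    "playerY": [1.0, 2.0, 2.0, 3.0, 12.0],   # 12.0 plays the role of an outlier
    "obstacleX": [5.0, 4.0, 3.0, 2.0, 1.0],
})
playerYs = data["playerY"]

minimum_playerYs = playerYs.min()
maximum_playerYs = playerYs.max()
mean_playerYs = playerYs.mean()
median_playerYs = playerYs.median()
# pandas defaults to the sample standard deviation (ddof=1);
# ddof=0 divides by n, i.e. the plain average of squared deviations.
std_playerYs = playerYs.std(ddof=0)

print("Min: {:,.4f}".format(minimum_playerYs))
print("Max: {:,.4f}".format(maximum_playerYs))
print("Mean: {:,.4f}".format(mean_playerYs))
print("Median: {:,.4f}".format(median_playerYs))
print("Standard deviation: {:,.4f}".format(std_playerYs))
```

With this toy data the single large value pulls the mean well above the median, which is the robustness property described under Measures of Center.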
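
The Shuffle and Split Data cell leaves the `train_test_split` call as a TODO. A possible completion might look like the sketch below. The stand-in `data` frame is hypothetical (only the `playerY` column name comes from the notebook), and `test_size=0.2` and `random_state=42` are assumed values, not ones the notebook prescribes.

```python
import pandas as pd
from sklearn.model_selection import train_test_split

# Hypothetical stand-in for the uploaded CSV. Only 'playerY' is a column
# name from the notebook; 'obstacleX' and all values are made up.
data = pd.DataFrame({
    "playerY": [float(i) for i in range(10)],
    "obstacleX": [float(10 - i) for i in range(10)],
})
playerYs = data["playerY"]
features = data.drop("playerY", axis=1)

# Assumed settings: hold out 20% for testing and fix the seed so the
# shuffle is reproducible between runs.
X_train, X_test, y_train, y_test = train_test_split(
    features, playerYs, test_size=0.2, shuffle=True, random_state=42
)

print("Training rows:", len(X_train))
print("Testing rows:", len(X_test))
```

Features and target are passed together so that each row of `X_train` keeps its matching entry in `y_train` after shuffling.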