@ashwath007
Created August 5, 2021 19:42
Overfittting_and_underfitting.ipynb
{
"nbformat": 4,
"nbformat_minor": 0,
"metadata": {
"colab": {
"name": "Overfittting_and_underfitting.ipynb",
"provenance": [],
"collapsed_sections": [],
"authorship_tag": "ABX9TyNH/6aBM+n7SsyprDqH7GwT",
"include_colab_link": true
},
"kernelspec": {
"name": "python3",
"display_name": "Python 3"
},
"language_info": {
"name": "python"
}
},
"cells": [
{
"cell_type": "markdown",
"metadata": {
"id": "view-in-github",
"colab_type": "text"
},
"source": [
"<a href=\"https://colab.research.google.com/gist/ashwath007/cea49594c21f0f8b4b19c2eb60f8ad9d/overfittting_and_underfitting.ipynb\" target=\"_parent\"><img src=\"https://colab.research.google.com/assets/colab-badge.svg\" alt=\"Open In Colab\"/></a>"
]
},
{
"cell_type": "code",
"metadata": {
"id": "4Iooh-Z4omjN"
},
"source": [
"import numpy as np\n",
"import matplotlib.pyplot as plt\n",
"import pandas as pd"
],
"execution_count": 1,
"outputs": []
},
{
"cell_type": "markdown",
"metadata": {
"id": "NJshztlbouHk"
},
"source": [
"**Importing our Dataset**"
]
},
{
"cell_type": "code",
"metadata": {
"id": "uObnAHGSoytM"
},
"source": [
"dataset = pd.read_csv('Data.csv')\n",
"X = dataset.iloc[:, 1:-1].values\n",
"y = dataset.iloc[:, -1].values"
],
"execution_count": 17,
"outputs": []
},
{
"cell_type": "markdown",
"metadata": {
"id": "QtGhOtJ0pB2k"
},
"source": [
"**Viewing and plotting the dataset**"
]
},
{
"cell_type": "code",
"metadata": {
"colab": {
"base_uri": "https://localhost:8080/",
"height": 203
},
"id": "IdOKRijMpTJs",
"outputId": "c0273ae2-3c62-4334-d56b-37c72b853849"
},
"source": [
"dataset.head()"
],
"execution_count": 18,
"outputs": [
{
"output_type": "execute_result",
"data": {
"text/html": [
"<div>\n",
"<style scoped>\n",
" .dataframe tbody tr th:only-of-type {\n",
" vertical-align: middle;\n",
" }\n",
"\n",
" .dataframe tbody tr th {\n",
" vertical-align: top;\n",
" }\n",
"\n",
" .dataframe thead th {\n",
" text-align: right;\n",
" }\n",
"</style>\n",
"<table border=\"1\" class=\"dataframe\">\n",
" <thead>\n",
" <tr style=\"text-align: right;\">\n",
" <th></th>\n",
" <th>Position</th>\n",
" <th>Level</th>\n",
" <th>Salary</th>\n",
" </tr>\n",
" </thead>\n",
" <tbody>\n",
" <tr>\n",
" <th>0</th>\n",
" <td>Data Analyst</td>\n",
" <td>1</td>\n",
" <td>45000</td>\n",
" </tr>\n",
" <tr>\n",
" <th>1</th>\n",
" <td>SDE</td>\n",
" <td>2</td>\n",
" <td>50000</td>\n",
" </tr>\n",
" <tr>\n",
" <th>2</th>\n",
" <td>SDE 1</td>\n",
" <td>3</td>\n",
" <td>60000</td>\n",
" </tr>\n",
" <tr>\n",
" <th>3</th>\n",
" <td>Team Lead</td>\n",
" <td>4</td>\n",
" <td>80000</td>\n",
" </tr>\n",
" <tr>\n",
" <th>4</th>\n",
" <td>HR</td>\n",
" <td>5</td>\n",
" <td>110000</td>\n",
" </tr>\n",
" </tbody>\n",
"</table>\n",
"</div>"
],
"text/plain": [
" Position Level Salary\n",
"0 Data Analyst 1 45000\n",
"1 SDE 2 50000\n",
"2 SDE 1 3 60000\n",
"3 Team Lead 4 80000\n",
"4 HR 5 110000"
]
},
"metadata": {
"tags": []
},
"execution_count": 18
}
]
},
{
"cell_type": "code",
"metadata": {
"colab": {
"base_uri": "https://localhost:8080/",
"height": 295
},
"id": "HeuNo19gpE2j",
"outputId": "507d1f30-fe6a-4722-8a3a-5dfa52877939"
},
"source": [
"plt.scatter(X, y, color = 'red')\n",
"plt.title('Thedot - in')\n",
"plt.xlabel('Position Level')\n",
"plt.ylabel('Salary')\n",
"plt.show() "
],
"execution_count": 57,
"outputs": [
{
"output_type": "display_data",
"data": {
"image/png": "\n",
"text/plain": [
"<Figure size 432x288 with 1 Axes>"
]
},
"metadata": {
"tags": [],
"needs_background": "light"
}
}
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "93HzgC7iriWO"
},
"source": [
"**Training the model**"
]
},
{
"cell_type": "code",
"metadata": {
"colab": {
"base_uri": "https://localhost:8080/"
},
"id": "3l2m5-NjqxsO",
"outputId": "d2e45a0f-3d83-4e66-a18b-ad5e3b5990c8"
},
"source": [
"from sklearn.linear_model import LinearRegression\n",
"lin_reg = LinearRegression()\n",
"lin_reg.fit(X, y)\n",
"from sklearn.preprocessing import PolynomialFeatures\n",
"poly_reg = PolynomialFeatures(degree = 20)\n",
"X_poly = poly_reg.fit_transform(X)\n",
"lin_reg_2 = LinearRegression()\n",
"lin_reg_2.fit(X_poly, y)\n"
],
"execution_count": 58,
"outputs": [
{
"output_type": "execute_result",
"data": {
"text/plain": [
"LinearRegression(copy_X=True, fit_intercept=True, n_jobs=None, normalize=False)"
]
},
"metadata": {
"tags": []
},
"execution_count": 58
}
]
},
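{
"cell_type": "markdown",
"metadata": {},
"source": [
"A minimal sketch, not part of the original notebook: the expansion and the regression from the cell above could be bundled with scikit-learn's `make_pipeline`, so the polynomial features fitted on the training data are reused automatically at prediction time. The name `poly_model` is introduced here only for illustration; `X` and `y` are the arrays loaded earlier."
]
},
{
"cell_type": "code",
"metadata": {},
"source": [
"from sklearn.linear_model import LinearRegression\n",
"from sklearn.pipeline import make_pipeline\n",
"from sklearn.preprocessing import PolynomialFeatures\n",
"\n",
"# Hypothetical name; same degree as the training cell above\n",
"poly_model = make_pipeline(PolynomialFeatures(degree=20), LinearRegression())\n",
"poly_model.fit(X, y)\n",
"\n",
"# The pipeline expands the raw input itself before predicting\n",
"poly_model.predict([[8.5]])"
],
"execution_count": null,
"outputs": []
},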
{
"cell_type": "markdown",
"metadata": {
"id": "D-9X5aoIrqCN"
},
"source": [
"**Polynomial regression with too high a degree (degree 20, overfitting)**"
]
},
{
"cell_type": "code",
"metadata": {
"colab": {
"base_uri": "https://localhost:8080/",
"height": 295
},
"id": "gFpKVAXjq8rN",
"outputId": "5e9284f9-9c25-4a59-e1e6-c2fdcd0c8d35"
},
"source": [
"plt.scatter(X, y, color = 'red')\n",
"plt.plot(X, lin_reg_2.predict(poly_reg.transform(X)), color = 'blue')\n",
"plt.title('Thedot - in')\n",
"plt.xlabel('Position level')\n",
"plt.ylabel('Salary')\n",
"plt.show()"
],
"execution_count": 59,
"outputs": [
{
"output_type": "display_data",
"data": {
"image/png": "\n",
"text/plain": [
"<Figure size 432x288 with 1 Axes>"
]
},
"metadata": {
"tags": [],
"needs_background": "light"
}
}
]
},
{
"cell_type": "code",
"metadata": {
"colab": {
"base_uri": "https://localhost:8080/"
},
"id": "vZoZlEGSsmCM",
"outputId": "1ed0e31b-c289-415e-a3f0-b22fa2cd6925"
},
"source": [
"# Reuse the already fitted expansion instead of refitting it on the query point\n",
"lin_reg_2.predict(poly_reg.transform([[8.5]]))"
],
"execution_count": 60,
"outputs": [
{
"output_type": "execute_result",
"data": {
"text/plain": [
"array([349013.934707])"
]
},
"metadata": {
"tags": []
},
"execution_count": 60
}
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "kGb3NwLjsE4d"
},
"source": [
"**Polynomial regression with too low a degree (degree 2, underfitting)**"
]
},
{
"cell_type": "code",
"metadata": {
"colab": {
"base_uri": "https://localhost:8080/"
},
"id": "Dx5eZ9QesI7G",
"outputId": "70f296d9-5258-4846-a8c1-45196f2d99e4"
},
"source": [
"from sklearn.linear_model import LinearRegression\n",
"lin_reg = LinearRegression()\n",
"lin_reg.fit(X, y)\n",
"from sklearn.preprocessing import PolynomialFeatures\n",
"poly_reg = PolynomialFeatures(degree = 2)\n",
"X_poly = poly_reg.fit_transform(X)\n",
"lin_reg_2 = LinearRegression()\n",
"lin_reg_2.fit(X_poly, y)"
],
"execution_count": 61,
"outputs": [
{
"output_type": "execute_result",
"data": {
"text/plain": [
"LinearRegression(copy_X=True, fit_intercept=True, n_jobs=None, normalize=False)"
]
},
"metadata": {
"tags": []
},
"execution_count": 61
}
]
},
{
"cell_type": "code",
"metadata": {
"colab": {
"base_uri": "https://localhost:8080/",
"height": 295
},
"id": "erOIBuw9sNX0",
"outputId": "e23314b1-3fd5-4b13-f486-3f3a2c883bf8"
},
"source": [
"plt.scatter(X, y, color = 'red')\n",
"plt.plot(X, lin_reg_2.predict(poly_reg.transform(X)), color = 'blue')\n",
"plt.title('Thedot - in')\n",
"plt.xlabel('Position level')\n",
"plt.ylabel('Salary')\n",
"plt.show()"
],
"execution_count": 62,
"outputs": [
{
"output_type": "display_data",
"data": {
"image/png": "\n",
"text/plain": [
"<Figure size 432x288 with 1 Axes>"
]
},
"metadata": {
"tags": [],
"needs_background": "light"
}
}
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "PqOHEqRnsesG"
},
"source": [
"**Let's test the model**"
]
},
{
"cell_type": "code",
"metadata": {
"colab": {
"base_uri": "https://localhost:8080/"
},
"id": "jwpELH9_sh1m",
"outputId": "95b4e29e-8ca4-4b08-946c-fa4d18105b38"
},
"source": [
"# Reuse the already fitted expansion instead of refitting it on the query point\n",
"lin_reg_2.predict(poly_reg.transform([[8.5]]))"
],
"execution_count": 63,
"outputs": [
{
"output_type": "execute_result",
"data": {
"text/plain": [
"array([327206.85564436])"
]
},
"metadata": {
"tags": []
},
"execution_count": 63
}
]
},
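{
"cell_type": "markdown",
"metadata": {},
"source": [
"A hedged sketch, not part of the original notebook: one common way to compare polynomial degrees is leave-one-out cross-validation, which scores each model on rows it was not fitted on. The candidate degrees and the names `degree`, `model` and `scores` are assumptions chosen for illustration; `X` and `y` are the arrays loaded earlier. A lower held-out error suggests a degree that generalizes better."
]
},
{
"cell_type": "code",
"metadata": {},
"source": [
"from sklearn.linear_model import LinearRegression\n",
"from sklearn.model_selection import LeaveOneOut, cross_val_score\n",
"from sklearn.pipeline import make_pipeline\n",
"from sklearn.preprocessing import PolynomialFeatures\n",
"\n",
"# Candidate degrees are an assumption, taken from the values tried in this notebook\n",
"for degree in (2, 5, 20):\n",
"    model = make_pipeline(PolynomialFeatures(degree=degree), LinearRegression())\n",
"    # Fit on all rows but one, then score on the single held-out row\n",
"    scores = cross_val_score(model, X, y, cv=LeaveOneOut(),\n",
"                             scoring='neg_mean_squared_error')\n",
"    # scikit-learn returns negated errors, so flip the sign for reporting\n",
"    print(degree, -scores.mean())"
],
"execution_count": null,
"outputs": []
},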
{
"cell_type": "markdown",
"metadata": {
"id": "l9IGynkE1RKO"
},
"source": [
"**Polynomial regression with a suitable degree (degree 5)**"
]
},
{
"cell_type": "code",
"metadata": {
"colab": {
"base_uri": "https://localhost:8080/"
},
"id": "NoeqPEb01UNm",
"outputId": "845d6b01-c4b8-4940-f433-d2c732c7c964"
},
"source": [
"from sklearn.linear_model import LinearRegression\n",
"lin_reg = LinearRegression()\n",
"lin_reg.fit(X, y)\n",
"from sklearn.preprocessing import PolynomialFeatures\n",
"poly_reg = PolynomialFeatures(degree = 5)\n",
"X_poly = poly_reg.fit_transform(X)\n",
"lin_reg_2 = LinearRegression()\n",
"lin_reg_2.fit(X_poly, y)"
],
"execution_count": 64,
"outputs": [
{
"output_type": "execute_result",
"data": {
"text/plain": [
"LinearRegression(copy_X=True, fit_intercept=True, n_jobs=None, normalize=False)"
]
},
"metadata": {
"tags": []
},
"execution_count": 64
}
]
},
{
"cell_type": "code",
"metadata": {
"colab": {
"base_uri": "https://localhost:8080/",
"height": 295
},
"id": "zI91f8ju1XAf",
"outputId": "82231b02-1c68-4795-fc39-891082ccab7c"
},
"source": [
"plt.scatter(X, y, color = 'red')\n",
"plt.plot(X, lin_reg_2.predict(poly_reg.transform(X)), color = 'blue')\n",
"plt.title('Thedot - in')\n",
"plt.xlabel('Position level')\n",
"plt.ylabel('Salary')\n",
"plt.show()"
],
"execution_count": 65,
"outputs": [
{
"output_type": "display_data",
"data": {
"image/png": "\n",
"text/plain": [
"<Figure size 432x288 with 1 Axes>"
]
},
"metadata": {
"tags": [],
"needs_background": "light"
}
}
]
}
]
}