{ | |
"cells": [ | |
{ | |
"cell_type": "code", | |
"execution_count": 1, | |
"metadata": { | |
"ExecuteTime": { | |
"end_time": "2018-08-25T03:56:23.993097Z", | |
"start_time": "2018-08-25T03:56:23.138209Z" | |
} | |
}, | |
"outputs": [], | |
"source": [ | |
"import numpy as np\n", | |
"import matplotlib.pyplot as plt\n", | |
"%matplotlib inline\n" | |
] | |
}, | |
{ | |
"cell_type": "code", | |
"execution_count": 2, | |
"metadata": { | |
"ExecuteTime": { | |
"end_time": "2018-08-25T03:56:24.006246Z", | |
"start_time": "2018-08-25T03:56:23.998204Z" | |
} | |
}, | |
"outputs": [], | |
"source": [ | |
"plt.style.use(['ggplot'])" | |
] | |
}, | |
{ | |
"cell_type": "markdown", | |
"metadata": {}, | |
"source": [ | |
"# Create Data" | |
] | |
}, | |
{ | |
"cell_type": "markdown", | |
"metadata": {}, | |
"source": [ | |
"<h5> Generate some data with:\n", | |
"\\begin{equation} \\theta_0= 4 \\end{equation} \n", | |
"\\begin{equation} \\theta_1= 3 \\end{equation} \n", | |
"\n", | |
"Add some Gaussian noise to the data" | |
] | |
}, | |
{ | |
"cell_type": "code", | |
"execution_count": 3, | |
"metadata": { | |
"ExecuteTime": { | |
"end_time": "2018-08-25T03:56:24.027559Z", | |
"start_time": "2018-08-25T03:56:24.016052Z" | |
} | |
}, | |
"outputs": [], | |
"source": [ | |
"X = 2 * np.random.rand(100,1)\n", | |
"y = 4 +3 * X+np.random.randn(100,1)" | |
] | |
}, | |
{ | |
"cell_type": "markdown", | |
"metadata": {}, | |
"source": [ | |
"Let's plot our data to check the relation between X and Y" | |
] | |
}, | |
{ | |
"cell_type": "code", | |
"execution_count": 4, | |
"metadata": { | |
"ExecuteTime": { | |
"end_time": "2018-08-25T03:56:24.780299Z", | |
"start_time": "2018-08-25T03:56:24.032204Z" | |
} | |
}, | |
"outputs": [ | |
{ | |
"data": { | |
"image/png": "\n", | |
"text/plain": [ | |
"<Figure size 432x288 with 1 Axes>" | |
] | |
}, | |
"metadata": {}, | |
"output_type": "display_data" | |
} | |
], | |
"source": [ | |
"\n", | |
"plt.plot(X,y,'b.')\n", | |
"plt.xlabel(\"$x$\", fontsize=18)\n", | |
"plt.ylabel(\"$y$\", rotation=0, fontsize=18)\n", | |
"_ =plt.axis([0,2,0,15])" | |
] | |
}, | |
{ | |
"cell_type": "markdown", | |
"metadata": {}, | |
"source": [ | |
"# Analytical way of Linear Regression" | |
] | |
}, | |
{ | |
"cell_type": "code", | |
"execution_count": 5, | |
"metadata": { | |
"ExecuteTime": { | |
"end_time": "2018-08-25T03:56:24.792752Z", | |
"start_time": "2018-08-25T03:56:24.783380Z" | |
} | |
}, | |
"outputs": [ | |
{ | |
"name": "stdout", | |
"output_type": "stream", | |
"text": [ | |
"[[4.14660597]\n", | |
" [2.87479146]]\n" | |
] | |
} | |
], | |
"source": [ | |
"X_b = np.c_[np.ones((100,1)),X]\n", | |
"theta_best = np.linalg.inv(X_b.T.dot(X_b)).dot(X_b.T).dot(y)\n", | |
"print(theta_best)" | |
] | |
}, | |
{ | |
"cell_type": "markdown", | |
"metadata": {}, | |
"source": [ | |
"<h5>This is close to the true thetas, 4 and 3. It cannot match them exactly because of the noise introduced into the data" | |
] | |
}, | |
{ | |
"cell_type": "code", | |
"execution_count": 6, | |
"metadata": { | |
"ExecuteTime": { | |
"end_time": "2018-08-25T03:56:24.805370Z", | |
"start_time": "2018-08-25T03:56:24.795517Z" | |
} | |
}, | |
"outputs": [ | |
{ | |
"data": { | |
"text/plain": [ | |
"array([[4.14660597],\n", | |
" [9.89618888]])" | |
] | |
}, | |
"execution_count": 6, | |
"metadata": {}, | |
"output_type": "execute_result" | |
} | |
], | |
"source": [ | |
"X_new = np.array([[0],[2]])\n", | |
"X_new_b = np.c_[np.ones((2,1)),X_new]\n", | |
"y_predict = X_new_b.dot(theta_best)\n", | |
"y_predict" | |
] | |
}, | |
{ | |
"cell_type": "markdown", | |
"metadata": {}, | |
"source": [ | |
"<h5>Let's plot the prediction line with the calculated theta" | |
] | |
}, | |
{ | |
"cell_type": "code", | |
"execution_count": 7, | |
"metadata": { | |
"ExecuteTime": { | |
"end_time": "2018-08-25T03:56:25.168502Z", | |
"start_time": "2018-08-25T03:56:24.808124Z" | |
} | |
}, | |
"outputs": [ | |
{ | |
"data": { | |
"text/plain": [ | |
"[0, 2, 0, 15]" | |
] | |
}, | |
"execution_count": 7, | |
"metadata": {}, | |
"output_type": "execute_result" | |
}, | |
{ | |
"data": { | |
"image/png": "\n", | |
"text/plain": [ | |
"<Figure size 432x288 with 1 Axes>" | |
] | |
}, | |
"metadata": {}, | |
"output_type": "display_data" | |
} | |
], | |
"source": [ | |
"plt.plot(X_new,y_predict,'r-')\n", | |
"plt.plot(X,y,'b.')\n", | |
"plt.xlabel(\"$x_1$\", fontsize=18)\n", | |
"plt.ylabel(\"$y$\", rotation=0, fontsize=18)\n", | |
"plt.axis([0,2,0,15])\n" | |
] | |
}, | |
{ | |
"cell_type": "markdown", | |
"metadata": {}, | |
"source": [ | |
"# Gradient Descent" | |
] | |
}, | |
{ | |
"cell_type": "markdown", | |
"metadata": {}, | |
"source": [ | |
"## Cost Function & Gradients\n", | |
"\n", | |
"<h4> The equations for the cost function and its gradients are shown below. Note that this cost function is specific to linear regression. Other algorithms use different cost functions, and their gradients have to be derived from those cost functions" | |
] | |
}, | |
{ | |
"cell_type": "markdown", | |
"metadata": {}, | |
"source": [ | |
"\n", | |
"\n", | |
"<b>Cost</b>\n", | |
"\\begin{equation}\n", | |
"J(\\theta) = \\frac{1}{2m} \\sum_{i=1}^{m} (h(\\theta)^{(i)} - y^{(i)})^2 \n", | |
"\\end{equation}\n", | |
"\n", | |
"<b>Gradient</b>\n", | |
"\n", | |
"\\begin{equation}\n", | |
"\\frac{\\partial J(\\theta)}{\\partial \\theta_j} = \\frac{1}{m}\\sum_{i=1}^{m}(h(\\theta)^{(i)} - y^{(i)}) \\cdot X_j^{(i)}\n", | |
"\\end{equation}\n", | |
"\n", | |
"<b>Update rules</b>\n", | |
"\\begin{equation}\n", | |
"\\theta_0 := \\theta_0 -\\alpha \\cdot \\left(\\frac{1}{m} \\sum_{i=1}^{m}(h(\\theta)^{(i)} - y^{(i)}) \\cdot X_0^{(i)}\\right)\n", | |
"\\end{equation}\n", | |
"\\begin{equation}\n", | |
"\\theta_1 := \\theta_1 -\\alpha \\cdot \\left(\\frac{1}{m} \\sum_{i=1}^{m}(h(\\theta)^{(i)} - y^{(i)}) \\cdot X_1^{(i)}\\right)\n", | |
"\\end{equation}\n", | |
"\\begin{equation}\n", | |
"\\theta_2 := \\theta_2 -\\alpha \\cdot \\left(\\frac{1}{m} \\sum_{i=1}^{m}(h(\\theta)^{(i)} - y^{(i)}) \\cdot X_2^{(i)}\\right)\n", | |
"\\end{equation}\n", | |
"\n", | |
"\\begin{equation}\n", | |
"\\theta_j := \\theta_j -\\alpha \\cdot \\left(\\frac{1}{m} \\sum_{i=1}^{m}(h(\\theta)^{(i)} - y^{(i)}) \\cdot X_j^{(i)}\\right)\n", | |
"\\end{equation}" | |
] | |
}, | |
{ | |
"cell_type": "code", | |
"execution_count": 8, | |
"metadata": { | |
"ExecuteTime": { | |
"end_time": "2018-08-25T03:56:25.180630Z", | |
"start_time": "2018-08-25T03:56:25.172904Z" | |
} | |
}, | |
"outputs": [], | |
"source": [ | |
"\n", | |
"def cal_cost(theta,X,y):\n", | |
"    '''\n", | |
"    Calculates the cost for the given X and y. The following shows an example of a single-dimensional X.\n", | |
"    theta = Vector of thetas\n", | |
"    X     = Rows of X's, e.g. np.zeros((2,j))\n", | |
"    y     = Actual y's, e.g. np.zeros((2,1))\n", | |
"\n", | |
"    where:\n", | |
"        j is the number of features\n", | |
"    '''\n", | |
"\n", | |
"    m = len(y)\n", | |
"\n", | |
"    predictions = X.dot(theta)\n", | |
"    # 1/(2*m), not (1/2*m): the latter evaluates to m/2\n", | |
"    cost = 1/(2*m) * np.sum(np.square(predictions-y))\n", | |
"    return cost\n" | |
] | |
}, | |
{ | |
"cell_type": "code", | |
"execution_count": 9, | |
"metadata": { | |
"ExecuteTime": { | |
"end_time": "2018-08-25T03:56:25.204130Z", | |
"start_time": "2018-08-25T03:56:25.188272Z" | |
} | |
}, | |
"outputs": [], | |
"source": [ | |
"def gradient_descent(X,y,theta,learning_rate=0.01,iterations=100):\n", | |
"    '''\n", | |
"    X             = Matrix of X with added bias units\n", | |
"    y             = Vector of y\n", | |
"    theta         = Vector of thetas, e.g. np.random.randn(j,1)\n", | |
"    learning_rate = Step size of each update\n", | |
"    iterations    = Number of iterations\n", | |
"\n", | |
"    Returns the final theta vector, the cost history and the theta history over the iterations\n", | |
"    '''\n", | |
"    m = len(y)\n", | |
"    cost_history = np.zeros(iterations)\n", | |
"    # One row per iteration, one column per theta\n", | |
"    theta_history = np.zeros((iterations,theta.shape[0]))\n", | |
"    for it in range(iterations):\n", | |
"\n", | |
"        prediction = np.dot(X,theta)\n", | |
"\n", | |
"        # Step against the gradient (1/m) * X^T (prediction - y)\n", | |
"        theta = theta -(1/m)*learning_rate*( X.T.dot((prediction - y)))\n", | |
"        theta_history[it,:] =theta.T\n", | |
"        cost_history[it] = cal_cost(theta,X,y)\n", | |
"\n", | |
"    return theta, cost_history, theta_history" | |
] | |
}, | |
{ | |
"cell_type": "markdown", | |
"metadata": {}, | |
"source": [ | |
"<h3> Let's start with 1000 iterations and a learning rate of 0.01. Start with theta from a Gaussian distribution" | |
] | |
}, | |
{ | |
"cell_type": "code", | |
"execution_count": 10, | |
"metadata": { | |
"ExecuteTime": { | |
"end_time": "2018-08-25T03:56:27.126960Z", | |
"start_time": "2018-08-25T03:56:27.062811Z" | |
}, | |
"scrolled": false | |
}, | |
"outputs": [ | |
{ | |
"name": "stdout", | |
"output_type": "stream", | |
"text": [ | |
"Theta0: 4.117,\n", | |
"Theta1: 2.899\n", | |
"Final cost/MSE: 5208.660\n" | |
] | |
} | |
], | |
"source": [ | |
"lr =0.01\n", | |
"n_iter = 1000\n", | |
"\n", | |
"theta = np.random.randn(2,1)\n", | |
"\n", | |
"X_b = np.c_[np.ones((len(X),1)),X]\n", | |
"theta,cost_history,theta_history = gradient_descent(X_b,y,theta,lr,n_iter)\n", | |
"\n", | |
"\n", | |
"print('Theta0: {:0.3f},\\nTheta1: {:0.3f}'.format(theta[0][0],theta[1][0]))\n", | |
"print('Final cost/MSE: {:0.3f}'.format(cost_history[-1]))" | |
] | |
}, | |
{ | |
"cell_type": "markdown", | |
"metadata": {}, | |
"source": [ | |
"<h3> Let's plot the cost history over iterations" | |
] | |
}, | |
{ | |
"cell_type": "code", | |
"execution_count": 15, | |
"metadata": { | |
"ExecuteTime": { | |
"end_time": "2018-08-25T04:01:31.912400Z", | |
"start_time": "2018-08-25T04:01:31.604459Z" | |
}, | |
"scrolled": false | |
}, | |
"outputs": [ | |
{ | |
"data": { | |
"image/png": "\n", | |
"text/plain": [ | |
"<Figure size 864x576 with 1 Axes>" | |
] | |
}, | |
"metadata": {}, | |
"output_type": "display_data" | |
} | |
], | |
"source": [ | |
"fig,ax = plt.subplots(figsize=(12,8))\n", | |
"\n", | |
"ax.set_ylabel('J(Theta)')\n", | |
"ax.set_xlabel('Iterations')\n", | |
"_=ax.plot(range(n_iter),cost_history,'b.')" | |
] | |
}, | |
{ | |
"cell_type": "markdown", | |
"metadata": {}, | |
"source": [ | |
"<h3> After around 150 iterations the cost curve is flat, so the remaining iterations bring no further improvement. Let us zoom in on the first 200 iterations and look at the curve" | |
] | |
}, | |
{ | |
"cell_type": "code", | |
"execution_count": 20, | |
"metadata": { | |
"ExecuteTime": { | |
"end_time": "2018-08-19T05:22:30.561842Z", | |
"start_time": "2018-08-19T05:22:30.371532Z" | |
} | |
}, | |
"outputs": [ | |
{ | |
"data": { | |
"image/png": "\n", | |
"text/plain": [ | |
"<Figure size 720x576 with 1 Axes>" | |
] | |
}, | |
"metadata": {}, | |
"output_type": "display_data" | |
} | |
], | |
"source": [ | |
"\n", | |
"fig,ax = plt.subplots(figsize=(10,8))\n", | |
"_=ax.plot(range(200),cost_history[:200],'b.')" | |
] | |
}, | |
{ | |
"cell_type": "markdown", | |
"metadata": {}, | |
"source": [ | |
"<b>Note that the cost drops quickly at first, and then each further iteration reduces it by less and less" | |
] | |
}, | |
{ | |
"cell_type": "markdown", | |
"metadata": {}, | |
"source": [ | |
"### It would be great to see the effect of different learning rates and iterations together" | |
] | |
}, | |
{ | |
"cell_type": "markdown", | |
"metadata": {}, | |
"source": [ | |
"### Let us build a function that shows these effects together and also shows how gradient descent actually works" | |
] | |
}, | |
{ | |
"cell_type": "code", | |
"execution_count": 16, | |
"metadata": { | |
"ExecuteTime": { | |
"end_time": "2018-08-25T04:06:54.948007Z", | |
"start_time": "2018-08-25T04:06:54.916435Z" | |
}, | |
"scrolled": false | |
}, | |
"outputs": [], | |
"source": [ | |
"\n", | |
"def plot_GD(n_iter,lr,ax,ax1=None):\n", | |
"    \"\"\"\n", | |
"    n_iter = Number of iterations\n", | |
"    lr     = Learning rate\n", | |
"    ax     = Axis to plot the gradient descent fits on\n", | |
"    ax1    = Axis to plot cost_history vs iterations on (optional)\n", | |
"    \"\"\"\n", | |
"    _ = ax.plot(X,y,'b.')\n", | |
"    theta = np.random.randn(2,1)\n", | |
"\n", | |
"    tr = 0.1  # line transparency, raised as training progresses\n", | |
"    cost_history = np.zeros(n_iter)\n", | |
"    for i in range(n_iter):\n", | |
"        # Run a single gradient descent step so intermediate fits can be drawn\n", | |
"        theta,h,_ = gradient_descent(X_b,y,theta,lr,1)\n", | |
"        pred = X_b.dot(theta)\n", | |
"\n", | |
"        cost_history[i] = h[0]\n", | |
"\n", | |
"        # Draw the current fit every 25 iterations\n", | |
"        if i % 25 == 0:\n", | |
"            _ = ax.plot(X,pred,'r-',alpha=tr)\n", | |
"            if tr < 0.8:\n", | |
"                tr = tr+0.2\n", | |
"    if ax1 is not None:\n", | |
"        _ = ax1.plot(range(n_iter),cost_history,'b.')" | |
] | |
}, | |
{ | |
"cell_type": "markdown", | |
"metadata": {}, | |
"source": [ | |
"### Plot the graphs for different combinations of iterations and learning rates" | |
] | |
}, | |
{ | |
"cell_type": "code", | |
"execution_count": 17, | |
"metadata": { | |
"ExecuteTime": { | |
"end_time": "2018-08-25T04:07:09.977910Z", | |
"start_time": "2018-08-25T04:07:00.484794Z" | |
} | |
}, | |
"outputs": [ | |
{ | |
"data": { |
@JayRenData
I think the reason you can't get it is the integer division in the code.
If you replace m = len(y) with m = float(len(y)) in line 8
and do the same in line 9, then everything should work fine.
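A small sketch of the integer-division issue described above, assuming the code was run under Python 2, where `/` between two ints truncates. Python 3's `//` operator is used here only to imitate that behaviour, and the variable names are illustrative.

```python
m = 100  # number of samples, as returned by len(y)

# Python 2 semantics for int / int, imitated here with floor division
step_py2 = 1 // m          # 0: every gradient update gets multiplied by zero

# Forcing a float, as suggested above, restores the intended value
step_float = 1 / float(m)  # 0.01

print(step_py2, step_float)
```

Under Python 3 the plain `/` already performs true division, so the `float(...)` cast is harmless but not required there.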
I need your help: f(x)=x*x+2x+5
Generate some data with: theta0=5,theta1=2,theta3=1
I don't know how to do it!
It is quite simple. Refer to this part of the code above:
X = 2 * np.random.rand(100,1)
y = 4 +3 * X+np.random.randn(100,1)
So you can do:
X = np.random.rand(100,1) (any randomly generated numbers will do)
Y = 5 + X*X + 2*X
There is no need for theta3.
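The suggestion above can be sketched as follows. This is only one possible reading: the Gaussian noise term, the fixed seed, the `[1, x, x^2]` design matrix and the use of `np.linalg.lstsq` are assumptions added for illustration and are not part of the notebook.

```python
import numpy as np

rng = np.random.default_rng(0)  # fixed seed, chosen only for repeatability

# Data from f(x) = x*x + 2*x + 5 plus Gaussian noise (noise is an assumption)
X = 2 * rng.random((100, 1))
y = 5 + 2 * X + X * X + rng.standard_normal((100, 1))

# Design matrix with columns [1, x, x^2], so the model stays linear in theta
X_poly = np.c_[np.ones((100, 1)), X, X ** 2]

# Least-squares fit; lstsq avoids forming an explicit matrix inverse
theta, *_ = np.linalg.lstsq(X_poly, y, rcond=None)
print(theta.ravel())  # estimates of the constant, linear and quadratic terms
```

The same `X_poly` matrix could be passed to `gradient_descent` in place of `X_b`, provided the starting theta has three rows.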
The result does not match the initial values in the mini-batch gradient descent.
You used 4 and 3, but the values coming out for you are 3.6 and 3. So I think this is the best result your gradient descent run arrived at.
But I arrived at 4.0 and 2.9 with your code (pasted and tested) and with my own code as well.
Despite this, I learnt many things here.
Thank you very much.
I need your help: f(x,y) = x*x + y*y
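A minimal sketch for this function, assuming the question is about minimising it with gradient descent. The starting point, learning rate and iteration count are arbitrary illustrative choices.

```python
import numpy as np

def grad(p):
    # Gradient of f(x, y) = x*x + y*y is (2x, 2y)
    return 2 * p

p = np.array([3.0, -4.0])  # arbitrary starting point (x, y)
lr = 0.1                   # learning rate, chosen for illustration

for _ in range(100):
    p = p - lr * grad(p)   # each step scales p by (1 - 2*lr) = 0.8

print(p)  # both coordinates shrink towards the minimum at (0, 0)
```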
Hi,
There seems to be a flaw in the cost function
cost = (1/2*m) * np.sum(np.square(predictions-y))
Shouldn't it be
cost = 1/(2*m) * np.sum(np.square(predictions-y))
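The difference between the two expressions comes down to operator precedence. A quick sketch, with an illustrative sum of squared errors that is not taken from the notebook:

```python
m = 100
sse = 104.0  # illustrative sum of squared errors, not a notebook value

as_written = (1 / 2 * m) * sse  # parsed as ((1/2) * m) * sse, i.e. 50 * sse
intended = 1 / (2 * m) * sse    # sse / 200, matching the 1/(2m) factor

print(as_written, intended, as_written / intended)
```

The two values differ by a factor of `m**2`.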
Nice walkthrough
I was just about to make the same observation as sivi299 regarding the cost function.
In this case, since m is fixed from iteration to iteration when doing the gradient descent, I don't think it matters when it comes to optimizing the theta variable. As written, it's proportional to the mean-squared error, but it should optimize towards the same theta all the same.
The relative magnitudes of the cost function history curves differ between the gradient_descent and minibatch_gradient_descent due to different batch sizes when the cal_cost function is called, but since each algorithm uses the same number of points from iteration to iteration internally it should be OK.
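The point about optimisation can be checked with a small self-contained sketch. It re-implements the batch update under stated assumptions (fixed seed, zero starting theta, hypothetical helper `run`) and only varies the factor applied to the reported cost.

```python
import numpy as np

rng = np.random.default_rng(42)  # fixed seed, for repeatability only
X = 2 * rng.random((100, 1))
y = 4 + 3 * X + rng.standard_normal((100, 1))
X_b = np.c_[np.ones((100, 1)), X]
m = len(y)

def run(cost_scale, lr=0.01, iterations=1000):
    """Batch gradient descent; cost_scale only affects the reported cost."""
    theta = np.zeros((2, 1))  # fixed start so both runs are comparable
    costs = np.zeros(iterations)
    for it in range(iterations):
        residual = X_b.dot(theta) - y
        theta = theta - (lr / m) * X_b.T.dot(residual)
        costs[it] = cost_scale * np.sum(np.square(X_b.dot(theta) - y))
    return theta, costs

theta_a, cost_a = run(1 / 2 * m)    # scaling quoted in the comment above
theta_b, cost_b = run(1 / (2 * m))  # scaling from the cost equation

print(np.array_equal(theta_a, theta_b))  # the update never reads the cost
print(cost_a[-1] / cost_b[-1])           # ratio of the two scalings
```

Because the update rule uses the gradient directly, the scaling only changes the magnitude of the plotted cost history, not the thetas.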
Hi, this is fantastic material; thanks so much.
I think there is a typo in equation (8). Shouldn't the X's subindex be j? Meaning X_j instead of X_0?
Regards,
I've output 'theta_history', 'cost_history' and 'x', 'y', but cannot get the same 'cost' as the one output by the program above