DiegoHernanSalazar · April 15, 2025 00:12
diff --git a/C3_W2_Collaborative_RecSys_Assignment.ipynb b/C3_W2_Collaborative_RecSys_Assignment.ipynb
 {
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {
    "id": "Lzk7iX_CodX6",
    "tags": []
   },
   "source": [
    "# <img align=\"left\" src=\"./images/movie_camera.png\"     style=\" width:40px;  \" > Practice lab: Collaborative Filtering Recommender Systems\n",
    "\n",
    "In this exercise, you will implement collaborative filtering to build a recommender system for movies. \n",
    "\n",
    "# <img align=\"left\" src=\"./images/film_reel.png\"     style=\" width:40px;  \" > Outline\n",
    "- [ 1 - Notation](#1)\n",
    "- [ 2 - Recommender Systems](#2)\n",
    "- [ 3 - Movie ratings dataset](#3)\n",
    "- [ 4 - Collaborative filtering learning algorithm](#4)\n",
    "  - [ 4.1 Collaborative filtering cost function](#4.1)\n",
    "    - [ Exercise 1](#ex01)\n",
    "- [ 5 - Learning movie recommendations](#5)\n",
    "- [ 6 - Recommendations](#6)\n",
    "- [ 7 - Congratulations!](#7)\n",
    "\n",
    "\n"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "_**NOTE:** To prevent errors from the autograder, you are not allowed to edit or delete non-graded cells in this lab. Please also refrain from adding any new cells. \n",
    "**Once you have passed this assignment** and want to experiment with any of the non-graded code, you may follow the instructions at the bottom of this notebook._"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "##  Packages <img align=\"left\" src=\"./images/film_strip_vertical.png\"     style=\" width:40px;   \" >\n",
    "We will use the now familiar NumPy and Tensorflow Packages."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 1,
   "metadata": {
    "deletable": false
   },
   "outputs": [],
   "source": [
    "import numpy as np           # Get numpy 'np' constructor for arrays handling and numeric computations\n",
    "import tensorflow as tf      # Import 'TensorFlow' library as 'tf' constructor\n",
    "from tensorflow import keras # Import 'keras' library\n",
    "from recsys_utils import *   # Contains ALL (*) helper functions"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "<a name=\"1\"></a>\n",
    "## 1 - Notation\n"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "|General <br />  Notation  | Description| Python (if any) |\n",
    "|:-------------|:------------------------------------------------------------||\n",
    "| $r(i,j)$     | scalar; = 1  if user j rated movie i  = 0  otherwise             ||\n",
    "| $y(i,j)$     | scalar; = rating given by user j on movie  i    (if r(i,j) = 1 is defined) ||\n",
    "|$\\mathbf{w}^{(j)}$ | vector; parameters for user j ||\n",
    "|$b^{(j)}$     |  scalar; parameter for user j ||\n",
    "| $\\mathbf{x}^{(i)}$ |   vector; feature ratings for movie i        ||     \n",
    "| $n_u$        | number of users |num_users|\n",
    "| $n_m$        | number of movies | num_movies |\n",
    "| $n$          | number of features | num_features                    |\n",
    "| $\\mathbf{X}$ |  matrix of vectors $\\mathbf{x}^{(i)}$         | X |\n",
    "| $\\mathbf{W}$ |  matrix of vectors $\\mathbf{w}^{(j)}$         | W |\n",
    "| $\\mathbf{b}$ |  vector of bias parameters $b^{(j)}$ | b |\n",
    "| $\\mathbf{R}$ | matrix of elements $r(i,j)$                    | R |\n",
    "\n"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "tags": []
   },
   "source": [
    "<a name=\"2\"></a>\n",
    "## 2 - Recommender Systems <img align=\"left\" src=\"./images/film_rating.png\" style=\" width:40px;  \" >\n",
    "In this lab, you will implement the collaborative filtering learning algorithm and apply it to a dataset of movie ratings.\n",
    "The goal of a collaborative filtering recommender system is to generate two vectors: For each user, a 'parameter vector' that embodies the movie tastes of a user. For each movie, a feature vector of the same size which embodies some description of the movie. The dot product of the two vectors plus the bias term should produce an estimate of the rating the user might give to that movie.\n",
    "\n",
    "The diagram below details how these vectors are learned."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "<figure>\n",
    "   <img src=\"./images/ColabFilterLearn.PNG\"  style=\"width:740px;height:250px;\" >\n",
    "</figure>"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Existing ratings are provided in matrix form as shown. $Y$ contains ratings; 0.5 to 5 inclusive in 0.5 steps. 0 if the movie has not been rated. $R$ has a 1 where movies have been rated. Movies are in rows, users in columns. Each user has a parameter vector $w^{user}$ and bias. Each movie has a feature vector $x^{movie}$. These vectors are simultaneously learned by using the existing user/movie ratings as training data. One training example is shown above: $\\mathbf{w}^{(1)} \\cdot \\mathbf{x}^{(1)} + b^{(1)} = 4$. It is worth noting that the feature vector $x^{movie}$ must satisfy all the users while the user vector $w^{user}$ must satisfy all the movies. This is the source of the name of this approach - all the users collaborate to generate the rating set. "
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "<figure>\n",
    "   <img src=\"./images/ColabFilterUse.PNG\"  style=\"width:640px;height:250px;\" >\n",
    "</figure>"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Once the feature vectors and parameters are learned, they can be used to predict how a user might rate an unrated movie. This is shown in the diagram above. The equation is an example of predicting a rating for user one on movie zero."
   ]
  },
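  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "As a small worked illustration of this prediction rule, assume a hypothetical model with only $n = 3$ features (the lab itself uses $n = 10$). The numbers below are invented for illustration and do not come from the dataset. For user one and movie zero, suppose $\\mathbf{w}^{(1)} = [1.0,\\ 0.5,\\ -0.2]$, $\\mathbf{x}^{(0)} = [2.0,\\ 1.0,\\ 0.5]$ and $b^{(1)} = 0.4$. The predicted rating would then be:\n",
    "\n",
    "$$\n",
    "\\mathbf{w}^{(1)} \\cdot \\mathbf{x}^{(0)} + b^{(1)} = (1.0)(2.0) + (0.5)(1.0) + (-0.2)(0.5) + 0.4 = 2.0 + 0.5 - 0.1 + 0.4 = 2.8\n",
    "$$"
   ]
  },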
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "\n",
    "In this exercise, you will implement the function `cofiCostFunc` that computes the collaborative filtering\n",
    "objective function. After implementing the objective function, you will use a TensorFlow custom training loop to learn the parameters for collaborative filtering. The first step is to detail the data set and data structures that will be used in the lab."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "id": "6-09Hto6odYD"
   },
   "source": [
    "<a name=\"3\"></a>\n",
    "## 3 - Movie ratings dataset <img align=\"left\" src=\"./images/film_rating.png\"     style=\" width:40px;  \" >\n",
    "The data set is derived from the [MovieLens \"ml-latest-small\"](https://grouplens.org/datasets/movielens/latest/) dataset.   \n",
    "[F. Maxwell Harper and Joseph A. Konstan. 2015. The MovieLens Datasets: History and Context. ACM Transactions on Interactive Intelligent Systems (TiiS) 5, 4: 19:1–19:19. <https://doi.org/10.1145/2827872>]\n",
    "\n",
    "The original dataset has  9000 movies rated by 600 users. The dataset has been reduced in size to focus on movies from the years since 2000. This dataset consists of ratings on a scale of 0.5 to 5 in 0.5 step increments. The reduced dataset has $n_u = 443$ users, and $n_m= 4778$ movies. \n",
    "\n",
    "Below, you will load the movie dataset into the variables $Y$ and $R$.\n",
    "\n",
    "The matrix $Y$ (a  $n_m \\times n_u$ matrix) stores the ratings $y^{(i,j)}$. The matrix $R$ is an binary-valued indicator matrix, where $R(i,j) = 1$ if user $j$ gave a rating to movie $i$, and $R(i,j)=0$ otherwise. \n",
    "\n",
    "Throughout this part of the exercise, you will also be working with the\n",
    "matrices, $\\mathbf{X}$, $\\mathbf{W}$ and $\\mathbf{b}$: \n",
    "\n",
    "$$\\mathbf{X} = \n",
    "\\begin{bmatrix}\n",
    "--- (\\mathbf{x}^{(0)})^T --- \\\\\n",
    "--- (\\mathbf{x}^{(1)})^T --- \\\\\n",
    "\\vdots \\\\\n",
    "--- (\\mathbf{x}^{(n_m-1)})^T --- \\\\\n",
    "\\end{bmatrix} , \\quad\n",
    "\\mathbf{W} = \n",
    "\\begin{bmatrix}\n",
    "--- (\\mathbf{w}^{(0)})^T --- \\\\\n",
    "--- (\\mathbf{w}^{(1)})^T --- \\\\\n",
    "\\vdots \\\\\n",
    "--- (\\mathbf{w}^{(n_u-1)})^T --- \\\\\n",
    "\\end{bmatrix},\\quad\n",
    "\\mathbf{ b} = \n",
    "\\begin{bmatrix}\n",
    " b^{(0)}  \\\\\n",
    " b^{(1)} \\\\\n",
    "\\vdots \\\\\n",
    "b^{(n_u-1)} \\\\\n",
    "\\end{bmatrix}\\quad\n",
    "$$ \n",
    "\n",
    "The $i$-th row of $\\mathbf{X}$ corresponds to the\n",
    "feature vector $x^{(i)}$ for the $i$-th movie, and the $j$-th row of\n",
    "$\\mathbf{W}$ corresponds to one parameter vector $\\mathbf{w}^{(j)}$, for the\n",
    "$j$-th user. Both $x^{(i)}$ and $\\mathbf{w}^{(j)}$ are $n$-dimensional\n",
    "vectors. For the purposes of this exercise, you will use $n=10$, and\n",
    "therefore, $\\mathbf{x}^{(i)}$ and $\\mathbf{w}^{(j)}$ have 10 elements.\n",
    "Correspondingly, $\\mathbf{X}$ is a\n",
    "$n_m \\times 10$ matrix and $\\mathbf{W}$ is a $n_u \\times 10$ matrix.\n",
    "\n",
    "We will start by loading the movie ratings dataset to understand the structure of the data.\n",
    "We will load $Y$ and $R$ with the movie dataset.  \n",
    "We'll also load $\\mathbf{X}$, $\\mathbf{W}$, and $\\mathbf{b}$ with pre-computed values. These values will be learned later in the lab, but we'll use pre-computed values to develop the cost model."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 2,
   "metadata": {
    "deletable": false
   },
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Y (4778, 443) R (4778, 443)\n",
      "X (4778, 10)\n",
      "W (443, 10)\n",
      "b (1, 443)\n",
      "num_features 10\n",
      "num_movies 4778\n",
      "num_users 443\n"
     ]
    }
   ],
   "source": [
    "# Load data using 'load_precalc_params_small()' and 'load_ratings_small()' helper functions \n",
    "X, W, b, num_movies, num_features, num_users = load_precalc_params_small()\n",
    "Y, R = load_ratings_small()\n",
    "\n",
    "# Display Y(4778 rows, 443 cols) ratings size and R(4778 rows, 443 cols) binary values (size), \n",
    "# specifying if a movie was rated '1' or not '0' by users.\n",
    "print(\"Y\", Y.shape, \"R\", R.shape)\n",
    "\n",
    "# Display X(4778 rows/movies, 10 features per movie)\n",
    "print(\"X\", X.shape)\n",
    "\n",
    "# Display W(443 rows/users, 10 features per user)\n",
    "print(\"W\", W.shape)\n",
    "\n",
    "# Display b(1 row, 443 cols/users)\n",
    "print(\"b\", b.shape)\n",
    "\n",
    "# n = 10 features at each movie\n",
    "print(\"num_features\", num_features)\n",
    "\n",
    "# nm = 4778 movies\n",
    "print(\"num_movies\",   num_movies)\n",
    "\n",
    "# nu = 443 users\n",
    "print(\"num_users\",    num_users)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 3,
   "metadata": {
    "deletable": false,
    "id": "bxm1O_wbodYF"
   },
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Y[Trues]= [5. 2. 4. 4. 2.]\n",
      "Average rating for movie 1 : 3.4 stars / 5 stars \n"
     ]
    }
   ],
   "source": [
    "#  From the matrix, we can compute statistics like average rating.\n",
    "# R[0,:] row 0, select ALL cols -> [0., 0., 1., 0., ... ,0.]\n",
    "# R[0, :].astype(bool) -> [False, False, True, False, ... , False]\n",
    "trues_and_falses = R[0, :].astype(bool) \n",
    "\n",
    "# At 1st row of Y, pass the array of 'True' and 'False' boolean values,\n",
    "# to select just the Y[True] values and get the mean of them. Y[False] are NOT selected\n",
    "# Y[Trues]= [5. 2. 4. 4. 2.]\n",
    "print(\"Y[Trues]=\",Y[0, trues_and_falses])\n",
    "\n",
    "# 1st row of rating mean = 5. + 2. + 4. + 4. + 2. = 17. / 5 = 3.4 stars out of 5 stars\n",
    "tsmean =  np.mean(Y[0, trues_and_falses])\n",
    "print(f\"Average rating for movie 1 : {tsmean:0.1f} stars / 5 stars \" )"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "<a name=\"4\"></a>\n",
    "## 4 - Collaborative filtering learning algorithm <img align=\"left\" src=\"./images/film_filter.png\"     style=\" width:40px;  \" >\n",
    "\n",
    "Now, you will begin implementing the collaborative filtering learning\n",
    "algorithm. You will start by implementing the objective function. \n",
    "\n",
    "The collaborative filtering algorithm in the setting of movie\n",
    "recommendations considers a set of $n$-dimensional parameter vectors\n",
    "$\\mathbf{x}^{(0)},...,\\mathbf{x}^{(n_m-1)}$, $\\mathbf{w}^{(0)},...,\\mathbf{w}^{(n_u-1)}$ and $b^{(0)},...,b^{(n_u-1)}$, where the\n",
    "model predicts the rating for movie $i$ by user $j$ as\n",
    "$y^{(i,j)} = \\mathbf{w}^{(j)}\\cdot \\mathbf{x}^{(i)} + b^{(j)}$ . Given a dataset that consists of\n",
    "a set of ratings produced by some users on some movies, you wish to\n",
    "learn the parameter vectors $\\mathbf{x}^{(0)},...,\\mathbf{x}^{(n_m-1)},\n",
    "\\mathbf{w}^{(0)},...,\\mathbf{w}^{(n_u-1)}$  and $b^{(0)},...,b^{(n_u-1)}$ that produce the best fit (minimizes\n",
    "the squared error).\n",
    "\n",
    "You will complete the code in cofiCostFunc to compute the cost\n",
    "function for collaborative filtering. "
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "id": "bcqg0LJWodYH"
   },
   "source": [
    "\n",
    "<a name=\"4.1\"></a>\n",
    "### 4.1 Collaborative filtering cost function\n",
    "\n",
    "The collaborative filtering cost function is given by\n",
    "$$J({\\mathbf{x}^{(0)},...,\\mathbf{x}^{(n_m-1)},\\mathbf{w}^{(0)},b^{(0)},...,\\mathbf{w}^{(n_u-1)},b^{(n_u-1)}})= \\left[ \\frac{1}{2}\\sum_{(i,j):r(i,j)=1}(\\mathbf{w}^{(j)} \\cdot \\mathbf{x}^{(i)} + b^{(j)} - y^{(i,j)})^2 \\right]\n",
    "+ \\underbrace{\\left[\n",
    "\\frac{\\lambda}{2}\n",
    "\\sum_{j=0}^{n_u-1}\\sum_{k=0}^{n-1}(\\mathbf{w}^{(j)}_k)^2\n",
    "+ \\frac{\\lambda}{2}\\sum_{i=0}^{n_m-1}\\sum_{k=0}^{n-1}(\\mathbf{x}_k^{(i)})^2\n",
    "\\right]}_{regularization}\n",
    "\\tag{1}$$\n",
    "The first summation in (1) is \"for all $i$, $j$ where $r(i,j)$ equals $1$\" and could be written:\n",
    "\n",
    "$$\n",
    "= \\left[ \\frac{1}{2}\\sum_{j=0}^{n_u-1} \\sum_{i=0}^{n_m-1}r(i,j)*(\\mathbf{w}^{(j)} \\cdot \\mathbf{x}^{(i)} + b^{(j)} - y^{(i,j)})^2 \\right]\n",
    "+\\text{regularization}\n",
    "$$\n",
    "\n",
    "You should now write cofiCostFunc (collaborative filtering cost function) to return this cost."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "<a name=\"ex01\"></a>\n",
    "### Exercise 1\n",
    "\n",
    "**For loop Implementation:**   \n",
    "Start by implementing the cost function using for loops.\n",
    "Consider developing the cost function in two steps. First, develop the cost function without regularization. A test case that does not include regularization is provided below to test your implementation. Once that is working, add regularization and run the tests that include regularization.  Note that you should be accumulating the cost for user $j$ and movie $i$ only if $R(i,j) = 1$."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 4,
   "metadata": {
    "deletable": false
   },
   "outputs": [],
   "source": [
    "# EXERCISE 1\n",
    "# GRADED FUNCTION: cofi_cost_func\n",
    "# UNQ_C1\n",
    "\n",
    "def cofi_cost_func(X, W, b, Y, R, lambda_):\n",
    "    \"\"\"\n",
    "    Returns the cost for the content-based filtering\n",
    "    Args:\n",
    "      X (ndarray (num_movies,num_features)): matrix of item features   [[x^(i)]n features]\n",
    "      W (ndarray (num_users,num_features)) : matrix of user parameters [[w^(j)]n features]\n",
    "      b (ndarray (1, num_users)            : matrix of user parameters [[b^(j)]nu]\n",
    "      Y (ndarray (num_movies,num_users)    : matrix of user ratings of movies [[y^(i,j)]nu]\n",
    "      R (ndarray (num_movies,num_users)    : matrix, where R(i, j)=1 if the i-th movies was rated by the j-th user [[r(i,j)]nu]\n",
    "      lambda_ (float): regularization parameter (scalar)\n",
    "    Returns:\n",
    "      J (float) : Cost\n",
    "    \"\"\"\n",
    "    nm, nu = Y.shape         # (rows=nm=4778, cols=nu=443)\n",
    "    J = 0                    # Initialize COST function as J = 0\n",
    "    \n",
    "    ### START CODE HERE ###  \n",
    "    \n",
    "    for i in range(nm):     # Iterate through each movie i = 0,1,2,...,nm = 4778 - 1\n",
    "        \n",
    "        x = X[i,:]          # Select each row/movie i vector -> X[i, ALL 10 cols/users tastes/features]\n",
    "        \n",
    "        for j in range(nu): # Iterate through each user j = 0,1,2,...,nu = 443 - 1\n",
    "            \n",
    "            w = W[j,:]      # Select each user j vector -> W[j, ALL 10 cols/movies tastes/features]\n",
    "            b_j = b[0,j]    # Select at unique vector [[0]], each user j bias element -> b[Unique vector 0, j user bias]\n",
    "            y = Y[i,j]      # Select rating for each movie i, each user j -> Y[each movie i, each user j]\n",
    "            r = R[i,j]      # Select boolean rate indicator. rated=1, non-rated=0 \n",
    "                            # for each movie i, ALL users j -> R[each movie i, was rated (1) or not (0) by user j]\n",
    "            \n",
    "            J += np.square(r * (np.dot(w,x) + b_j - y ) )          # COST J normal Term. w,x are 1D vectors. \n",
    "                                                                   # b_j,y are values (ALL iterated). \n",
    "                                                                   # Jnew = Jprev + [r * (w.x + b - y) ]2\n",
    "                                                                   #     (init as 0)\n",
    "                    \n",
    "    # 'np.square(W)' or 'np.square(X)' with W or X a 2D NumPy array, calculates the square of each element in the array \n",
    "    # (element by element). The result is a new array with the same dimensions as the original array W or X, \n",
    "    # where each element is the square of the corresponding element in W or X.\n",
    "    # 'np.sum(W^2)' or 'np.sum(X^2)' sums all the elements of a 2D NumPy array W^2 or X^2. \n",
    "    # This sums all the values within the array, considering axis=None as the default.\n",
    "    J += (lambda_) * (np.sum(np.square(W)) + np.sum(np.square(X))) # COST J Regularization Terms \n",
    "                                                                   # (W and X are 2D MATRICES NOT iterated)\n",
    "                                                                   # Jnew = Jprev + (lambda)*[SUM(W^2) + SUM(X^2)]\n",
    "    J = J/2                                                        # Divide final COST J = J / 2\n",
    "              \n",
    "    ### END CODE HERE ### \n",
    "\n",
    "    return J # Return COST J(w,b,x) = Jprev + (1/2) * np.square(r * (np.dot(w,x) + b_j - y ) ) + \n",
    "             #                        (lambda_/2) * [np.sum(np.square(W)) + np.sum(np.square(X))]  "
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "<details>\n",
    "  <summary><font size=\"3\" color=\"darkgreen\"><b>Click for hints</b></font></summary>\n",
    "    You can structure the code in two for loops similar to the summation in (1).   \n",
    "    Implement the code without regularization first.   \n",
    "    Note that some of the elements in (1) are vectors. Use np.dot(). You can also use np.square().\n",
    "    Pay close attention to which elements are indexed by i and which are indexed by j. Don't forget to divide by two.\n",
    "    \n",
    "```python     \n",
    "    ### START CODE HERE ###  \n",
    "    for j in range(nu):\n",
    "        \n",
    "        \n",
    "        for i in range(nm):\n",
    "            \n",
    "            \n",
    "    ### END CODE HERE ### \n",
    "```    \n",
    "<details>\n",
    "    <summary><font size=\"2\" color=\"darkblue\"><b> Click for more hints</b></font></summary>\n",
    "        \n",
    "    Here is some more details. The code below pulls out each element from the matrix before using it. \n",
    "    One could also reference the matrix directly.  \n",
    "    This code does not contain regularization.\n",
    "    \n",
    "```python \n",
    "    nm,nu = Y.shape\n",
    "    J = 0\n",
    "    ### START CODE HERE ###  \n",
    "    for j in range(nu):\n",
    "        w = W[j,:]\n",
    "        b_j = b[0,j]\n",
    "        for i in range(nm):\n",
    "            x = \n",
    "            y = \n",
    "            r =\n",
    "            J += \n",
    "    J = J/2\n",
    "    ### END CODE HERE ### \n",
    "\n",
    "```\n",
    "    \n",
    "<details>\n",
    "    <summary><font size=\"2\" color=\"darkblue\"><b>Last Resort (full non-regularized implementation)</b></font></summary>\n",
    "    \n",
    "```python \n",
    "    nm,nu = Y.shape\n",
    "    J = 0\n",
    "    ### START CODE HERE ###  \n",
    "    for j in range(nu):\n",
    "        w = W[j,:]\n",
    "        b_j = b[0,j]\n",
    "        for i in range(nm):\n",
    "            x = X[i,:]\n",
    "            y = Y[i,j]\n",
    "            r = R[i,j]\n",
    "            J += np.square(r * (np.dot(w,x) + b_j - y ) )\n",
    "    J = J/2\n",
    "    ### END CODE HERE ### \n",
    "```\n",
    "    \n",
    "<details>\n",
    "    <summary><font size=\"2\" color=\"darkblue\"><b>regularization</b></font></summary>\n",
    "     Regularization just squares each element of the W array and X array and them sums all the squared elements.\n",
    "     You can utilize np.square() and np.sum().\n",
    "\n",
    "<details>\n",
    "    <summary><font size=\"2\" color=\"darkblue\"><b>regularization details</b></font></summary>\n",
    "    \n",
    "```python \n",
    "    J += (lambda_/2) * (np.sum(np.square(W)) + np.sum(np.square(X)))\n",
    "```\n",
    "    \n",
    "</details>\n",
    "</details>\n",
    "</details>\n",
    "</details>\n",
    "\n",
    "    \n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 5,
   "metadata": {
    "deletable": false
   },
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Cost: 13.67\n"
     ]
    }
   ],
   "source": [
    "# Reduce the data set size so that this runs faster\n",
    "# Selected reduced users nu = 4\n",
    "num_users_r = 4\n",
    "\n",
    "# Selected reduced movies nm = 5\n",
    "num_movies_r = 5\n",
    "\n",
    "# Selected reduced features n = 3\n",
    "num_features_r = 3\n",
    "\n",
    "# X_reduced = X[0:5 rows/movies, 0:3 cols/features]\n",
    "X_r = X[:num_movies_r, :num_features_r]\n",
    "\n",
    "# W_reduced = W[0:4 rows/users, 0:3 cols/features]\n",
    "W_r = W[:num_users_r,  :num_features_r]\n",
    "\n",
    "# b_reduced = b[0 unique vector, 0:4 user's biases]\n",
    "# '.reshape(1,-1)' Converts b=[1D] -> b = [[2D]]\n",
    "b_r = b[0, :num_users_r].reshape(1,-1) \n",
    "\n",
    "# Y_reduced = Y[0:5 rows/movies, 0:4 cols/users]\n",
    "Y_r = Y[:num_movies_r, :num_users_r]\n",
    "\n",
    "# R_reduced = R[0:5 rows/movies, 0:4 cols/users]\n",
    "R_r = R[:num_movies_r, :num_users_r]\n",
    "\n",
    "# Evaluate cost function calling Collaborative Filtering COST J 'cofi_cost_func()'\n",
    "# lambda = 0 -> NO REGULARIZATION Terms\n",
    "J = cofi_cost_func(X_r, W_r, b_r, Y_r, R_r, 0);\n",
    "\n",
    "# Print COST J without REGULARIZATION terms, as 'x.xx' float value\n",
    "print(f\"Cost: {J:0.2f}\")"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "id": "xGznmQ91odYL"
   },
   "source": [
    "**Expected Output (lambda = 0)**:  \n",
    "$13.67$."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 6,
   "metadata": {
    "deletable": false
   },
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Cost (with regularization): 28.09\n"
     ]
    }
   ],
   "source": [
    "# Evaluate cost function calling Collaborative Filtering COST J 'cofi_cost_func()' \n",
    "# with regularization. lambda = 1.5 -> Include REGULARIZATION Terms \n",
    "J = cofi_cost_func(X_r, W_r, b_r, Y_r, R_r, 1.5);\n",
    "\n",
    "# Print COST J with REGULARIZATION terms, as 'x.xx' float value\n",
    "print(f\"Cost (with regularization): {J:0.2f}\")"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "id": "1xbepzUUodYP"
   },
   "source": [
    "**Expected Output**:\n",
    "\n",
    "28.09"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 7,
   "metadata": {
    "deletable": false
   },
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "\u001b[92mAll tests passed!\n"
     ]
    }
   ],
   "source": [
    "# Public tests. Import ALL (*)\n",
    "# 'public_tests' library modules\n",
    "from public_tests import *\n",
    "\n",
    "# TEST ‘cofi_cost_func()’ COLLABORATIVE  \n",
    "# FILTERING COST J(w,b,x) function\n",
    "test_cofi_cost_func(cofi_cost_func) "
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "**Vectorized Implementation**\n",
    "\n",
    "It is important to create a vectorized implementation to compute $J$, since it will later be called many times during optimization. The linear algebra utilized is not the focus of this series, so the implementation is provided. If you are an expert in linear algebra, feel free to create your version without referencing the code below. \n",
    "\n",
    "Run the code below and verify that it produces the same results as the non-vectorized version."
   ]
  },
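  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "One way to read the vectorized code below is as a matrix form of (1). The following is a sketch in the notation of this lab; the symbols $\\odot$ (elementwise product) and $\\mathbf{1}$ (a column of $n_m$ ones) are introduced here for illustration, and $\\mathbf{b}$ is taken as a $1 \\times n_u$ row, matching its shape in the code. The predictions for every movie/user pair are\n",
    "\n",
    "$$\n",
    "\\hat{\\mathbf{Y}} = \\mathbf{X}\\mathbf{W}^T + \\mathbf{1}\\mathbf{b}\n",
    "$$\n",
    "\n",
    "where $\\mathbf{X}\\mathbf{W}^T$ is $(n_m \\times n)(n \\times n_u) = n_m \\times n_u$, the same shape as $\\mathbf{Y}$ and $\\mathbf{R}$. Because every $r(i,j)$ is 0 or 1, $r(i,j)^2 = r(i,j)$, so squaring the masked residual gives the same sum as the first term of (1):\n",
    "\n",
    "$$\n",
    "J = \\frac{1}{2}\\sum_{i=0}^{n_m-1}\\sum_{j=0}^{n_u-1}\\left[\\mathbf{R} \\odot (\\hat{\\mathbf{Y}} - \\mathbf{Y})\\right]_{ij}^2\n",
    "+ \\frac{\\lambda}{2}\\left(\\sum_{i=0}^{n_m-1}\\sum_{k=0}^{n-1} X_{ik}^2 + \\sum_{j=0}^{n_u-1}\\sum_{k=0}^{n-1} W_{jk}^2\\right)\n",
    "$$\n",
    "\n",
    "In code, the term $\\mathbf{1}\\mathbf{b}$ does not need to be built explicitly, since adding a $1 \\times n_u$ row to an $n_m \\times n_u$ matrix broadcasts it across the rows."
   ]
  },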
  {
   "cell_type": "code",
   "execution_count": 8,
   "metadata": {
    "deletable": false
   },
   "outputs": [],
   "source": [
    "def cofi_cost_func_v(X, W, b, Y, R, lambda_):\n",
    "    \"\"\"\n",
    "    Returns the cost for the content-based filtering\n",
    "    Vectorized for speed. Uses tensorflow operations to be compatible with custom training loop.\n",
    "    Args:\n",
    "      X (ndarray (num_movies,num_features)): matrix of item features   [[x^(i)]n features]\n",
    "      W (ndarray (num_users,num_features)) : matrix of user parameters [[w^(j)]n features]\n",
    "      b (ndarray (1, num_users)            : matrix of user parameters [[b^(j)]nu]\n",
    "      Y (ndarray (num_movies,num_users)    : matrix of user ratings of movies [[y^(i,j)]nu]\n",
    "      R (ndarray (num_movies,num_users)    : matrix, where R(i, j)=1 if the i-th movies was rated by the j-th user [[r(i,j)]nu]\n",
    "      lambda_ (float): regularization parameter\n",
    "    Returns:\n",
    "      J (float) : Cost\n",
    "    \"\"\"\n",
    "    # X =[x^(1)=[x1,x2,...,x10], \n",
    "    #     x^(2)=[x1,x2,...,x10],\n",
    "    #         ...\n",
    "    #     x^(i)=[x1,x2,...,x10]]           \n",
    "    #                                      w^(1) w^(2) ... w^(j)\n",
    "    # W =[w^(1)=[w1,w2,...,w10],      W^T =[[w1,  [w1,      [w1,     \n",
    "    #     w^(2)=[w1,w2,...,w10],             w2,   w2,       w2,\n",
    "    #         ...                            ...   ...       ...\n",
    "    #     w^(i)=[w1,w2,...,w10]]             w10], w10],     w10]]\n",
    "    \n",
    " # X.(W^T)=[ [(x1^(1)*w1^(1) + x2^(1)*w2^(1) +...+ x10^(1)*w10^(1)) ... (x1^(1)*w1^(j) + x2^(1)*w2^(j) +...+ x10^(1)*w10^(j))],\n",
    " #           [(x1^(2)*w1^(1) + x2^(2)*w2^(1) +...+ x10^(2)*w10^(1)) ... (x1^(2)*w1^(j) + x2^(2)*w2^(j) +...+ x10^(2)*w10^(j))],\n",
    " #                                                              ..........\n",
    " #           [(x1^(i)*w1^(1) + x2^(i)*w2^(1) +...+ x10^(i)*w10^(1)) ... (x1^(i)*w1^(j) + x2^(i)*w2^(j) +...+ x10^(i)*w10^(j))] ]\n",
    " # X.(W^T)[nm x nu] has the Same Size of Y[nm x nu] and R[nm x nu]. b[1 x nu]. \n",
    "\n",
    "    # j = R*([X.(W^T) + b] - Y) -> j is the NON-squared Error Term (Quasi-Normal Term)\n",
    "    j = (tf.linalg.matmul(X, tf.transpose(W)) + b - Y)*R    \n",
    "    \n",
    "    # The tf.reduce_sum(axis=None) function calculates the sum of all elements in a tensor. When axis=None, \n",
    "    # all dimensions of the tensor are reduced, resulting in a tensor with a single element representing the total sum \n",
    "    # of all values present in the original tensor. This is useful for obtaining the overall sum of a multidimensional dataset \n",
    "    # without needing to specify any particular dimensions.\n",
    "    \n",
    "    # Squared Error Term(Normal term)                    +  REGULARIZATION Terms\n",
    "    # (1/2)*SUM ([R*([X.(W^T) + b] - Y)]^2) + (lambda/2)*[SUM (X^2) + SUM (W^2)]  \n",
    "    J = 0.5 * tf.reduce_sum(j**2) + (lambda_/2) * (tf.reduce_sum(X**2) + tf.reduce_sum(W**2))\n",
    "    \n",
    "    return J # Return COST J(W,b,X) = (1/2)*SUM ([R*([X.(W^T) + b] - Y)]^2) + (lambda/2)*[SUM (X^2) + SUM (W^2)]\n",
    "             #                        Squared Error Term8Normal term)       +  REGULARIZATION Terms"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 9,
   "metadata": {
    "deletable": false
   },
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Cost: 13.67\n",
      "Cost (with regularization): 28.09\n"
     ]
    }
   ],
   "source": [
    "# Evaluate cost function calling Collaborative Filtering VECTORIZED COST J 'cofi_cost_func()'\n",
    "# lambda = 0 -> NO REGULARIZATION Terms\n",
    "J = cofi_cost_func_v(X_r, W_r, b_r, Y_r, R_r, 0);\n",
    "\n",
    "# Print COST J with REGULARIZATION terms, as 'x.xx' float value\n",
    "print(f\"Cost: {J:0.2f}\")\n",
    "\n",
    "# Evaluate cost function calling Collaborative Filtering VECTORIZED COST J 'cofi_cost_func()' \n",
    "# with regularization. lambda = 1.5 -> Include REGULARIZATION Terms \n",
    "J = cofi_cost_func_v(X_r, W_r, b_r, Y_r, R_r, 1.5);\n",
    "\n",
    "# Print COST J with REGULARIZATION terms, as 'x.xx' float value\n",
    "print(f\"Cost (with regularization): {J:0.2f}\")"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "id": "1xbepzUUodYP"
   },
   "source": [
    "**Expected Output**:  \n",
    "Cost: 13.67  \n",
    "Cost (with regularization): 28.09"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "id": "ilaeM8yWodYR"
   },
   "source": [
    "<a name=\"5\"></a>\n",
    "## 5 - Learning movie recommendations <img align=\"left\" src=\"./images/film_man_action.png\" style=\" width:40px;  \" >\n",
    "------------------------------\n",
    "\n",
    "After you have finished implementing the collaborative filtering cost\n",
    "function, you can start training your algorithm to make\n",
    "movie recommendations for yourself. \n",
    "\n",
    "In the cell below, you can enter your own movie choices. The algorithm will then make recommendations for you! We have filled out some values according to our preferences, but after you have things working with our choices, you should change this to match your tastes.\n",
    "A list of all movies in the dataset is in the file [movie list](data/small_movie_list.csv)."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 10,
   "metadata": {
    "deletable": false,
    "id": "WJO8Jr0UodYR"
   },
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Changed values IDs -> i = [246, 366, 382, 622, 793, 929, 988, 1150, 2609, 2700, 2716, 2925, 2937]\n",
      "\n",
      "New user ratings for MOVIES:\n",
      "\n",
      "ID:246, New Rating:5.0, Name:Shrek (2001)\n",
      "ID:366, New Rating:5.0, Name:Harry Potter and the Sorcerer's Stone (a.k.a. Harry Potter and the Philosopher's Stone) (2001)\n",
      "ID:382, New Rating:2.0, Name:Amelie (Fabuleux destin d'Amélie Poulain, Le) (2001)\n",
      "ID:622, New Rating:5.0, Name:Harry Potter and the Chamber of Secrets (2002)\n",
      "ID:793, New Rating:5.0, Name:Pirates of the Caribbean: The Curse of the Black Pearl (2003)\n",
      "ID:929, New Rating:5.0, Name:Lord of the Rings: The Return of the King, The (2003)\n",
      "ID:988, New Rating:3.0, Name:Eternal Sunshine of the Spotless Mind (2004)\n",
      "ID:1150, New Rating:5.0, Name:Incredibles, The (2004)\n",
      "ID:2609, New Rating:2.0, Name:Persuasion (2007)\n",
      "ID:2700, New Rating:5.0, Name:Toy Story 3 (2010)\n",
      "ID:2716, New Rating:3.0, Name:Inception (2010)\n",
      "ID:2925, New Rating:1.0, Name:Louis Theroux: Law & Disorder (2008)\n",
      "ID:2937, New Rating:1.0, Name:Nothing to Declare (Rien à déclarer) (2010)\n"
     ]
    }
   ],
   "source": [
    "# Load 'movieList_df' dataframe with MOVIES columns info: \n",
    "# ID i-position, \"mean rating\", \"number of ratings\", \"title\" \n",
    "movieList, movieList_df = load_Movie_List_pd()\n",
    "\n",
    "# Initialize 'my ratings' list as ALL '0' -> [0,0,0,...,0]4778\n",
    "# with 4778 i positions, each with a rating value (1 per MOVIE).\n",
    "# A NEW USER j / col, gives 4778 ratings (1 per MOVIE). Next, We'll ADD this, as a new column array.\n",
    "my_ratings = np.zeros(num_movies)          \n",
    "\n",
    "# Check the file 'small_movie_list.csv' for id of each movie in our dataset\n",
    "# For example, Toy Story 3 (2010) has ID 2700, so to rate it \"5\", you can set\n",
    "my_ratings[2700] = 5 \n",
    "\n",
    "#Or suppose you did not enjoy Persuasion (2007), you can set\n",
    "my_ratings[2609] = 2;\n",
    "\n",
    "# We have selected a few movies we liked / did not like and the ratings we\n",
    "# gave are as follows:\n",
    "my_ratings[929]  = 5   # Lord of the Rings: The Return of the King, The\n",
    "my_ratings[246]  = 5   # Shrek (2001)\n",
    "my_ratings[2716] = 3   # Inception\n",
    "my_ratings[1150] = 5   # Incredibles, The (2004)\n",
    "my_ratings[382]  = 2   # Amelie (Fabuleux destin d'Amélie Poulain, Le)\n",
    "my_ratings[366]  = 5   # Harry Potter and the Sorcerer's Stone (a.k.a. Harry Potter and the Philosopher's Stone) (2001)\n",
    "my_ratings[622]  = 5   # Harry Potter and the Chamber of Secrets (2002)\n",
    "my_ratings[988]  = 3   # Eternal Sunshine of the Spotless Mind (2004)\n",
    "my_ratings[2925] = 1   # Louis Theroux: Law & Disorder (2008)\n",
    "my_ratings[2937] = 1   # Nothing to Declare (Rien à déclarer)\n",
    "my_ratings[793]  = 5   # Pirates of the Caribbean: The Curse of the Black Pearl (2003)\n",
    "\n",
    "# When the value of my_ratings[i] > 0, then return this i-th position related to a changed value (NON-ZERO)\n",
    "# in my_rated = [246, 366, 382, 622, 793, 929, 988, 1150, 2609, 2700, 2716, 2925, 2937] list\n",
    "my_rated = [i for i in range(len(my_ratings)) if my_ratings[i] > 0]\n",
    "\n",
    "# Print changed i positions list 'my_rated'\n",
    "print(\"Changed values IDs -> i =\",my_rated)\n",
    "\n",
    "print('\\nNew user ratings for MOVIES:\\n')\n",
    "for i in range(len(my_ratings)):  # Iterate i=0,1,...,4777 \n",
    "    if my_ratings[i] > 0 :        # When the value of my_ratings[i] > 0 was changed to (NON-ZERO)\n",
    "        \n",
    "        # Print the rating=my_ratings[i] and the MOVIE 'name', localized with its ID position i\n",
    "        # movieList_df.loc[i=246,\"title\"] -> Select the MOVIE with ID=i=246,  \n",
    "        # and just select the \"title\"=\"Shrek (2001)\"column value\n",
    "        print(f'ID:{i}, New Rating:{my_ratings[i]}, Name:{movieList_df.loc[i,\"title\"]}');"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Now, let's add these reviews to $Y$ and $R$ and normalize the ratings."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 11,
   "metadata": {
    "deletable": false
   },
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Y= [[0. 0. 0. ... 0. 0. 0.]\n",
      " [0. 0. 0. ... 0. 0. 0.]\n",
      " [0. 0. 0. ... 0. 0. 0.]\n",
      " ...\n",
      " [0. 0. 0. ... 0. 0. 0.]\n",
      " [0. 0. 0. ... 0. 0. 0.]\n",
      " [0. 0. 0. ... 0. 0. 0.]] (4778, 443)\n",
      "\n",
      "my_ratings= [0. 0. 0. ... 0. 0. 0.] (4778,)\n",
      "\n",
      "new Y= [[0. 0. 0. ... 0. 0. 0.]\n",
      " [0. 0. 0. ... 0. 0. 0.]\n",
      " [0. 0. 0. ... 0. 0. 0.]\n",
      " ...\n",
      " [0. 0. 0. ... 0. 0. 0.]\n",
      " [0. 0. 0. ... 0. 0. 0.]\n",
      " [0. 0. 0. ... 0. 0. 0.]] (4778, 444)\n",
      "\n",
      "(my_ratings != 0)= [False False False ... False False False] (4778,)\n",
      "\n",
      "(my_ratings != 0).astype(int) = [0 0 0 ... 0 0 0] (4778,)\n",
      "\n",
      "R= [[0. 0. 0. ... 0. 0. 0.]\n",
      " [0. 0. 0. ... 0. 0. 0.]\n",
      " [0. 0. 0. ... 0. 0. 0.]\n",
      " ...\n",
      " [0. 0. 0. ... 0. 0. 0.]\n",
      " [0. 0. 0. ... 0. 0. 0.]\n",
      " [0. 0. 0. ... 0. 0. 0.]] (4778, 443)\n",
      "\n",
      "new R= [[0. 0. 0. ... 0. 0. 0.]\n",
      " [0. 0. 0. ... 0. 0. 0.]\n",
      " [0. 0. 0. ... 0. 0. 0.]\n",
      " ...\n",
      " [0. 0. 0. ... 0. 0. 0.]\n",
      " [0. 0. 0. ... 0. 0. 0.]\n",
      " [0. 0. 0. ... 0. 0. 0.]] (4778, 444)\n",
      "\n",
      "Ymean= [[3.4 ]\n",
      " [3.25]\n",
      " [2.  ]\n",
      " ...\n",
      " [3.5 ]\n",
      " [3.5 ]\n",
      " [3.5 ]] (4778, 1)\n",
      "\n",
      "Ynorm= [[0. 0. 0. ... 0. 0. 0.]\n",
      " [0. 0. 0. ... 0. 0. 0.]\n",
      " [0. 0. 0. ... 0. 0. 0.]\n",
      " ...\n",
      " [0. 0. 0. ... 0. 0. 0.]\n",
      " [0. 0. 0. ... 0. 0. 0.]\n",
      " [0. 0. 0. ... 0. 0. 0.]] (4778, 444)\n"
     ]
    }
   ],
   "source": [
    "# Reload ratings again\n",
    "Y, R = load_ratings_small()\n",
    "\n",
    "# Y(4778 rows/movies,443 cols/users)\n",
    "print(\"Y=\",Y,Y.shape)\n",
    "\n",
    "# my_ratings(4778 elems,)\n",
    "print(\"\\nmy_ratings=\",my_ratings,my_ratings.shape) \n",
    "\n",
    "# Concatenate arrays along the second axis (columns). \n",
    "# The new ratings array is added as a new column array, \n",
    "# at the beginning of Y (New User j rates all movies/rows).\n",
    "# Add NEW USER j ratings array, at the beginning of Y\n",
    "\n",
    "#       my_ratings user col/feature \n",
    "#                    |\n",
    "# my_ratings        \\/          Y\n",
    "# [0,               [[0, 0, 0, ..., 0, 0, 0],\n",
    "#  0,         add    [0, 0, 0, ..., 0, 0, 0],       \n",
    "#  ...        col               ...\n",
    "#  0]                [0, 0, 0, ..., 0, 0, 0]] (4778,443)\n",
    "\n",
    "#            new Y\n",
    "#  [[0, 0, 0, ..., 0, 0, 0],\n",
    "#   [0, 0, 0, ..., 0, 0, 0],       \n",
    "#             ...\n",
    "#   [0, 0, 0, ..., 0, 0, 0]] (4778,444)\n",
    "Y = np.c_[my_ratings, Y]\n",
    "\n",
    "# new Y(4778 rows/movies,444 cols/users)\n",
    "print(\"\\nnew Y=\",Y,Y.shape)              \n",
    "\n",
    "# (my_ratings != 0) = [False False False ... True True False]\n",
    "print(\"\\n(my_ratings != 0)=\",(my_ratings != 0), (my_ratings != 0).shape)\n",
    "\n",
    "# (my_ratings != 0).astype(int) = [0 0 0 ... 1 1 0]\n",
    "print(\"\\n(my_ratings != 0).astype(int) =\",(my_ratings != 0).astype(int), (my_ratings != 0).astype(int).shape)\n",
    "\n",
    "# R(4778 rows/movies,443 cols/users) \n",
    "print(\"\\nR=\",R,R.shape)\n",
    "\n",
    "# ALL my_ratings != 0 -> 'True' others are 'False'\n",
    "# (my_ratings != 0) = [False False False ... True True False]\n",
    "# (my_ratings != 0).astype(int) = [0 0 0 ... 1 1 0]\n",
    "# Add NEW USER j boolean indicator array,at the beginning of R\n",
    "\n",
    "#       my_ratings=1 others 0, user col/feature \n",
    "#                    |\n",
    "# my_ratings        \\/          R\n",
    "# [0,               [[0, 0, 0, ..., 0, 0, 0],\n",
    "#  0,         add    [0, 0, 0, ..., 0, 0, 0],       \n",
    "#  ...        col              ...\n",
    "#  0]                [0, 0, 0, ..., 0, 0, 0]] (4778,443)\n",
    "\n",
    "#            new R\n",
    "#  [[0, 0, 0, ..., 0, 0, 0],\n",
    "#   [0, 0, 0, ..., 0, 0, 0],       \n",
    "#             ...\n",
    "#   [0, 0, 0, ..., 0, 0, 0]] (4778,444)\n",
    "R = np.c_[(my_ratings != 0).astype(int), R]\n",
    "\n",
    "# new R(4778 rows/movies,444 cols/users) \n",
    "print(\"\\nnew R=\",R,R.shape)\n",
    "\n",
    "# Normalize the Dataset using 'normalizeRatings()' helper function \n",
    "Ynorm, Ymean = normalizeRatings(Y, R)\n",
    "\n",
    "# Ymean(4778 rows/movies,1 col/mean)         FIT ALL 4778 elems, 1 col\n",
    "# Ymean = (np.sum(Y*R,axis=1)/(np.sum(R, axis=1)+1e-12)).reshape(-1,1)\n",
    "# Y*R = np.multiply(Y, R) elementwise\n",
    "print(\"\\nYmean=\",Ymean, Ymean.shape)\n",
    "\n",
    "# Ynorm(4778 rows/movies,444 cols/users)\n",
    "# Ynorm = Y - np.multiply(Ymean, R) elementwise, each Ymean [row] by each [new R] row. \n",
    "# Ymean*R = np.multiply(Ymean, R) elementwise\n",
    "\n",
    "#   Ymean                             new R\n",
    "# [[3.4 ], ->          *    [[0. 0. 0. ... 0. 0. 0.],\n",
    "#  [3.25], ->                [0. 0. 0. ... 0. 0. 0.],\n",
    "#  [2.  ], ->                [0. 0. 0. ... 0. 0. 0.],\n",
    "#   ...                                ...\n",
    "#  [3.5 ], ->                [0. 0. 0. ... 0. 0. 0.],\n",
    "#  [3.5 ], ->                [0. 0. 0. ... 0. 0. 0.],\n",
    "#  [3.5 ]] (4778, 1) ->      [0. 0. 0. ... 0. 0. 0.]] (4778, 444)\n",
    "\n",
    "#            new Y                               Ymean          =            Ynorm \n",
    "# [[0, 0, 0, ..., 0, 0, 0],         -    <- [[3.4 ],                [[0, 0, 0, ..., 0, 0, 0], \n",
    "# [0, 0, 0, ..., 0, 0, 0],               <- [3.25],                  [0, 0, 0, ..., 0, 0, 0], \n",
    "#              ...                            ...                              ... \n",
    "# [0, 0, 0, ..., 0, 0, 0]] (4778,444)    <- [3.5 ]] (4778, 1)        [0, 0, 0, ..., 0, 0, 0]] (4778,444)\n",
    "print(\"\\nYnorm=\",Ynorm, Ynorm.shape)"
   ]
  },
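  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "The effect of adding a user and normalizing is easier to see on a tiny matrix. This is a minimal sketch, assuming `normalizeRatings()` follows the formulas quoted in the comments above; the helper `normalize_ratings_sketch` and the arrays `Y_toy`, `R_toy`, and `new_ratings` are hypothetical and not part of the lab utilities.\n",
    "\n",
    "```python\n",
    "import numpy as np\n",
    "\n",
    "def normalize_ratings_sketch(Y, R):\n",
    "    # Per-movie mean over rated entries only (R == 1); 1e-12 avoids division by zero\n",
    "    Ymean = (np.sum(Y * R, axis=1) / (np.sum(R, axis=1) + 1e-12)).reshape(-1, 1)\n",
    "    # Subtract the mean only where a rating exists; unrated entries stay 0\n",
    "    Ynorm = Y - Ymean * R\n",
    "    return Ynorm, Ymean\n",
    "\n",
    "# Hypothetical toy data: 3 movies (rows) x 2 users (cols)\n",
    "Y_toy = np.array([[5.0, 3.0],\n",
    "                  [4.0, 0.0],\n",
    "                  [0.0, 0.0]])\n",
    "R_toy = np.array([[1, 1],\n",
    "                  [1, 0],\n",
    "                  [0, 0]])\n",
    "\n",
    "# Prepend a new user's ratings as column 0, as done with np.c_ above\n",
    "new_ratings = np.array([0.0, 2.0, 0.0])\n",
    "Y_toy = np.c_[new_ratings, Y_toy]\n",
    "R_toy = np.c_[(new_ratings != 0).astype(int), R_toy]\n",
    "\n",
    "Ynorm_toy, Ymean_toy = normalize_ratings_sketch(Y_toy, R_toy)\n",
    "print(Ymean_toy.ravel())\n",
    "print(Ynorm_toy)\n",
    "```\n",
    "\n",
    "The third movie has no ratings at all, so its mean is 0 and its normalized row stays all zeros."
   ]
  },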
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Let's prepare to train the model. Initialize the parameters and select the Adam optimizer."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 12,
   "metadata": {
    "deletable": false,
    "tags": []
   },
   "outputs": [],
   "source": [
    "# Useful Values\n",
    "# (nm=4778, nu=444)   \n",
    "num_movies, num_users = Y.shape\n",
    "\n",
    "# n = 100 features\n",
    "num_features = 100\n",
    "\n",
    "# tf.random.set_seed() ensures that you get the same consistent results,  \n",
    "# every time you run the code with the same seed and parameters.\n",
    "tf.random.set_seed(1234)\n",
    "\n",
    "# Set Initial Parameters (W,b,X), using 'tf.Variable' to track/modify/update these variables.\n",
    "# 'tf.Variable()' is used to create variables. These variables are tensors whose values \n",
    "# can be modified during execution, unlike constants. Variables play a crucial role in machine learning, \n",
    "# allowing models to learn and adapt to data. They are essential for model training, \n",
    "# as their values are updated during the optimization process.\n",
    "\n",
    "# 'tf.random.normal()' generates random values that follow a normal (Gaussian) distribution. \n",
    "# It produces a tensor with the specified shape and data type, \n",
    "# filling the tensor with numbers generated from this distribution.\n",
    "\n",
    "# W (nu=444 users/rows, n=100 features/cols) 2D array initialized with 'float64' dtype random values, \n",
    "# that follow a normal (Gaussian) distribution.\n",
    "W = tf.Variable(tf.random.normal((num_users,  num_features),dtype=tf.float64),  name='W')\n",
    "\n",
    "# X (nm=4778 movies/rows, n=100 features/cols) 2D array initialized with 'float64' dtype random values, \n",
    "# that follow a normal (Gaussian) distribution.\n",
    "X = tf.Variable(tf.random.normal((num_movies, num_features),dtype=tf.float64),  name='X')\n",
    "\n",
    "# b (1 row, nu=444 users/elems) 2D array initialized with 'float64' dtype random values, \n",
    "# that follow a normal (Gaussian) distribution.\n",
    "b = tf.Variable(tf.random.normal((1,          num_users),   dtype=tf.float64),  name='b')\n",
    "\n",
    "# 'Learning rate' determines the size of the step by which the model parameters will change during each iteration. \n",
    "# A large learning rate can lead to unstable model training, while a too small value can slow down the learning process. \n",
    "# Typically, an INITIAL learning rate value is chosen, but 'Adam' automatically adapts it over time.\n",
    "# Instantiate the 'Adam' optimizer, with initial learning rate = 0.1\n",
    "optimizer = keras.optimizers.Adam(learning_rate=1e-1)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Let's now train the collaborative filtering model. This will learn the parameters $\\mathbf{X}$, $\\mathbf{W}$, and $\\mathbf{b}$. "
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "The operations involved in learning $w$, $b$, and $x$ simultaneously do not fall into the typical 'layers' offered in the TensorFlow neural network package.  Consequently, the flow used in Course 2: Model, Compile(), Fit(), Predict(), are not directly applicable. Instead, we can use a custom training loop.\n",
    "\n",
    "Recall from earlier labs the steps of gradient descent.\n",
    "- repeat until convergence:\n",
    "    - compute forward pass\n",
    "    - compute the derivatives of the loss relative to parameters\n",
    "    - update the parameters using the learning rate and the computed derivatives \n",
    "    \n",
    "TensorFlow has the marvelous capability of calculating the derivatives for you. This is shown below. Within the `tf.GradientTape()` section, operations on Tensorflow Variables are tracked. When `tape.gradient()` is later called, it will return the gradient of the loss relative to the tracked variables. The gradients can then be applied to the parameters using an optimizer. \n",
    "This is a very brief introduction to a useful feature of TensorFlow and other machine learning frameworks. Further information can be found by investigating \"custom training loops\" within the framework of interest.\n",
    "    \n"
   ]
  },
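  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "The three steps listed above can be sketched without TensorFlow. This is a minimal illustration, not the lab's method: it uses plain gradient descent instead of Adam, hand-derived gradients instead of `tape.gradient()`, and a hypothetical one-feature dataset (`x_toy`, `y_toy`) generated from `y = 2x + 1`.\n",
    "\n",
    "```python\n",
    "import numpy as np\n",
    "\n",
    "# Hypothetical toy data generated from y = 2x + 1\n",
    "x_toy = np.array([1.0, 2.0, 3.0])\n",
    "y_toy = np.array([3.0, 5.0, 7.0])\n",
    "\n",
    "w_toy, b_toy = 0.0, 0.0   # initial parameters\n",
    "alpha = 0.1               # learning rate\n",
    "\n",
    "for step in range(1000):\n",
    "    # 1) Forward pass: predictions and mean squared error cost\n",
    "    err = w_toy * x_toy + b_toy - y_toy\n",
    "    cost = np.mean(err ** 2)\n",
    "\n",
    "    # 2) Derivatives of the cost with respect to each parameter\n",
    "    #    (written by hand here; tape.gradient() automates this step)\n",
    "    dJ_dw = 2.0 * np.mean(err * x_toy)\n",
    "    dJ_db = 2.0 * np.mean(err)\n",
    "\n",
    "    # 3) Update the parameters using the learning rate and the derivatives\n",
    "    w_toy -= alpha * dJ_dw\n",
    "    b_toy -= alpha * dJ_db\n",
    "\n",
    "print(f'w = {w_toy:0.4f}, b = {b_toy:0.4f}, cost = {cost:0.2e}')\n",
    "```\n",
    "\n",
    "In the lab's loop, step 2 is handled by `tape.gradient()` and step 3 by `optimizer.apply_gradients()`, which is why no derivative formulas appear there."
   ]
  },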
  {
   "cell_type": "code",
   "execution_count": 13,
   "metadata": {
    "deletable": false
   },
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Training loss at iteration 0: 2321191.3\n",
      "Training loss at iteration 20: 136168.7\n",
      "Training loss at iteration 40: 51863.3\n",
      "Training loss at iteration 60: 24598.8\n",
      "Training loss at iteration 80: 13630.4\n",
      "Training loss at iteration 100: 8487.6\n",
      "Training loss at iteration 120: 5807.7\n",
      "Training loss at iteration 140: 4311.6\n",
      "Training loss at iteration 160: 3435.2\n",
      "Training loss at iteration 180: 2902.1\n"
     ]
    }
   ],
   "source": [
    "# Adam Optimizer, total loop iterations (GD updates)\n",
    "iterations = 200\n",
    "\n",
    "# REGULARIZATION parameter\n",
    "lambda_ = 1\n",
    "\n",
    "# Iterate through i = [0, 1, 2, ..., 199] or GD updates\n",
    "for iter in range(iterations):\n",
    "    \n",
    "    # Use TensorFlow’s GradientTape\n",
    "    # to record the operations used to compute the cost\n",
    "    # Record steps needed to compute COST J, enabling auto-differentiation\n",
    "    with tf.GradientTape() as tape:\n",
    "\n",
    "        # Compute the COST function J with REGULARIZATION (forward pass included in cost)\n",
    "        cost_value = cofi_cost_func_v(X, W, b, Ynorm, R, lambda_)\n",
    "\n",
    "    # Use the TF Gradient Tape to automatically retrieve a list grads = [dJ/dX, dJ/dW, dJ/db] with\n",
    "    # the gradients / partial derivatives of the trainable variables X,W,b with respect to the COST J\n",
    "    grads = tape.gradient( cost_value, [X,W,b] )\n",
    "\n",
    "    # grads = [dJ/dX, dJ/dW, dJ/db] parameters = [X, W, b] -> zip(grads,parameters) = \n",
    "    # = [(dJ/dX, X), (dJ/dW, W), (dJ/db, b)]. Combines each computed gradient/derivative in 'grads' \n",
    "    # with its 'parameter'in a tuple (_,_), and put ALL tuples in a list [(_,_), (_,_), (_,_)],  \n",
    "    # to get ready, to be UPDATED and MINIMIZED each tuple at a time, via GD, by the picked 'optimizer'. \n",
    "    # Run one (1) step / update of gradient descent (GD) by UPDATING / MINIMIZING\n",
    "    # the value of the TF variables / changeable parameters X,W,b to also MINIMIZE the COST function J.\n",
    "    optimizer.apply_gradients( zip(grads, [X,W,b]) )\n",
    "\n",
    "    # Log periodically. \n",
    "    # Module/residue of (iter / 20) == 0 ->  iter % 20 == 0 only for step = 20\n",
    "    # When iter= 0,20,40,60,80,...,200 then module of (iter % 20 == 0), so print info. \n",
    "    # Otherwise, module of (iter % 20 != 0) doesn't print info.  \n",
    "    if iter % 20 == 0:\n",
    "        \n",
    "        # Print COST J (scalar) with REGULARIZATION, as x.x float value, when (iter % 20 == 0).\n",
    "        print(f\"Training loss at iteration {iter}: {cost_value:0.1f}\")"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "id": "SSzUL7eQodYS"
   },
   "source": [
    "<a name=\"6\"></a>\n",
    "## 6 - Recommendations\n",
    "Below, we compute the ratings for all the movies and users and display the movies that are recommended. These are based on the movies and ratings entered as `my_ratings[]` above. To predict the rating of movie $i$ for user $j$, you compute $\\mathbf{w}^{(j)} \\cdot \\mathbf{x}^{(i)} + b^{(j)}$. This can be computed for all ratings using matrix multiplication."
   ]
  },
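  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "The equivalence between the per-pair formula and the matrix form can be checked on a small example. This is a sketch under stated assumptions: the arrays `X_toy`, `W_toy`, and `b_toy` are hypothetical random stand-ins for the trained parameters, with the same layout as in this lab (movies x features, users x features, 1 x users).\n",
    "\n",
    "```python\n",
    "import numpy as np\n",
    "\n",
    "rng = np.random.default_rng(0)\n",
    "n_m, n_u, n_f = 4, 3, 2   # hypothetical toy sizes: movies, users, features\n",
    "\n",
    "X_toy = rng.normal(size=(n_m, n_f))   # one feature row per movie\n",
    "W_toy = rng.normal(size=(n_u, n_f))   # one weight row per user\n",
    "b_toy = rng.normal(size=(1, n_u))     # one bias per user\n",
    "\n",
    "# All predictions at once: (4 x 2) @ (2 x 3) + (1 x 3) -> (4 x 3)\n",
    "# b_toy is broadcast over the 4 movie rows\n",
    "P = np.matmul(X_toy, W_toy.T) + b_toy\n",
    "\n",
    "# The same prediction for one (movie i, user j) pair via w(j) . x(i) + b(j)\n",
    "i, j = 2, 1\n",
    "p_ij = np.dot(W_toy[j], X_toy[i]) + b_toy[0, j]\n",
    "\n",
    "print(P.shape)\n",
    "print(np.isclose(P[i, j], p_ij))\n",
    "```\n",
    "\n",
    "Column `j` of `P` holds every movie's prediction for user `j`, which is why the next cell selects column 0 for the newly added user."
   ]
  },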
  {
   "cell_type": "code",
   "execution_count": 14,
   "metadata": {
    "deletable": false,
    "id": "ns266wKtodYT"
   },
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "New User Predictions= [2.76811272 2.74113299 1.32460694 ... 2.92504886 2.92511201 2.91981803] (4778,)\n",
      "\n",
      "List of higher predictions indices = tf.Tensor([1150  246  929 ... 3680 2644 1209], shape=(4778,), dtype=int32) \n",
      "List of higher predicted ratings [ 4.89889218  4.89713717  4.88705315 ... -0.12205908 -0.12212146\n",
      " -0.12213415] \n",
      "\n",
      "0 index= tf.Tensor(1150, shape=(), dtype=int32) movie NEW RATED: Incredibles, The (2004)\n",
      "1 index= tf.Tensor(246, shape=(), dtype=int32) movie NEW RATED: Shrek (2001)\n",
      "2 index= tf.Tensor(929, shape=(), dtype=int32) movie NEW RATED: Lord of the Rings: The Return of the King, The (2003)\n",
      "3 index= tf.Tensor(622, shape=(), dtype=int32) movie NEW RATED: Harry Potter and the Chamber of Secrets (2002)\n",
      "4 index= tf.Tensor(793, shape=(), dtype=int32) movie NEW RATED: Pirates of the Caribbean: The Curse of the Black Pearl (2003)\n",
      "5 index= tf.Tensor(366, shape=(), dtype=int32) movie NEW RATED: Harry Potter and the Sorcerer's Stone (a.k.a. Harry Potter and the Philosopher's Stone) (2001)\n",
      "6 index= tf.Tensor(2700, shape=(), dtype=int32) movie NEW RATED: Toy Story 3 (2010)\n",
      "Predicted rating 4.49 for movie NOT NEW RATED: My Sassy Girl (Yeopgijeogin geunyeo) (2001)\n",
      "Predicted rating 4.48 for movie NOT NEW RATED: Martin Lawrence Live: Runteldat (2002)\n",
      "Predicted rating 4.48 for movie NOT NEW RATED: Memento (2000)\n",
      "Predicted rating 4.47 for movie NOT NEW RATED: Delirium (2014)\n",
      "Predicted rating 4.47 for movie NOT NEW RATED: Laggies (2014)\n",
      "Predicted rating 4.47 for movie NOT NEW RATED: One I Love, The (2014)\n",
      "Predicted rating 4.46 for movie NOT NEW RATED: Particle Fever (2013)\n",
      "Predicted rating 4.45 for movie NOT NEW RATED: Eichmann (2007)\n",
      "Predicted rating 4.45 for movie NOT NEW RATED: Battle Royale 2: Requiem (Batoru rowaiaru II: Chinkonka) (2003)\n",
      "Predicted rating 4.45 for movie NOT NEW RATED: Into the Abyss (2011)\n",
      "\n",
      "\n",
      "Original vs Predicted ratings (>0):\n",
      "\n",
      "Original 5.0, Predicted 4.90 for Shrek (2001)\n",
      "Original 5.0, Predicted 4.84 for Harry Potter and the Sorcerer's Stone (a.k.a. Harry Potter and the Philosopher's Stone) (2001)\n",
      "Original 2.0, Predicted 2.13 for Amelie (Fabuleux destin d'Amélie Poulain, Le) (2001)\n",
      "Original 5.0, Predicted 4.88 for Harry Potter and the Chamber of Secrets (2002)\n",
      "Original 5.0, Predicted 4.87 for Pirates of the Caribbean: The Curse of the Black Pearl (2003)\n",
      "Original 5.0, Predicted 4.89 for Lord of the Rings: The Return of the King, The (2003)\n",
      "Original 3.0, Predicted 3.00 for Eternal Sunshine of the Spotless Mind (2004)\n",
      "Original 5.0, Predicted 4.90 for Incredibles, The (2004)\n",
      "Original 2.0, Predicted 2.11 for Persuasion (2007)\n",
      "Original 5.0, Predicted 4.80 for Toy Story 3 (2010)\n",
      "Original 3.0, Predicted 3.00 for Inception (2010)\n",
      "Original 1.0, Predicted 1.41 for Louis Theroux: Law & Disorder (2008)\n",
      "Original 1.0, Predicted 1.26 for Nothing to Declare (Rien à déclarer) (2010)\n"
     ]
    }
   ],
   "source": [
    "# tensor 'variable.numpy()' -> Converts a tensor into a numpy matrix,\n",
    "# to be operated using 'np.matmult()' or standard 2D matrix multiplication\n",
    "# X(4778 movies, 100 features) -> [4778 x 100]\n",
    "# W(444 users, 100 features)   -> [444 x 100]\n",
    "# W^T(100 features, 444 users) -> [100 x 400]\n",
    "# b(1 row, 444 users/elems)    -> [1 x 444]\n",
    "# Make a prediction using trained weights W and biases b.\n",
    "# pred[4778 x 444] = X[4778 x 100] * W[100 x 444] + b[1 x 444] = X*W[4778 x 444] + b[1 x 444]\n",
    "p = np.matmul(X.numpy(), np.transpose(W.numpy())) + b.numpy()\n",
    "\n",
    "# Previously, we subtracted (Y - Ymean) to avoid that a NEW USER j who has not rated a movie,\n",
    "# have y = 0 stars, because we initialize all parameters W, b as '0'.\n",
    "# Now, we'll give back or ADD this (Y^prediction + Ymean).\n",
    "#           pred Y^                             Ymean            =        pred Y^mean\n",
    "#  [[0, 0, 0, ..., 0, 0, 0],            +   <-[[3.4 ],             [[0, 0, 0, ..., 0, 0, 0],\n",
    "#   [0, 0, 0, ..., 0, 0, 0],                <- [3.25],              [0, 0, 0, ..., 0, 0, 0],\n",
    "#             ...                               ...                           ...\n",
    "#   [0, 0, 0, ..., 0, 0, 0]] (4778,444)     <- [3.5 ]] (4778, 1)    [0, 0, 0, ..., 0, 0, 0]] (4778,444)\n",
    "pm = p + Ymean\n",
    "\n",
    "# Select ALL MOVIES / rows predictions, for just the USER j=0 col\n",
    "# my_predictions = [0, 0, 0, ... , 0] (4778,)\n",
    "my_predictions = pm[:,0]\n",
    "\n",
    "# NEW USER predictions of ratings in a list (one per movie) \n",
    "# my_predictions = [2.76811272 2.74113299 1.32460694 ... 2.92504886 2.92511201 2.91981803] (4778,) 1D\n",
    "print(\"New User Predictions=\",my_predictions, my_predictions.shape)\n",
    "\n",
    "# 'tf.argsort(tensor, direction='DESCENDING')' returns the indices of rating predictions\n",
    "# sorted in a 'DESCENDING' order -> Tensor[index of MAX rating pred -> index of MIN rating pred]\n",
    "#  Sort predictions in DESCENDING order. \n",
    "ix = tf.argsort(my_predictions, direction='DESCENDING')\n",
    "\n",
    "# Print list of indices in DESCENDING order for NEW USER j=0 predictions -> 'my_predictions'\n",
    "# Print NEW USER j=0 predictions in DESCENDING order -> 'my_predictions[ix]' \n",
    "print(\"\\nList of higher predictions indices =\",ix,\"\\nList of higher predicted ratings\", my_predictions[ix],\"\\n\")\n",
    "\n",
    "# Select just first 17 HIGHER rating indices out of ALL predicted movies, at 'ix', \n",
    "# so we'll iterate only through these ones -> i = 0,1,2,...,16\n",
    "for i in range(17):  \n",
    "    \n",
    "    # If New User Ratings of predictions, were sorted in DESCENDING order,\n",
    "    # then the 1st 17 indices at 'ix' correpond to the HIGHER rating values,\n",
    "    # so we'll pick each of the HIGHER 17 indices at j.\n",
    "        \n",
    "    # j = [1150, 246, 929, 622, 793, 366, 2700, 1201, 549, 211, 3924, 3754, 3742, 3625, 2456, 1225, 3062] \n",
    "    j = ix[i]        \n",
    "    \n",
    "    # my_rated = [246, 366, 382, 622, 793, 929, 988, 1150, 2609, 2700, 2716, 2925, 2937] \n",
    "    # New User list of rated movies indices (We Changed the rating values)\n",
    "    # If one of first 17 HIGHER movie predicted rating index 'j', is in the list 'my_rated'\n",
    "    # of NEW RATINGS done by new user, display that index.\n",
    "    if j in my_rated:\n",
    "        print(i,\"index=\",j,\"movie NEW RATED:\",movieList[j])\n",
    "\n",
    "    # If one of the first 17 HIGHER movie rating indices 'j' is NOT in the list 'my_rated'\n",
    "    # this movie is NOT NEW RATED by new user, so it keeps the original rating, so display that.\n",
    "    if j not in my_rated: \n",
    "        print(f'Predicted rating {my_predictions[j]:0.2f} for movie NOT NEW RATED: {movieList[j]}')\n",
    "\n",
    "print('\\n\\nOriginal vs Predicted ratings (>0):\\n')\n",
    "\n",
    "# my_ratings= [0. 0. 0. ... 0. 0. 0.] (4778,) one rating per movie\n",
    "# Iterate through each movie/row ID i=[0,1,2,...,4777]\n",
    "for i in range(len(my_ratings)):\n",
    "    \n",
    "    # If there's a movie with a rating > 0, \n",
    "    # Print: original rating (>0) from 'my_ratings', predicted rating (>0) from 'my_predictions'\n",
    "    # and movie name from 'movieList'\n",
    "    # my_ratings= [0. 0. 0. ... 0. 0. 0.] (4778,) one rating per movie\n",
    "    if my_ratings[i] > 0:\n",
    "        print(f'Original {my_ratings[i]}, Predicted {my_predictions[i]:0.2f} for {movieList[i]}')"
   ]
  },
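  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "The ranking step above can be reproduced on a handful of values. This sketch swaps in `np.argsort` on negated values in place of `tf.argsort(..., direction='DESCENDING')`; the arrays `preds` and `rated` are hypothetical and only illustrate how already-rated movies are separated from new recommendations.\n",
    "\n",
    "```python\n",
    "import numpy as np\n",
    "\n",
    "# Hypothetical predicted ratings for 6 movies (index = movie ID)\n",
    "preds = np.array([2.1, 4.8, 3.3, 4.9, 1.0, 4.2])\n",
    "\n",
    "# Hypothetical set of movie IDs the user has already rated\n",
    "rated = {1, 3}\n",
    "\n",
    "# np.argsort sorts ascending, so negate the values to rank from highest to lowest\n",
    "order = np.argsort(-preds)\n",
    "\n",
    "# Keep only movies the user has not rated yet, then take the top 2\n",
    "top_unrated = [int(k) for k in order if int(k) not in rated][:2]\n",
    "\n",
    "print(order)        # movie IDs from highest to lowest prediction\n",
    "print(top_unrated)  # recommendations among unrated movies\n",
    "```\n",
    "\n",
    "Converting each index with `int(k)` keeps the membership test on plain Python integers, which also avoids printing tensor or array wrappers around the index."
   ]
  },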
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "In practice, additional information can be utilized to enhance our predictions. Above, the predicted ratings for the first few hundred movies lie in a small range. We can augment the above by selecting from those top movies, movies that have high average ratings and movies with more than 20 ratings. This section uses a [Pandas](https://pandas.pydata.org/) data frame which has many handy sorting features."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 15,
   "metadata": {
    "deletable": false
   },
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>pred</th>\n",
       "      <th>mean rating</th>\n",
       "      <th>number of ratings</th>\n",
       "      <th>title</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>1743</th>\n",
       "      <td>4.030965</td>\n",
       "      <td>4.252336</td>\n",
       "      <td>107</td>\n",
       "      <td>Departed, The (2006)</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2112</th>\n",
       "      <td>3.985287</td>\n",
       "      <td>4.238255</td>\n",
       "      <td>149</td>\n",
       "      <td>Dark Knight, The (2008)</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>211</th>\n",
       "      <td>4.477792</td>\n",
       "      <td>4.122642</td>\n",
       "      <td>159</td>\n",
       "      <td>Memento (2000)</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>929</th>\n",
       "      <td>4.887053</td>\n",
       "      <td>4.118919</td>\n",
       "      <td>185</td>\n",
       "      <td>Lord of the Rings: The Return of the King, The...</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2700</th>\n",
       "      <td>4.796530</td>\n",
       "      <td>4.109091</td>\n",
       "      <td>55</td>\n",
       "      <td>Toy Story 3 (2010)</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>653</th>\n",
       "      <td>4.357304</td>\n",
       "      <td>4.021277</td>\n",
       "      <td>188</td>\n",
       "      <td>Lord of the Rings: The Two Towers, The (2002)</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1122</th>\n",
       "      <td>4.004469</td>\n",
       "      <td>4.006494</td>\n",
       "      <td>77</td>\n",
       "      <td>Shaun of the Dead (2004)</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1841</th>\n",
       "      <td>3.980647</td>\n",
       "      <td>4.000000</td>\n",
       "      <td>61</td>\n",
       "      <td>Hot Fuzz (2007)</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>3083</th>\n",
       "      <td>4.084633</td>\n",
       "      <td>3.993421</td>\n",
       "      <td>76</td>\n",
       "      <td>Dark Knight Rises, The (2012)</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2804</th>\n",
       "      <td>4.434171</td>\n",
       "      <td>3.989362</td>\n",
       "      <td>47</td>\n",
       "      <td>Harry Potter and the Deathly Hallows: Part 1 (...</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>773</th>\n",
       "      <td>4.289679</td>\n",
       "      <td>3.960993</td>\n",
       "      <td>141</td>\n",
       "      <td>Finding Nemo (2003)</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1771</th>\n",
       "      <td>4.344993</td>\n",
       "      <td>3.944444</td>\n",
       "      <td>81</td>\n",
       "      <td>Casino Royale (2006)</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2649</th>\n",
       "      <td>4.133482</td>\n",
       "      <td>3.943396</td>\n",
       "      <td>53</td>\n",
       "      <td>How to Train Your Dragon (2010)</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2455</th>\n",
       "      <td>4.175746</td>\n",
       "      <td>3.887931</td>\n",
       "      <td>58</td>\n",
       "      <td>Harry Potter and the Half-Blood Prince (2009)</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>361</th>\n",
       "      <td>4.135291</td>\n",
       "      <td>3.871212</td>\n",
       "      <td>132</td>\n",
       "      <td>Monsters, Inc. (2001)</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>3014</th>\n",
       "      <td>3.967901</td>\n",
       "      <td>3.869565</td>\n",
       "      <td>69</td>\n",
       "      <td>Avengers, The (2012)</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>246</th>\n",
       "      <td>4.897137</td>\n",
       "      <td>3.867647</td>\n",
       "      <td>170</td>\n",
       "      <td>Shrek (2001)</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>151</th>\n",
       "      <td>3.971888</td>\n",
       "      <td>3.836364</td>\n",
       "      <td>110</td>\n",
       "      <td>Crouching Tiger, Hidden Dragon (Wo hu cang lon...</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1150</th>\n",
       "      <td>4.898892</td>\n",
       "      <td>3.836000</td>\n",
       "      <td>125</td>\n",
       "      <td>Incredibles, The (2004)</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>793</th>\n",
       "      <td>4.874935</td>\n",
       "      <td>3.778523</td>\n",
       "      <td>149</td>\n",
       "      <td>Pirates of the Caribbean: The Curse of the Bla...</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>366</th>\n",
       "      <td>4.843375</td>\n",
       "      <td>3.761682</td>\n",
       "      <td>107</td>\n",
       "      <td>Harry Potter and the Sorcerer's Stone (a.k.a. ...</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>754</th>\n",
       "      <td>4.021774</td>\n",
       "      <td>3.723684</td>\n",
       "      <td>76</td>\n",
       "      <td>X2: X-Men United (2003)</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>79</th>\n",
       "      <td>4.242984</td>\n",
       "      <td>3.699248</td>\n",
       "      <td>133</td>\n",
       "      <td>X-Men (2000)</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>622</th>\n",
       "      <td>4.878342</td>\n",
       "      <td>3.598039</td>\n",
       "      <td>102</td>\n",
       "      <td>Harry Potter and the Chamber of Secrets (2002)</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "          pred  mean rating  number of ratings  \\\n",
       "1743  4.030965     4.252336                107   \n",
       "2112  3.985287     4.238255                149   \n",
       "211   4.477792     4.122642                159   \n",
       "929   4.887053     4.118919                185   \n",
       "2700  4.796530     4.109091                 55   \n",
       "653   4.357304     4.021277                188   \n",
       "1122  4.004469     4.006494                 77   \n",
       "1841  3.980647     4.000000                 61   \n",
       "3083  4.084633     3.993421                 76   \n",
       "2804  4.434171     3.989362                 47   \n",
       "773   4.289679     3.960993                141   \n",
       "1771  4.344993     3.944444                 81   \n",
       "2649  4.133482     3.943396                 53   \n",
       "2455  4.175746     3.887931                 58   \n",
       "361   4.135291     3.871212                132   \n",
       "3014  3.967901     3.869565                 69   \n",
       "246   4.897137     3.867647                170   \n",
       "151   3.971888     3.836364                110   \n",
       "1150  4.898892     3.836000                125   \n",
       "793   4.874935     3.778523                149   \n",
       "366   4.843375     3.761682                107   \n",
       "754   4.021774     3.723684                 76   \n",
       "79    4.242984     3.699248                133   \n",
       "622   4.878342     3.598039                102   \n",
       "\n",
       "                                                  title  \n",
       "1743                               Departed, The (2006)  \n",
       "2112                            Dark Knight, The (2008)  \n",
       "211                                      Memento (2000)  \n",
       "929   Lord of the Rings: The Return of the King, The...  \n",
       "2700                                 Toy Story 3 (2010)  \n",
       "653       Lord of the Rings: The Two Towers, The (2002)  \n",
       "1122                           Shaun of the Dead (2004)  \n",
       "1841                                    Hot Fuzz (2007)  \n",
       "3083                      Dark Knight Rises, The (2012)  \n",
       "2804  Harry Potter and the Deathly Hallows: Part 1 (...  \n",
       "773                                 Finding Nemo (2003)  \n",
       "1771                               Casino Royale (2006)  \n",
       "2649                    How to Train Your Dragon (2010)  \n",
       "2455      Harry Potter and the Half-Blood Prince (2009)  \n",
       "361                               Monsters, Inc. (2001)  \n",
       "3014                               Avengers, The (2012)  \n",
       "246                                        Shrek (2001)  \n",
       "151   Crouching Tiger, Hidden Dragon (Wo hu cang lon...  \n",
       "1150                            Incredibles, The (2004)  \n",
       "793   Pirates of the Caribbean: The Curse of the Bla...  \n",
       "366   Harry Potter and the Sorcerer's Stone (a.k.a. ...  \n",
       "754                             X2: X-Men United (2003)  \n",
       "79                                         X-Men (2000)  \n",
       "622      Harry Potter and the Chamber of Secrets (2002)  "
      ]
     },
     "execution_count": 15,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "# Get ALL the ID movies with \"number of ratings\" > 20, returning a pandas list of 'booleans'\n",
    "# filters=[False, False, False, False, True, ..., False, False, False, False, False]\n",
    "filter=(movieList_df[\"number of ratings\"] > 20)\n",
    "\n",
    "# Assign to column \"pred\", ALL the (4778,) elements in 'my_predictions' list\n",
    "# extracted from ALL predictions pm[ : = all rows, 0 = col 0] -> just NEW USER array\n",
    "movieList_df[\"pred\"] = my_predictions\n",
    "\n",
    "# Rename the columns of df 'movieList_df', using 'df.reindex(columns list)' function,\n",
    "# with list of columns = [\"pred\", \"mean rating\", \"number of ratings\", \"title\"]\n",
    "# Then overwrite df.\n",
    "movieList_df = movieList_df.reindex(columns=[\"pred\", \"mean rating\", \"number of ratings\", \"title\"])\n",
    "\n",
    "# From 'movieList_df' Select the first 300 movies indices at 'ix' DESCENDING list\n",
    "# Then select just rows/IDs which \"number of ratings\" > 20.\n",
    "# Finally, sort rows/IDs by column \"mean rating\" in a DESCENDING way [MAX > MIN] values.\n",
    "movieList_df.loc[ix[:300]].loc[filter].sort_values(\"mean rating\", ascending=False)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "<a name=\"7\"></a>\n",
    "## 7 - Congratulations! <img align=\"left\" src=\"./images/film_award.png\"     style=\" width:40px;  \" >\n",
    "You have implemented a useful recommender system!"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "<details>\n",
    "  <summary><font size=\"2\" color=\"darkgreen\"><b>Please click here if you want to experiment with any of the non-graded code.</b></font></summary>\n",
    "    <p><i><b>Important Note: Please only do this when you've already passed the assignment to avoid problems with the autograder.</b></i>\n",
    "    <ol>\n",
    "        <li> On the notebook’s menu, click “View” > “Cell Toolbar” > “Edit Metadata”</li>\n",
    "        <li> Hit the “Edit Metadata” button next to the code cell which you want to lock/unlock</li>\n",
    "        <li> Set the attribute value for “editable” to:\n",
    "            <ul>\n",
    "                <li> “true” if you want to unlock it </li>\n",
    "                <li> “false” if you want to lock it </li>\n",
    "            </ul>\n",
    "        </li>\n",
    "        <li> On the notebook’s menu, click “View” > “Cell Toolbar” > “None” </li>\n",
    "    </ol>\n",
    "    <p> Here's a short demo of how to do the steps above: \n",
    "        <br>\n",
    "        <img src=\"https://lh3.google.com/u/0/d/14Xy_Mb17CZVgzVAgq7NCjMVBvSae3xO1\" align=\"center\" alt=\"unlock_cells.gif\">\n",
    "</details>"
   ]
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "Python 3",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.7.6"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 4
 }