{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "### Example of Gradient Descent optimization for simple linear regression (example adapted from \"The Hundred-Page Machine Learning Book\", by Andriy Burkov)." ] }, { "cell_type": "code", "execution_count": 1, "metadata": { "collapsed": true }, "outputs": [], "source": [ "import numpy as np\n", "import pandas as pd\n", "import matplotlib.pyplot as plt\n", "%matplotlib inline\n", "\n", "import matplotlib\n", "matplotlib.rcParams['mathtext.fontset'] = 'stix'\n", "matplotlib.rcParams['font.family'] = 'STIXGeneral'\n", "matplotlib.rcParams.update({'font.size': 18})" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Recall the linear regression model: $f(x) = wx + b$. We find the values of the parameters $w$ and $b$ by minimizing the mean squared error over the $N$ training examples $(x_i, y_i)$:\n", "\n", "\\begin{equation*}\n", "l = \\frac{1}{N} \\sum_{i=1}^N (y_i - (wx_i + b))^2\n", "\\end{equation*}\n", "\n", "For Gradient Descent, we first need the partial derivatives of this loss function with respect to each parameter:\n", "\n", "\\begin{equation*}\n", "\\frac{\\partial l}{\\partial w} = \\frac{1}{N} \\sum_{i=1}^N -2x_i(y_i - (wx_i + b))\n", "\\end{equation*}\n", "\n", "\\begin{equation*}\n", "\\frac{\\partial l}{\\partial b} = \\frac{1}{N} \\sum_{i=1}^N -2(y_i - (wx_i + b))\n", "\\end{equation*}\n", "\n", "Gradient Descent proceeds in epochs. In each epoch, the entire training set is used to update each parameter. We initialize the parameters $w$ and $b$ to 0. In each iteration (epoch), we compute the partial derivatives using the current values of $w$ and $b$. Then, we simultaneously update $w$ and $b$ using these partial derivatives:\n", "\n", "\\begin{equation*}\n", "w = w - \\alpha \\frac{\\partial l}{\\partial w}\n", "\\end{equation*}\n", "\n", "\\begin{equation*}\n", "b = b - \\alpha \\frac{\\partial l}{\\partial b}\n", "\\end{equation*}\n", "\n", "Here $\\alpha$ is the learning rate, which controls the size of each update. 
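
The update rule above can be sketched in a few lines of NumPy. This is only a minimal illustration under stated assumptions, not the implementation used later in this notebook: the function name `gradient_descent`, the default values of `alpha` and `epochs`, and the toy data are all hypothetical choices made for this sketch.

```python
import numpy as np

def gradient_descent(x, y, alpha=0.01, epochs=1000):
    # Hypothetical helper: batch gradient descent for f(x) = w*x + b.
    # Parameters start at 0, as described in the text.
    w, b = 0.0, 0.0
    n = len(x)
    for _ in range(epochs):
        # Residuals y_i - (w*x_i + b) under the current parameters
        residual = y - (w * x + b)
        # Partial derivatives of the mean squared error
        dl_dw = (-2.0 / n) * np.sum(x * residual)
        dl_db = (-2.0 / n) * np.sum(residual)
        # Simultaneous update: both gradients were computed before either change
        w, b = w - alpha * dl_dw, b - alpha * dl_db
    return w, b

# Toy data (an assumption for illustration) lying exactly on y = 2x + 1
x = np.array([0.0, 1.0, 2.0, 3.0, 4.0])
y = 2.0 * x + 1.0
w, b = gradient_descent(x, y, alpha=0.05, epochs=5000)
print(w, b)
```

On noise-free data like this, the recovered `w` and `b` should approach the slope and intercept that generated it, provided `alpha` is small enough for the updates to converge. Too large a value makes the iterates diverge.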
" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "#### Example: Let's look at a specific example of simple linear regression using a real data set about ad spending. Here, in $y = wx + b$, $y$ = Sales (unit product sales) and $x$ = Spending (in millions of dollars). We want to find the optimal values of $w$ and $b$ using Gradient Descent." ] }, { "cell_type": "code", "execution_count": 2, "metadata": { "collapsed": true }, "outputs": [ { "data": { "text/html": [ "
\n", " | Spending | \n", "Sales | \n", "
---|---|---|
0 | \n", "37.8 | \n", "22.1 | \n", "
1 | \n", "39.3 | \n", "10.4 | \n", "
2 | \n", "45.9 | \n", "9.3 | \n", "
3 | \n", "41.3 | \n", "18.5 | \n", "
4 | \n", "10.8 | \n", "12.9 | \n", "
5 | \n", "48.9 | \n", "7.2 | \n", "
6 | \n", "32.8 | \n", "11.8 | \n", "
7 | \n", "19.6 | \n", "13.2 | \n", "
8 | \n", "2.1 | \n", "4.8 | \n", "
9 | \n", "2.6 | \n", "10.6 | \n", "