Created
December 15, 2020 18:49
JamesSaxon/ec3ef644769dc67485a9a382590adb5f
{ | |
"cells": [ | |
{ | |
"cell_type": "code", | |
"execution_count": 1, | |
"metadata": {}, | |
"outputs": [], | |
"source": [ | |
"import numpy as np\n", | |
"from scipy.stats import linregress" | |
] | |
}, | |
{ | |
"cell_type": "markdown", | |
"metadata": {}, | |
"source": [ | |
"#### Let's define $y = b x^2$ (with $b = 5$) on $(-10, 10)$, with a very large sample and some Gaussian noise." | |
] | |
}, | |
{ | |
"cell_type": "code", | |
"execution_count": 2, | |
"metadata": {}, | |
"outputs": [], | |
"source": [ | |
"N = 100000000\n", | |
"\n", | |
"xmin, xmax = -10, 10\n", | |
"noise = 1\n", | |
"\n", | |
"x = np.random.uniform(xmin, xmax, N)\n", | |
"e = np.random.normal(scale = noise, size = N)\n", | |
"b = 5\n", | |
"\n", | |
"y = b * x**2 + e" | |
] | |
}, | |
{ | |
"cell_type": "markdown", | |
"metadata": {}, | |
"source": [ | |
"#### So we have an even function, symmetric around $x = 0$, and the slope of a linear fit over the full domain should be 0." | |
] | |
}, | |
{ | |
"cell_type": "code", | |
"execution_count": 3, | |
"metadata": {}, | |
"outputs": [ | |
{ | |
"data": { | |
"text/plain": [ | |
"0.0022095864543294793" | |
] | |
}, | |
"execution_count": 3, | |
"metadata": {}, | |
"output_type": "execute_result" | |
} | |
], | |
"source": [ | |
"linregress(x, y).slope" | |
] | |
}, | |
{ | |
"cell_type": "markdown", | |
"metadata": {}, | |
"source": [ | |
"#### But the slope is only _actually_ zero at $x = 0$! If we restrict the _domain_ to a narrow window around a point $x_0$, the fit should \"take the derivative\" there. The derivative is just $2bx_0$, and this simple procedure gives the correct answer." | |
] | |
}, | |
{ | |
"cell_type": "code", | |
"execution_count": 4, | |
"metadata": {}, | |
"outputs": [ | |
{ | |
"data": { | |
"text/plain": [ | |
"(50, 49.95937635131443)" | |
] | |
}, | |
"execution_count": 4, | |
"metadata": {}, | |
"output_type": "execute_result" | |
} | |
], | |
"source": [ | |
"x0, eps = 5, 0.05\n", | |
"\n", | |
"xcut = (x0 - eps < x) & (x < x0 + eps)\n", | |
"2 * b * x0, linregress(x[xcut], y[xcut]).slope" | |
] | |
}, | |
{ | |
"cell_type": "markdown", | |
"metadata": {}, | |
"source": [ | |
"#### Note that cutting on the range ($y$) instead of the domain ($x$) does not yield correct estimates, even if we manage to focus on the right area." | |
] | |
}, | |
{ | |
"cell_type": "markdown", | |
"metadata": {}, | |
"source": [ | |
"For example, the estimate below is zero because of symmetry: a window around $y_0 = b x_0^2$ selects points near both $x = x_0$ and $x = -x_0$." | |
] | |
}, | |
{ | |
"cell_type": "code", | |
"execution_count": 5, | |
"metadata": {}, | |
"outputs": [ | |
{ | |
"data": { | |
"text/plain": [ | |
"-2.5271768586034784e-05" | |
] | |
}, | |
"execution_count": 5, | |
"metadata": {}, | |
"output_type": "execute_result" | |
} | |
], | |
"source": [ | |
"y0 = b * x0**2\n", | |
"\n", | |
"ycut = (y0 - eps < y) & (y < y0 + eps)\n", | |
"linregress(x[ycut], y[ycut]).slope" | |
] | |
}, | |
{ | |
"cell_type": "markdown", | |
"metadata": {}, | |
"source": [ | |
"But even if we restrict to $x > 0$, the estimate is strongly attenuated relative to the true slope $2bx_0 = 50$." | |
] | |
}, | |
{ | |
"cell_type": "code", | |
"execution_count": 6, | |
"metadata": {}, | |
"outputs": [ | |
{ | |
"data": { | |
"text/plain": [ | |
"0.062255465871224006" | |
] | |
}, | |
"execution_count": 6, | |
"metadata": {}, | |
"output_type": "execute_result" | |
} | |
], | |
"source": [ | |
"y0 = b * x0**2\n", | |
"\n", | |
"xpos_ycut = (x > 0) & (y0 - eps < y) & (y < y0 + eps)\n", | |
"linregress(x[xpos_ycut], y[xpos_ycut]).slope" | |
] | |
}, | |
{ | |
"cell_type": "markdown", | |
"metadata": {}, | |
"source": [ | |
"#### That bias stems from truncating the noisy outcomes in a biased way: the cut on $y$ keeps only observations whose noise lands them inside the window, towards the center of the range. If you remove the noise, that doesn't happen.\n", | |
"\n", | |
"However, you _would_ still need to focus on the right range, which is non-trivial, since you don't know the outcome a priori!" | |
] | |
}, | |
{ | |
"cell_type": "code", | |
"execution_count": 7, | |
"metadata": {}, | |
"outputs": [ | |
{ | |
"data": { | |
"text/plain": [ | |
"49.9999930655446" | |
] | |
}, | |
"execution_count": 7, | |
"metadata": {}, | |
"output_type": "execute_result" | |
} | |
], | |
"source": [ | |
"y2 = b * x**2\n", | |
"\n", | |
"xpos_ycut = (x > 0) & (y0 - eps < y2) & (y2 < y0 + eps)\n", | |
"linregress(x[xpos_ycut], y2[xpos_ycut]).slope" | |
] | |
} | |
], | |
"metadata": { | |
"kernelspec": { | |
"display_name": "Python 3", | |
"language": "python", | |
"name": "python3" | |
}, | |
"language_info": { | |
"codemirror_mode": { | |
"name": "ipython", | |
"version": 3 | |
}, | |
"file_extension": ".py", | |
"mimetype": "text/x-python", | |
"name": "python", | |
"nbconvert_exporter": "python", | |
"pygments_lexer": "ipython3", | |
"version": "3.8.3" | |
} | |
}, | |
"nbformat": 4, | |
"nbformat_minor": 4 | |
} |
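The contrast between cutting on the domain and cutting on the range can be reproduced in a few seconds with a much smaller sample. The following is a minimal sketch, not the notebook's own code: it assumes a fixed seed, uses `N = 2_000_000` instead of the notebook's $10^8$, and replaces `linregress` with a hypothetical helper `ols_slope` built on `np.cov`, so that only NumPy is needed. The names `slope_domain` and `slope_range` are likewise illustrative.

```python
import numpy as np

# Assumption: a seeded generator, so the run is reproducible.
rng = np.random.default_rng(0)

N = 2_000_000            # far fewer points than the notebook, to run quickly
b, noise = 5, 1          # same curvature and noise scale as the notebook
x0, eps = 5, 0.05        # same point and window half-width as the notebook

x = rng.uniform(-10, 10, N)
y = b * x**2 + rng.normal(scale=noise, size=N)


def ols_slope(xs, ys):
    """Least-squares slope of ys on xs: cov(x, y) / var(x).

    Hypothetical helper standing in for scipy.stats.linregress(...).slope.
    """
    c = np.cov(xs, ys)   # 2x2 sample covariance matrix
    return c[0, 1] / c[0, 0]


# Cut on the domain: keep points with x in a narrow window around x0.
xcut = np.abs(x - x0) < eps
slope_domain = ols_slope(x[xcut], y[xcut])

# Cut on the range: keep points with x > 0 and y in a narrow window around y0.
y0 = b * x0**2
ycut = (x > 0) & (np.abs(y - y0) < eps)
slope_range = ols_slope(x[ycut], y[ycut])

print("true derivative 2*b*x0:", 2 * b * x0)
print("slope, domain cut:     ", slope_domain)
print("slope, range cut:      ", slope_range)
```

With these settings the domain cut keeps roughly $10^4$ points and the range cut only a few hundred, so both estimates are noisier than in the notebook. The qualitative picture should nonetheless be the same: the domain-cut slope lands near 50, while the range-cut slope stays close to 0.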