@JamesSaxon
Created December 15, 2020 18:49
{
"cells": [
{
"cell_type": "code",
"execution_count": 1,
"metadata": {},
"outputs": [],
"source": [
"import numpy as np\n",
"from scipy.stats import linregress"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"#### Let's define $y = b x^2$ on $(-10, 10)$, with a very large sample ($N = 10^8$) and unit-variance Gaussian noise."
]
},
{
"cell_type": "code",
"execution_count": 2,
"metadata": {},
"outputs": [],
"source": [
"N = 100000000\n",
"\n",
"xmin, xmax = -10, 10\n",
"noise = 1\n",
"\n",
"x = np.random.uniform(xmin, xmax, N)\n",
"e = np.random.normal(scale = noise, size = N)\n",
"b = 5\n",
"\n",
"y = b * x**2 + e"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"#### This is an even function, symmetric about $x = 0$, so the regression slope over the full domain should be 0."
]
},
{
"cell_type": "code",
"execution_count": 3,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"0.0022095864543294793"
]
},
"execution_count": 3,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"linregress(x, y).slope"
]
},
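{
"cell_type": "markdown",
"metadata": {},
"source": [
"A short derivation of why the slope vanishes, as a sketch assuming that $x$ is distributed symmetrically about 0 and that the noise $e$ is independent of $x$ (both hold for the simulated data above). In a large sample the least-squares slope approaches $\\mathrm{Cov}(x, y) / \\mathrm{Var}(x)$, and\n",
"\n",
"$$\\mathrm{Cov}(x, b x^2 + e) = b\\,(E[x^3] - E[x]\\,E[x^2]) + \\mathrm{Cov}(x, e) = 0,$$\n",
"\n",
"since $E[x] = E[x^3] = 0$ by symmetry. The small nonzero value returned above is sampling noise."
]
},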
{
"cell_type": "markdown",
"metadata": {},
"source": [
"#### But the slope is only _actually_ zero at $x = 0$! If we restrict the _domain_ to a narrow window around a point $x_0$, we should be able to \"take the derivative.\" This is just $2 b x_0$, and the simple procedure gives the correct answer, up to sampling noise."
]
},
{
"cell_type": "code",
"execution_count": 4,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"(50, 49.95937635131443)"
]
},
"execution_count": 4,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"x0, eps = 5, 0.05\n",
"\n",
"xcut = (x0 - eps < x) & (x < x0 + eps)\n",
"2 * b * x0, linregress(x[xcut], y[xcut]).slope"
]
},
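{
"cell_type": "markdown",
"metadata": {},
"source": [
"Why the windowed regression recovers the derivative: a sketch, assuming $x$ is uniform on the window $(x_0 - \\epsilon, x_0 + \\epsilon)$ and the noise is independent of $x$. The symbol $u = x - x_0$ is introduced here only for illustration. Expanding around $x_0$,\n",
"\n",
"$$b x^2 = b x_0^2 + 2 b x_0 u + b u^2, \\qquad \\frac{\\mathrm{Cov}(u, b x^2)}{\\mathrm{Var}(u)} = 2 b x_0 + b\\,\\frac{E[u^3]}{\\mathrm{Var}(u)} = 2 b x_0,$$\n",
"\n",
"because $E[u] = E[u^3] = 0$ on a symmetric window. With $b = 5$ and $x_0 = 5$ this gives 50, which the estimate above matches up to sampling noise."
]
},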
{
"cell_type": "markdown",
"metadata": {},
"source": [
"#### Note that cutting on the range ($y$) instead of the domain ($x$) does not yield correct estimates, even if we manage to focus on the right area."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"For example, the estimate below is zero by symmetry: the cut on $y$ selects points near both $x = x_0$ and $x = -x_0$."
]
},
{
"cell_type": "code",
"execution_count": 5,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"-2.5271768586034784e-05"
]
},
"execution_count": 5,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"y0 = b * x0**2\n",
"\n",
"ycut = (y0 - eps < y) & (y < y0 + eps)\n",
"linregress(x[ycut], y[ycut]).slope"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"But even if we restrict to $x > 0$, the estimate is strongly attenuated: about 0.06 instead of 50."
]
},
{
"cell_type": "code",
"execution_count": 6,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"0.062255465871224006"
]
},
"execution_count": 6,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"y0 = b * x0**2\n",
"\n",
"xpos_ycut = (x > 0) & (y0 - eps < y) & (y < y0 + eps)\n",
"linregress(x[xpos_ycut], y[xpos_ycut]).slope"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"#### That bias stems from truncating the outcomes in a biased way, towards the center of the selected range: the cut keeps a point only if its noisy $y$ lands inside the window. If you remove the noise, that doesn't happen.\n",
"\n",
"However, you _would_ still need to focus on the right range, which is non-trivial, since you don't know the outcome a priori!"
]
},
{
"cell_type": "code",
"execution_count": 7,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"49.9999930655446"
]
},
"execution_count": 7,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"y2 = b * x**2\n",
"\n",
"xpos_ycut = (x > 0) & (y0 - eps < y2) & (y2 < y0 + eps)\n",
"linregress(x[xpos_ycut], y2[xpos_ycut]).slope"
]
}
],
"metadata": {
"kernelspec": {
"display_name": "Python 3",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.8.3"
}
},
"nbformat": 4,
"nbformat_minor": 4
}