Created
December 30, 2018 09:36
-
-
Save danielfleischer/9c7cacc761d8f455ca3eac683eb3d8f9 to your computer and use it in GitHub Desktop.
14 statistical tests in python, using scipy.stats.
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
{ | |
"cells": [ | |
{ | |
"cell_type": "code", | |
"execution_count": 1, | |
"metadata": {}, | |
"outputs": [], | |
"source": [ | |
"import numpy as np" | |
] | |
}, | |
{ | |
"cell_type": "code", | |
"execution_count": 40, | |
"metadata": {}, | |
"outputs": [], | |
"source": [ | |
"a = np.random.randint(1,10,100)" | |
] | |
}, | |
{ | |
"cell_type": "code", | |
"execution_count": 41, | |
"metadata": {}, | |
"outputs": [], | |
"source": [ | |
"b = np.random.normal(size=100)" | |
] | |
}, | |
{ | |
"cell_type": "markdown", | |
"metadata": { | |
"toc-hr-collapsed": false | |
}, | |
"source": [ | |
"# Normality Tests" | |
] | |
}, | |
{ | |
"cell_type": "markdown", | |
"metadata": {}, | |
"source": [ | |
"## Shapiro Wilk\n", | |
"Tests for Gaussian distribution. Observationgs need to be IID." | |
] | |
}, | |
{ | |
"cell_type": "code", | |
"execution_count": 1, | |
"metadata": {}, | |
"outputs": [], | |
"source": [ | |
"from scipy.stats import shapiro" | |
] | |
}, | |
{ | |
"cell_type": "code", | |
"execution_count": 43, | |
"metadata": {}, | |
"outputs": [ | |
{ | |
"data": { | |
"text/plain": [ | |
"(0.9135679006576538, 6.518070676975185e-06)" | |
] | |
}, | |
"execution_count": 43, | |
"metadata": {}, | |
"output_type": "execute_result" | |
} | |
], | |
"source": [ | |
"shapiro(a)" | |
] | |
}, | |
{ | |
"cell_type": "code", | |
"execution_count": 44, | |
"metadata": {}, | |
"outputs": [ | |
{ | |
"data": { | |
"text/plain": [ | |
"(0.9860052466392517, 0.374272882938385)" | |
] | |
}, | |
"execution_count": 44, | |
"metadata": {}, | |
"output_type": "execute_result" | |
} | |
], | |
"source": [ | |
"shapiro(b)" | |
] | |
}, | |
{ | |
"cell_type": "markdown", | |
"metadata": {}, | |
"source": [ | |
"## D'Agostino's $K^2$\n", | |
"Tests for Gaussian distribution. Observationgs need to be IID." | |
] | |
}, | |
{ | |
"cell_type": "code", | |
"execution_count": 47, | |
"metadata": {}, | |
"outputs": [], | |
"source": [ | |
"from scipy.stats import normaltest" | |
] | |
}, | |
{ | |
"cell_type": "code", | |
"execution_count": 48, | |
"metadata": {}, | |
"outputs": [ | |
{ | |
"data": { | |
"text/plain": [ | |
"NormaltestResult(statistic=41.39502617899559, pvalue=1.0260872144248276e-09)" | |
] | |
}, | |
"execution_count": 48, | |
"metadata": {}, | |
"output_type": "execute_result" | |
} | |
], | |
"source": [ | |
"normaltest(a)" | |
] | |
}, | |
{ | |
"cell_type": "code", | |
"execution_count": 49, | |
"metadata": {}, | |
"outputs": [ | |
{ | |
"data": { | |
"text/plain": [ | |
"NormaltestResult(statistic=3.593125002661302, pvalue=0.16586808066854905)" | |
] | |
}, | |
"execution_count": 49, | |
"metadata": {}, | |
"output_type": "execute_result" | |
} | |
], | |
"source": [ | |
"normaltest(b)" | |
] | |
}, | |
{ | |
"cell_type": "markdown", | |
"metadata": {}, | |
"source": [ | |
"## Anderson-Darling\n", | |
"Tests for Gaussian distribution. Observationgs need to be IID." | |
] | |
}, | |
{ | |
"cell_type": "code", | |
"execution_count": 50, | |
"metadata": {}, | |
"outputs": [], | |
"source": [ | |
"from scipy.stats import anderson" | |
] | |
}, | |
{ | |
"cell_type": "code", | |
"execution_count": 53, | |
"metadata": {}, | |
"outputs": [ | |
{ | |
"data": { | |
"text/plain": [ | |
"AndersonResult(statistic=2.3516564884300664, critical_values=array([0.555, 0.632, 0.759, 0.885, 1.053]), significance_level=array([15. , 10. , 5. , 2.5, 1. ]))" | |
] | |
}, | |
"execution_count": 53, | |
"metadata": {}, | |
"output_type": "execute_result" | |
} | |
], | |
"source": [ | |
"anderson(a)" | |
] | |
}, | |
{ | |
"cell_type": "code", | |
"execution_count": 54, | |
"metadata": {}, | |
"outputs": [ | |
{ | |
"data": { | |
"text/plain": [ | |
"AndersonResult(statistic=0.3417980857269356, critical_values=array([0.555, 0.632, 0.759, 0.885, 1.053]), significance_level=array([15. , 10. , 5. , 2.5, 1. ]))" | |
] | |
}, | |
"execution_count": 54, | |
"metadata": {}, | |
"output_type": "execute_result" | |
} | |
], | |
"source": [ | |
"anderson(b)" | |
] | |
}, | |
{ | |
"cell_type": "markdown", | |
"metadata": { | |
"toc-hr-collapsed": false | |
}, | |
"source": [ | |
"# Correlation Tests" | |
] | |
}, | |
{ | |
"cell_type": "markdown", | |
"metadata": {}, | |
"source": [ | |
"## Pearson Coefficient\n", | |
"Tests whether two samples have a linear relationship. Observations need to be IID, in each sample to be normal, in each sample to have same variance. " | |
] | |
}, | |
{ | |
"cell_type": "code", | |
"execution_count": 55, | |
"metadata": {}, | |
"outputs": [], | |
"source": [ | |
"from scipy.stats import pearsonr" | |
] | |
}, | |
{ | |
"cell_type": "code", | |
"execution_count": 56, | |
"metadata": {}, | |
"outputs": [ | |
{ | |
"data": { | |
"text/plain": [ | |
"(0.0820377744456807, 0.4171172980326803)" | |
] | |
}, | |
"execution_count": 56, | |
"metadata": {}, | |
"output_type": "execute_result" | |
} | |
], | |
"source": [ | |
"pearsonr(a,b)" | |
] | |
}, | |
{ | |
"cell_type": "markdown", | |
"metadata": {}, | |
"source": [ | |
"## Spearman Rank\n", | |
"Tests whether two samples have a monotonic relationship. Observations need to be IID, able to be ranked. " | |
] | |
}, | |
{ | |
"cell_type": "code", | |
"execution_count": 57, | |
"metadata": {}, | |
"outputs": [], | |
"source": [ | |
"from scipy.stats import spearmanr" | |
] | |
}, | |
{ | |
"cell_type": "code", | |
"execution_count": 58, | |
"metadata": {}, | |
"outputs": [ | |
{ | |
"data": { | |
"text/plain": [ | |
"SpearmanrResult(correlation=0.09828266099415585, pvalue=0.3306388846208107)" | |
] | |
}, | |
"execution_count": 58, | |
"metadata": {}, | |
"output_type": "execute_result" | |
} | |
], | |
"source": [ | |
"spearmanr(a,b)" | |
] | |
}, | |
{ | |
"cell_type": "markdown", | |
"metadata": {}, | |
"source": [ | |
"## Kendall Rank\n", | |
"Tests whether two samples have a monotonic relationship. Observations need to be IID, able to be ranked. " | |
] | |
}, | |
{ | |
"cell_type": "code", | |
"execution_count": 59, | |
"metadata": {}, | |
"outputs": [], | |
"source": [ | |
"from scipy.stats import kendalltau" | |
] | |
}, | |
{ | |
"cell_type": "code", | |
"execution_count": 60, | |
"metadata": {}, | |
"outputs": [ | |
{ | |
"data": { | |
"text/plain": [ | |
"KendalltauResult(correlation=0.07047459480753636, pvalue=0.32412404755614377)" | |
] | |
}, | |
"execution_count": 60, | |
"metadata": {}, | |
"output_type": "execute_result" | |
} | |
], | |
"source": [ | |
"kendalltau(a,b)" | |
] | |
}, | |
{ | |
"cell_type": "markdown", | |
"metadata": {}, | |
"source": [ | |
"## $\\chi^2$ Test\n", | |
"Tests whether two categorical variables are related or independent. Observations need to be independent, at least 25 examples in each cell." | |
] | |
}, | |
{ | |
"cell_type": "code", | |
"execution_count": 62, | |
"metadata": {}, | |
"outputs": [], | |
"source": [ | |
"from scipy.stats import chi2_contingency" | |
] | |
}, | |
{ | |
"cell_type": "code", | |
"execution_count": 71, | |
"metadata": {}, | |
"outputs": [], | |
"source": [ | |
"table = np.array([[40, 10],[30, 40]])" | |
] | |
}, | |
{ | |
"cell_type": "code", | |
"execution_count": 72, | |
"metadata": {}, | |
"outputs": [ | |
{ | |
"data": { | |
"text/plain": [ | |
"(15.062204081632652,\n", | |
" 0.00010402546661835286,\n", | |
" 1,\n", | |
" array([[29.16666667, 20.83333333],\n", | |
" [40.83333333, 29.16666667]]))" | |
] | |
}, | |
"execution_count": 72, | |
"metadata": {}, | |
"output_type": "execute_result" | |
} | |
], | |
"source": [ | |
"chi2_contingency(table)" | |
] | |
}, | |
{ | |
"cell_type": "markdown", | |
"metadata": { | |
"toc-hr-collapsed": false | |
}, | |
"source": [ | |
"# Hypothesis Tests" | |
] | |
}, | |
{ | |
"cell_type": "markdown", | |
"metadata": {}, | |
"source": [ | |
"## T Test\n", | |
"Tests whether two samples' means are significantly different. Observations need to be independent, in each sample normal, in each sample same variance." | |
] | |
}, | |
{ | |
"cell_type": "code", | |
"execution_count": 73, | |
"metadata": {}, | |
"outputs": [], | |
"source": [ | |
"from scipy.stats import ttest_ind" | |
] | |
}, | |
{ | |
"cell_type": "code", | |
"execution_count": 78, | |
"metadata": {}, | |
"outputs": [], | |
"source": [ | |
"a = np.random.normal(loc=10, size=100)" | |
] | |
}, | |
{ | |
"cell_type": "code", | |
"execution_count": 84, | |
"metadata": {}, | |
"outputs": [], | |
"source": [ | |
"b = np.random.normal(loc=11, size=100)" | |
] | |
}, | |
{ | |
"cell_type": "code", | |
"execution_count": 85, | |
"metadata": {}, | |
"outputs": [ | |
{ | |
"data": { | |
"text/plain": [ | |
"Ttest_indResult(statistic=-9.211238980494853, pvalue=4.740751714730386e-17)" | |
] | |
}, | |
"execution_count": 85, | |
"metadata": {}, | |
"output_type": "execute_result" | |
} | |
], | |
"source": [ | |
"ttest_ind(a,b)" | |
] | |
}, | |
{ | |
"cell_type": "markdown", | |
"metadata": {}, | |
"source": [ | |
"## Paired Student T Test\n", | |
"Tests whether two paired samples' means are significantly different. Observations need to be independent, in each sample normal, in each sample same variance. Observations are paired." | |
] | |
}, | |
{ | |
"cell_type": "code", | |
"execution_count": 87, | |
"metadata": {}, | |
"outputs": [], | |
"source": [ | |
"from scipy.stats import ttest_rel" | |
] | |
}, | |
{ | |
"cell_type": "code", | |
"execution_count": 95, | |
"metadata": {}, | |
"outputs": [], | |
"source": [ | |
"a = np.random.normal(loc=9.5, size=100)" | |
] | |
}, | |
{ | |
"cell_type": "code", | |
"execution_count": 96, | |
"metadata": {}, | |
"outputs": [], | |
"source": [ | |
"b = np.random.normal(loc=10.1, size=100)" | |
] | |
}, | |
{ | |
"cell_type": "code", | |
"execution_count": 97, | |
"metadata": {}, | |
"outputs": [ | |
{ | |
"data": { | |
"text/plain": [ | |
"Ttest_relResult(statistic=-4.265035140526937, pvalue=4.575138155519719e-05)" | |
] | |
}, | |
"execution_count": 97, | |
"metadata": {}, | |
"output_type": "execute_result" | |
} | |
], | |
"source": [ | |
"ttest_rel(a,b)" | |
] | |
}, | |
{ | |
"cell_type": "markdown", | |
"metadata": {}, | |
"source": [ | |
"## Analysis of Variance ANOVA\n", | |
"Tests whether two or more samples' means are significantly different. Observations need to be independent, in each sample normal, in each sample same variance." | |
] | |
}, | |
{ | |
"cell_type": "code", | |
"execution_count": 98, | |
"metadata": {}, | |
"outputs": [], | |
"source": [ | |
"from scipy.stats import f_oneway" | |
] | |
}, | |
{ | |
"cell_type": "code", | |
"execution_count": 102, | |
"metadata": {}, | |
"outputs": [], | |
"source": [ | |
"a = np.random.normal(loc=3, size=100)" | |
] | |
}, | |
{ | |
"cell_type": "code", | |
"execution_count": 103, | |
"metadata": {}, | |
"outputs": [], | |
"source": [ | |
"b = np.random.normal(loc=11, size=100)" | |
] | |
}, | |
{ | |
"cell_type": "code", | |
"execution_count": 104, | |
"metadata": {}, | |
"outputs": [ | |
{ | |
"data": { | |
"text/plain": [ | |
"F_onewayResult(statistic=3087.4306882609585, pvalue=9.854457388246251e-123)" | |
] | |
}, | |
"execution_count": 104, | |
"metadata": {}, | |
"output_type": "execute_result" | |
} | |
], | |
"source": [ | |
"f_oneway(a,b)" | |
] | |
}, | |
{ | |
"cell_type": "markdown", | |
"metadata": { | |
"toc-hr-collapsed": false | |
}, | |
"source": [ | |
"# Non-parametric Hypothesis Testing" | |
] | |
}, | |
{ | |
"cell_type": "markdown", | |
"metadata": {}, | |
"source": [ | |
"## Mann-Whitney U Test\n", | |
"Tests whether the distributions of two samples are equal or not. Observations need to be independent, can be ranked." | |
] | |
}, | |
{ | |
"cell_type": "code", | |
"execution_count": 105, | |
"metadata": {}, | |
"outputs": [], | |
"source": [ | |
"from scipy.stats import mannwhitneyu" | |
] | |
}, | |
{ | |
"cell_type": "code", | |
"execution_count": 115, | |
"metadata": {}, | |
"outputs": [], | |
"source": [ | |
"a = np.random.randint(1,10,size=100)" | |
] | |
}, | |
{ | |
"cell_type": "code", | |
"execution_count": 116, | |
"metadata": {}, | |
"outputs": [], | |
"source": [ | |
"b = np.random.normal(loc=0, size=100)" | |
] | |
}, | |
{ | |
"cell_type": "code", | |
"execution_count": 117, | |
"metadata": {}, | |
"outputs": [ | |
{ | |
"data": { | |
"text/plain": [ | |
"MannwhitneyuResult(statistic=161.0, pvalue=1.3099464346996372e-32)" | |
] | |
}, | |
"execution_count": 117, | |
"metadata": {}, | |
"output_type": "execute_result" | |
} | |
], | |
"source": [ | |
"mannwhitneyu(a,b)" | |
] | |
}, | |
{ | |
"cell_type": "markdown", | |
"metadata": {}, | |
"source": [ | |
"## Wilcoxon Signed Rank Test\n", | |
"Tests whether the distributions of two paired samples are equal or not. Observations need to be independent, can be ranked, are paired" | |
] | |
}, | |
{ | |
"cell_type": "code", | |
"execution_count": 118, | |
"metadata": {}, | |
"outputs": [], | |
"source": [ | |
"from scipy.stats import wilcoxon" | |
] | |
}, | |
{ | |
"cell_type": "code", | |
"execution_count": 128, | |
"metadata": {}, | |
"outputs": [], | |
"source": [ | |
"a = np.random.normal(loc=0,size=100)" | |
] | |
}, | |
{ | |
"cell_type": "code", | |
"execution_count": 129, | |
"metadata": {}, | |
"outputs": [], | |
"source": [ | |
"b = np.random.normal(loc=0, size=100)" | |
] | |
}, | |
{ | |
"cell_type": "code", | |
"execution_count": 130, | |
"metadata": {}, | |
"outputs": [ | |
{ | |
"data": { | |
"text/plain": [ | |
"WilcoxonResult(statistic=2525.0, pvalue=1.0)" | |
] | |
}, | |
"execution_count": 130, | |
"metadata": {}, | |
"output_type": "execute_result" | |
} | |
], | |
"source": [ | |
"wilcoxon(a,b)" | |
] | |
}, | |
{ | |
"cell_type": "markdown", | |
"metadata": {}, | |
"source": [ | |
"## Kruskal-Wallis H Test\n", | |
"Tests whether the distributions of two or more samples are equal or not. Observations need to be independent, can be ranked." | |
] | |
}, | |
{ | |
"cell_type": "code", | |
"execution_count": 131, | |
"metadata": {}, | |
"outputs": [], | |
"source": [ | |
"from scipy.stats import kruskal" | |
] | |
}, | |
{ | |
"cell_type": "code", | |
"execution_count": 145, | |
"metadata": {}, | |
"outputs": [], | |
"source": [ | |
"a = np.random.normal(loc=0,size=100)" | |
] | |
}, | |
{ | |
"cell_type": "code", | |
"execution_count": 148, | |
"metadata": {}, | |
"outputs": [], | |
"source": [ | |
"b = np.random.normal(loc=1.3, size=100)" | |
] | |
}, | |
{ | |
"cell_type": "code", | |
"execution_count": 149, | |
"metadata": {}, | |
"outputs": [ | |
{ | |
"data": { | |
"text/plain": [ | |
"KruskalResult(statistic=56.23140895522397, pvalue=6.442411772846654e-14)" | |
] | |
}, | |
"execution_count": 149, | |
"metadata": {}, | |
"output_type": "execute_result" | |
} | |
], | |
"source": [ | |
"kruskal(a,b)" | |
] | |
}, | |
{ | |
"cell_type": "markdown", | |
"metadata": { | |
"toc-hr-collapsed": false | |
}, | |
"source": [ | |
"## Friedman Test\n", | |
"Tests whether the distributions of two or more paired samples are equal or not. Observations need to be independent, can be ranked, are paired." | |
] | |
}, | |
{ | |
"cell_type": "code", | |
"execution_count": 154, | |
"metadata": {}, | |
"outputs": [], | |
"source": [ | |
"from scipy.stats import friedmanchisquare" | |
] | |
}, | |
{ | |
"cell_type": "code", | |
"execution_count": 155, | |
"metadata": {}, | |
"outputs": [], | |
"source": [ | |
"a = np.random.normal(loc=0,size=100)" | |
] | |
}, | |
{ | |
"cell_type": "code", | |
"execution_count": 156, | |
"metadata": {}, | |
"outputs": [], | |
"source": [ | |
"b = np.random.normal(loc=1.3, size=100)" | |
] | |
}, | |
{ | |
"cell_type": "code", | |
"execution_count": 158, | |
"metadata": {}, | |
"outputs": [], | |
"source": [ | |
"c = np.random.normal(loc=1.3, size=100)" | |
] | |
}, | |
{ | |
"cell_type": "code", | |
"execution_count": 159, | |
"metadata": {}, | |
"outputs": [ | |
{ | |
"data": { | |
"text/plain": [ | |
"FriedmanchisquareResult(statistic=52.460000000000036, pvalue=4.059342910977334e-12)" | |
] | |
}, | |
"execution_count": 159, | |
"metadata": {}, | |
"output_type": "execute_result" | |
} | |
], | |
"source": [ | |
"friedmanchisquare(a,b,c)" | |
] | |
} | |
], | |
"metadata": { | |
"kernelspec": { | |
"display_name": "Python 3", | |
"language": "python", | |
"name": "python3" | |
}, | |
"language_info": { | |
"codemirror_mode": { | |
"name": "ipython", | |
"version": 3 | |
}, | |
"file_extension": ".py", | |
"mimetype": "text/x-python", | |
"name": "python", | |
"nbconvert_exporter": "python", | |
"pygments_lexer": "ipython3", | |
"version": "3.7.2" | |
}, | |
"toc-autonumbering": false, | |
"toc-showcode": false, | |
"toc-showmarkdowntxt": false | |
}, | |
"nbformat": 4, | |
"nbformat_minor": 2 | |
} |
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment