Skip to content

Instantly share code, notes, and snippets.

@raddy
Created September 12, 2018 02:09
Show Gist options
  • Select an option

  • Save raddy/4084ade3d3a82911d43df9375f9697f4 to your computer and use it in GitHub Desktop.

Select an option

Save raddy/4084ade3d3a82911d43df9375f9697f4 to your computer and use it in GitHub Desktop.
Sample from a distribution with known skew and kurtosis
import statsmodels.sandbox.distributions.extras as extras
import scipy.interpolate as interpolate
import numpy as np
def generate_normal_four_moments(mu, sigma, skew, kurt, size=1000, sd_wide = 10):
f = extras.pdf_mvsk([mu, sigma, skew, kurt])
x = np.linspace(mu - sd_wide * sigma, mu + sd_wide * sigma, num=500)
y = [f(i) for i in x]
yy = np.cumsum(y) / np.sum(y)
inv_cdf = interpolate.interp1d(yy, x, fill_value="extrapolate")
rr = np.random.rand(size)
return inv_cdf(rr)
@willzzz97
Copy link
Copy Markdown

Thanks for sharing! Could you explain what is the parameter "sd_wide" used for?

@TheMightiestCarrot
Copy link
Copy Markdown

Could you please explain what is the parameter "sd_wide" used for?

@mattyTokenomics
Copy link
Copy Markdown

For anyone who comes across this that is looking to create custom distributions for the purposes of generating Brownian motion random walks and/or Monte Carlo simulations, I adapted this function and another source to make it very simple to generate N simulations for multiple different series that exhibit mean reversion, correlation, and/or non-normal distributions:
https://github.com/mattyTokenomics/brownian-motion-simulations

@troyscribner
Copy link
Copy Markdown

"Thanks for sharing! Could you explain what is the parameter "sd_wide" used for?"

He's using it in the np.linspace function, so basically he's creating evenly spaced numbers that are mean plus/minus 10 times variance, then applying the pdf_mvsk function on each point on that linspace.

@raddy
Copy link
Copy Markdown
Author

raddy commented Aug 6, 2023

just for anyone else who stumbles upon this. you're use of the term sigma is somewhat misleading. the function asks for v (variance). should probably be sigma ** 2.

this gist comes up on searches for use of pdf_mvsk so thought i'd point this out. @raddy, feel free to disagree.

I think you're correct looking at the pdf_mvsk implementation:

    N = len(mvsk)
    if N < 4:
        raise ValueError("Four moments must be given to "
                         "approximate the pdf.")

    mu, mc2, skew, kurt = mvsk

    totp = poly1d(1)
    sig = sqrt(mc2)
    if N > 2:
        Dvals = _hermnorm(N + 1)
        C3 = skew / 6.0
        C4 = kurt / 24.0
        # Note: Hermite polynomial for order 3 in _hermnorm is negative
        # instead of positive
        totp = totp - C3 * Dvals[3] + C4 * Dvals[4]

    def pdffunc(x):
        xn = (x - mu) / sig
        return totp(xn) * np.exp(-xn * xn / 2.0) / np.sqrt(2 * np.pi) / sig

    return pdffunc

Let me play with some numbers to sanity check!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment