@josef-pkt
Created January 30, 2019 00:50
Basic GAM example with formula after merge
@josef-pkt

In general, the splines are based on patsy's definition and implementation. More information is available at https://patsy.readthedocs.io/en/latest/spline-regression.html

The main change we made to the spline definition was adding options for the boundary knots, to match mgcv.

I don't really remember the details. df is likely the number of implied basis functions, i.e. the number of columns after dropping one column for the implicit constant.
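A minimal sketch of that column-counting hypothesis, assuming statsmodels follows patsy's convention: a clamped B-spline basis with n interior knots and degree d has n + d + 1 functions, and df counts the columns left after one is dropped for the constant. Both helper names are hypothetical, not statsmodels API.

```python
def n_basis_functions(n_inner_knots, degree):
    """Number of B-spline basis functions for a clamped knot vector:
    interior knots plus the order (degree + 1)."""
    return n_inner_knots + degree + 1


def n_inner_knots_for_df(df, degree, include_intercept=False):
    """Hypothetical inverse: how many interior knots give `df` columns.
    Without an intercept one basis column is dropped, so the full
    basis must have df + 1 functions."""
    n_basis = df if include_intercept else df + 1
    n_inner = n_basis - (degree + 1)
    if n_inner < 0:
        raise ValueError("df too small for this degree")
    return n_inner


# Cubic B-spline (degree=3) with df=6 and the constant column dropped:
n_inner = n_inner_knots_for_df(6, degree=3)
print(n_inner)                             # 3 interior knots
print(n_basis_functions(n_inner, 3) - 1)   # 6 columns, i.e. df
```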

I never remember "degree" versus "order" of polynomials: one is the highest power, the other is the number of terms.
It looks like degree=3 is the standard cubic B-spline.
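For reference, the usual convention in the spline literature (patsy and statsmodels expose the degree, not the order) is that a polynomial piece of degree d has d + 1 coefficients, and d + 1 is called its order:

$$
s(x) = \sum_{j=0}^{d} c_j x^j, \qquad \text{degree} = d, \qquad \text{order} = d + 1 .
$$

So degree=3 means cubic pieces of order 4, and a clamped B-spline basis with $m$ interior knots has $m + d + 1 = m + \text{order}$ basis functions (7 for a cubic with 3 interior knots).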

Examples are in the unit tests, which were written to match mgcv (as far as possible).
Checking briefly: df is k in mgcv (based on the Poisson B-spline example).
The knot locations were difficult to match up between patsy/statsmodels and mgcv (I guess, to remove ambiguity in the knot options).
For example, statsmodels\gam\tests\results\results_mpg_bs_poisson.r forces R to use the same knots as we have.
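One way to make R reproduce our knots is to compute the knot vector in Python and write it into the R script explicitly. Below is a sketch assuming patsy-style placement: interior knots at evenly spaced quantiles of x (linear interpolation, as in R's default quantile type 7) and boundary knots at the min and max of x. The function name and the lower/upper overrides are hypothetical stand-ins for the boundary-knot options, not statsmodels' actual signature.

```python
import statistics


def spline_knots(x, n_inner_knots, degree=3, lower=None, upper=None):
    """Return (inner_knots, all_knots) for a clamped B-spline basis.

    Interior knots sit at evenly spaced quantiles of x; `lower`/`upper`
    optionally override the boundary knots (default: min and max of x).
    """
    if n_inner_knots > 0:
        # method="inclusive" interpolates linearly, like R's quantile type 7
        inner = statistics.quantiles(x, n=n_inner_knots + 1, method="inclusive")
    else:
        inner = []
    lo = min(x) if lower is None else lower
    hi = max(x) if upper is None else upper
    # Each boundary knot is repeated order = degree + 1 times.
    all_knots = [lo] * (degree + 1) + inner + [hi] * (degree + 1)
    return inner, all_knots


x = list(range(11))  # 0, 1, ..., 10
inner, all_knots = spline_knots(x, n_inner_knots=3)
print(inner)  # [2.5, 5.0, 7.5]
```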

@tripartio

Thanks for the link to Patsy. That's a really useful package!

From the link, I note:

In patsy one can specify the number of degrees of freedom directly (actual number of columns of the resulting design matrix) whereas in mgcv one has to specify the number of knots to use. For instance, in the case of cyclic regression splines (with no additional constraints) the actual degrees of freedom is the number of knots minus one.

So, at least for cyclic regression splines, it seems that df (statsmodels) is k (mgcv) minus one.
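The "minus one" for cyclic splines comes from the first and last knot being the same point on the circle, so k knots give only k - 1 distinct knots. Assuming one basis function per distinct knot (as in cubic regression splines parameterized by their values at the knots), the basis has k - 1 columns. A minimal sketch using the simplest cyclic basis, periodic piecewise-linear "tent" functions; this is an illustrative swap-in, not patsy's cyclic cubic cc() basis or mgcv's "cc" smooth, and the function name is hypothetical.

```python
def cyclic_tent_basis(x, knots):
    """Periodic piecewise-linear basis evaluated at x.

    knots[0] and knots[-1] mark the same point on the circle, so there is
    one tent function per distinct knot: len(knots) - 1 columns.
    """
    period = knots[-1] - knots[0]
    x = knots[0] + (x - knots[0]) % period  # wrap x into [knots[0], knots[-1])
    n = len(knots) - 1
    b = [0.0] * n
    for i in range(n):
        if knots[i] <= x < knots[i + 1]:
            w = (x - knots[i]) / (knots[i + 1] - knots[i])
            b[i] = 1.0 - w          # tent peaking at knots[i]
            b[(i + 1) % n] = w      # tent peaking at knots[i + 1]; the last wraps to 0
            break
    return b


knots = [0.0, 0.25, 0.5, 0.75, 1.0]   # k = 5 knots
row = cyclic_tent_basis(0.9, knots)
print(len(row))                       # 4 = k - 1 columns
```

Because the last tent wraps around to the first knot, evaluating at 1.0 gives the same row as evaluating at 0.0.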

Also, it seems that you're right about degree=3 meaning cubic splines:

bs() can produce B-spline bases of arbitrary degrees – e.g., degree=0 will produce piecewise-constant functions, degree=1 will produce piecewise-linear functions, and the default degree=3 produces cubic splines.
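To see what degree does, here is a sketch that evaluates a clamped B-spline basis with the Cox-de Boor recursion. It is a standard textbook algorithm, not patsy's or statsmodels' code, and the function names are hypothetical. Degree 0 gives 0/1 indicators of the knot intervals (piecewise constant), degree 1 gives tent functions (piecewise linear), and degree 3 gives cubic pieces; at every x inside the boundary knots the basis values sum to one.

```python
def clamped_knots(inner_knots, lower, upper, degree):
    """Full knot vector: each boundary knot repeated degree + 1 times."""
    return [lower] * (degree + 1) + list(inner_knots) + [upper] * (degree + 1)


def bspline_basis(x, knots, degree):
    """Values at x of all B-spline basis functions of the given degree."""
    # Degree 0: indicator of each half-open interval [t_i, t_{i+1}).
    b = [1.0 if knots[i] <= x < knots[i + 1] else 0.0
         for i in range(len(knots) - 1)]
    # Close the last non-empty interval so the upper boundary is covered.
    if x == knots[-1]:
        last = max(i for i in range(len(knots) - 1) if knots[i] < knots[i + 1])
        b[last] = 1.0
    # Raise the degree one step at a time (one fewer function per step).
    for d in range(1, degree + 1):
        new_b = []
        for i in range(len(knots) - d - 1):
            value = 0.0
            left = knots[i + d] - knots[i]
            if left > 0:
                value += (x - knots[i]) / left * b[i]
            right = knots[i + d + 1] - knots[i + 1]
            if right > 0:
                value += (knots[i + d + 1] - x) / right * b[i + 1]
            new_b.append(value)
        b = new_b
    return b


inner = [0.25, 0.5, 0.75]
for degree in (0, 1, 3):
    row = bspline_basis(0.3, clamped_knots(inner, 0.0, 1.0, degree), degree)
    # lengths 4, 5, 7 = len(inner) + degree + 1; each row sums to 1
    print(degree, len(row), round(sum(row), 12))
```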

To be honest, I really don't understand the mathematics behind splines and all that, but at least with this information, I can line up your documentation with mgcv's. Thanks!
