The docstring says list of dict, because splines could have several variables and we need several splines. Try brackets [ ] around the dict.
You can inspect the BSplines instance to see how the knots were set.
Note: in BSplines everything is a list of splines, e.g. from above bs = BSplines(x_spline, df=[12, 10], degree=[3, 3])
AFAICS, [s.knots for s in bs.smoothers] should show the knots for each univariate B-spline.
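A minimal sketch of that inspection, assuming statsmodels is installed and using made-up random data in place of the real x_spline (the array shape and seed are illustrative only):

```python
import numpy as np
from statsmodels.gam.api import BSplines

# Made-up data standing in for x_spline: 200 observations, 2 smooth variables.
rng = np.random.default_rng(0)
x_spline = rng.uniform(0, 10, size=(200, 2))

# One df and one degree per column of x_spline.
bs = BSplines(x_spline, df=[12, 10], degree=[3, 3])

# bs.smoothers holds one univariate spline per variable; each has its own knots.
knots_per_variable = [s.knots for s in bs.smoothers]
for i, knots in enumerate(knots_per_variable):
    print(f"variable {i}: {len(knots)} knots")
    print(np.round(knots, 3))

# The combined basis stacks the columns of all univariate splines side by side.
print("basis shape:", bs.basis.shape)
```

Printing the knot vectors this way also makes it easy to copy them into R when trying to reproduce a fit in mgcv.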
Thanks for this example. I'm an R mgcv user looking for equivalents in Python; your package is the best I've found so far.
I have two questions about bs = BSplines(x_spline, df=[12, 10], degree=[3, 3]):
- Is df the maximum number of knots for a spline, like the k parameter in s(weight, k=12) in an mgcv formula?
- What is degree? The documentation says "degree(s) of the spline; the same length and type rules apply as to df" but I don't understand what that means. What would the mgcv equivalent be?
In general, the splines were based on the patsy definition and implementation; more information is available there: https://patsy.readthedocs.io/en/latest/spline-regression.html
The main change that we made to the definition of splines is to add additional options for boundary knots to match mgcv.
I don't really remember the details.
df is likely the number of implied basis functions, i.e. the number of columns after dropping a column for the implicit constant.
I never remember "degree" versus "order" of polynomials, one is the highest power, the other is the number of terms.
It looks like degree=3 is the standard cubic bspline.
Examples are in the unit tests, and the unit tests were written to match mgcv (as far as possible).
Checking briefly: df is k in mgcv (based on the Poisson B-spline example).
The knot locations were difficult to match up between patsy/statsmodels and mgcv (I guess to remove ambiguity with knot options),
e.g. statsmodels\gam\tests\results\results_mpg_bs_poisson.r forces R to use the same knots as we have.
Thanks for the link to Patsy. That's a really useful package!
From the link, I note:
In patsy one can specify the number of degrees of freedom directly (actual number of columns of the resulting design matrix) whereas in mgcv one has to specify the number of knots to use. For instance, in the case of cyclic regression splines (with no additional constraints) the actual degrees of freedom is the number of knots minus one.
So, it seems that df (statsmodels) is k (mgcv) minus one.
Also, it seems that you're right about degree=3 meaning cubic splines:
bs() can produce B-spline bases of arbitrary degrees – e.g., degree=0 will give produce piecewise-constant functions, degree=1 will produce piecewise-linear functions, and the default degree=3 produces cubic splines.
To be honest, I really don't understand the mathematics behind splines and all that, but at least with this information, I can line up your documentation with mgcv's. Thanks!
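For anyone who wants to check the patsy side of this directly, here is a small sketch, assuming patsy is installed. The data and the df value are made up. It builds B-spline design matrices with patsy's bs() for two different degrees and prints the number of columns, which according to the patsy text quoted above should equal df.

```python
import numpy as np
from patsy import dmatrix

# Made-up predictor values, for illustration only.
x = np.linspace(0.0, 1.0, 50)

# "- 1" drops the formula's intercept term so only the spline columns remain.
cubic = np.asarray(dmatrix("bs(x, df=6, degree=3) - 1", {"x": x}))
linear = np.asarray(dmatrix("bs(x, df=6, degree=1) - 1", {"x": x}))

# In patsy, df is the number of columns of the resulting design matrix,
# whatever the degree; the degree only changes the shape of each basis function.
print("cubic basis:", cubic.shape)
print("piecewise-linear basis:", linear.shape)
```

Note that this only shows patsy's convention; how statsmodels counts columns for a given df is discussed in the comments above and may differ by the dropped constant column.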
Hello, thanks for this valuable resource on how to use GAM in Python. I would like to know how I should specify knot locations.
I tried bs.knot_kwds={'knots':(num1,num2)}
It runs without error. But, is this the correct way?
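A hedged sketch of one alternative, following the "list of dict" remark near the top of this thread: pass knot_kwds to the BSplines constructor, with one dict per variable. The assumptions here are mine and not confirmed in the thread: that the knots are computed when the instance is constructed (so assigning bs.knot_kwds afterwards may have no effect), that the 'knots' entry means interior knots, and that df must equal the number of interior knots plus degree plus one. The data and knot values are made up.

```python
import numpy as np
from statsmodels.gam.api import BSplines

# Made-up data: one smooth variable, passed as a 2-D array with one column.
rng = np.random.default_rng(0)
x = rng.uniform(0, 10, size=(200, 1))

degree = 3
inner_knots = np.array([3.0, 7.0])   # stands in for (num1, num2)

# Assumption: df has to be consistent with the supplied interior knots.
df = len(inner_knots) + degree + 1

# One dict per variable, wrapped in a list, given at construction time.
bs = BSplines(x, df=[df], degree=[degree],
              knot_kwds=[{"knots": inner_knots}])

# Check which knots were actually used.
used_knots = bs.smoothers[0].knots
print(np.round(used_knots, 3))
```

If the printed knot vector does not contain the values you supplied, the keyword was not picked up, so this check is worth keeping whichever way the knots are specified.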