Calculating drift metrics for machine learning models seems straightforward, yet it is surprisingly laborious and complex in practice. This library makes it simple.
Most texts on calculating model drift focus on one specific metric, such as Jensen-Shannon distance or chi-square. They often include examples for single-variable datasets and explain all the mathematical details. That's great for learning about the topic.
However, in practice, we have datasets with many features of different types. Calculating one metric for one feature is one thing; calculating many metrics for many features and many datasets is quite another.
Automation is needed. That's what this library provides.
In a nutshell, driftstats takes a baseline and a target dataframe and calculates multiple drift metrics (PSI, KS, JSD, Chi2, and others) for every column. It normalizes each metric to a score between 0 and 1 and derives a boolean drift indicator from it. The result is a dataframe of drift metrics. Predefined plot functions help us plot both the drift metrics and the baseline and target distributions of any one feature.
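To make the idea concrete, here is a minimal sketch of what such a calculation might look like for a single numeric feature, using only NumPy. It is not the driftstats implementation: the function names `numeric_drift` and `_binned_probs`, the binning on baseline edges, the `1 - exp(-PSI)` mapping to a 0..1 score, and the `0.1` threshold are all assumptions made for illustration.

```python
import numpy as np

def _binned_probs(values, inner_edges, eps=1e-6):
    # Hypothetical helper: bin values on the given inner edges (outer bins are
    # open-ended), then clip and renormalize so no bin has probability zero.
    idx = np.searchsorted(inner_edges, values, side="right")
    counts = np.bincount(idx, minlength=len(inner_edges) + 1)
    probs = np.clip(counts / counts.sum(), eps, None)
    return probs / probs.sum()

def numeric_drift(baseline, target, bins=10, threshold=0.1):
    # Illustrative only: the threshold and the PSI normalization are assumptions.
    baseline = np.asarray(baseline, dtype=float)
    target = np.asarray(target, dtype=float)

    # Bin edges come from the baseline so both samples share the same bins.
    inner_edges = np.histogram_bin_edges(baseline, bins=bins)[1:-1]
    p = _binned_probs(baseline, inner_edges)
    q = _binned_probs(target, inner_edges)

    # Population Stability Index (unbounded, >= 0).
    psi = float(np.sum((q - p) * np.log(q / p)))

    # Jensen-Shannon distance with base-2 logs, which lies in [0, 1].
    m = 0.5 * (p + q)
    jsd_sq = 0.5 * np.sum(p * np.log2(p / m)) + 0.5 * np.sum(q * np.log2(q / m))
    jsd = float(np.sqrt(max(jsd_sq, 0.0)))

    # Two-sample Kolmogorov-Smirnov statistic: largest gap between the ECDFs.
    b, t = np.sort(baseline), np.sort(target)
    grid = np.concatenate([b, t])
    ecdf_b = np.searchsorted(b, grid, side="right") / len(b)
    ecdf_t = np.searchsorted(t, grid, side="right") / len(t)
    ks = float(np.max(np.abs(ecdf_b - ecdf_t)))

    # KS and JSD are already in [0, 1]; PSI is squashed with 1 - exp(-PSI).
    scores = {"psi": 1.0 - float(np.exp(-psi)), "jsd": jsd, "ks": ks}
    return {name: {"score": s, "drift": s > threshold} for name, s in scores.items()}

rng = np.random.default_rng(0)
baseline = rng.normal(0, 1, 1000)
target = rng.normal(0.5, 1, 1000)
for name, row in numeric_drift(baseline, target).items():
    print(name, round(row["score"], 3), row["drift"])
```

Repeating this for every column, every metric, and every pair of datasets, and handling categorical features separately, is the bookkeeping that the library is meant to take off our hands.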
- Download all files
- pip install -r requirements.txt
- In JupyterLab, open the driftstats notebook for a tutorial
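import numpy as np
from driftstats import DriftStatistics
# Synthetic data: a baseline sample and two drifted target samples
baseline = np.random.normal(0, 1, 1000)
target = np.random.normal(0.5, 1, 1000)     # shifted mean
target2 = np.random.normal(0.5, 1.5, 1000)  # shifted mean and wider spread (not used below)
calc = DriftStatistics()
drifts = calc.compare(baseline, target)  # drift metrics for baseline vs. target
calc.plot_drift(drifts)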