Last active June 19, 2019 08:30
Gist: peakBreaker/0b63985883b90eaa414702cb6799bcb6
Post-processing the predictions and probabilities of multiple scikit-learn models into a multilevel DataFrame
# The pred and prob arrays are NumPy array outputs from a scikit-learn model:
# - pred_arr = model.predict(X).astype(int)
# - prob_arr = model.predict_proba(X)
#
# Here we run the initial data (original_df) through multiple models and
# structure the model outputs into a DataFrame with multilevel columns for
# the probabilities and predictions.
#
# Typically the next stage would be to map the numerical results to string
# labels/categories as needed, and to write the results to a database or
# similar store.
import numpy as np
import pandas as pd

# Assumes model1, model2, model3 (fitted classifiers) and original_df already exist
prob_arr_m1 = model1.predict_proba(original_df)
prob_arr_m2 = model2.predict_proba(original_df)
prob_arr_m3 = model3.predict_proba(original_df)
pred_arr_m1 = model1.predict(original_df).astype(int)
pred_arr_m2 = model2.predict(original_df).astype(int)
pred_arr_m3 = model3.predict(original_df).astype(int)
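# Stack the per-model outputs column-wise: one column per model for the
# predictions, one column per class (per model) for the probabilities
predictions = np.column_stack((pred_arr_m1, pred_arr_m2, pred_arr_m3))
probabilities = np.column_stack((prob_arr_m1, prob_arr_m2, prob_arr_m3))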
# Create the multilevel column index. The probability names must follow the
# column order of `probabilities`; these assume m1 has 2 classes, m2 has 6
# and m3 has 2 (10 columns in total).
probcols_raw = ['m1_prob1', 'm1_prob2',
                'm2_prob1', 'm2_prob2', 'm2_prob3', 'm2_prob4', 'm2_prob5', 'm2_prob6',
                'm3_prob1', 'm3_prob2']
predcols_raw = ['m1_prediction', 'm2_prediction', 'm3_prediction']
predcols = [('segments', c) for c in predcols_raw]
probcols = [('probabilities', c) for c in probcols_raw]
cols = pd.MultiIndex.from_tuples([*predcols, *probcols])
# Convert to a DataFrame with multilevel columns
pred_df = pd.DataFrame(index=original_df.index, columns=cols)
pred_df['segments'] = predictions
pred_df['probabilities'] = probabilities
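The same idea can be sketched end to end. This is a minimal sketch that assumes synthetic data from `make_classification` and three arbitrary scikit-learn classifiers standing in for `model1`..`model3`, since the gist shows neither its real models nor `original_df`. Instead of hardcoding the probability column names, it derives them from each fitted model's `classes_`, which matches the column order of `predict_proba`. It then builds the two column levels with `pd.concat(..., keys=...)`. The helper `build_output_frame`, the label names in `LABELS` and the `__` separator used when flattening are all hypothetical.

```python
import numpy as np
import pandas as pd
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.naive_bayes import GaussianNB
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in for original_df (assumption: the gist does not show its data)
X, y = make_classification(n_samples=100, n_features=8, n_informative=5,
                           n_classes=3, random_state=0)
original_df = pd.DataFrame(X, columns=[f"f{i}" for i in range(X.shape[1])])
y_binary = (y > 0).astype(int)

# Stand-in models with different numbers of classes (2, 3 and 3)
models = {
    "m1": LogisticRegression(max_iter=1000).fit(original_df, y_binary),
    "m2": DecisionTreeClassifier(random_state=0).fit(original_df, y),
    "m3": GaussianNB().fit(original_df, y),
}


def build_output_frame(models, X):
    """Collect predictions and class probabilities under two column levels."""
    predictions = {}
    prob_frames = []
    for name, model in models.items():
        predictions[f"{name}_prediction"] = model.predict(X).astype(int)
        # One probability column per class, in the same order as model.classes_
        prob_cols = [f"{name}_prob_{c}" for c in model.classes_]
        prob_frames.append(
            pd.DataFrame(model.predict_proba(X), index=X.index, columns=prob_cols)
        )
    segments = pd.DataFrame(predictions, index=X.index)
    probabilities = pd.concat(prob_frames, axis=1)
    # keys= becomes the outer level of the column MultiIndex
    return pd.concat([segments, probabilities], axis=1,
                     keys=["segments", "probabilities"])


pred_df = build_output_frame(models, original_df)

# Hypothetical label-enhancement step: map numeric classes to readable names
LABELS = {0: "low", 1: "medium", 2: "high"}
m2_labels = pred_df[("segments", "m2_prediction")].map(LABELS)

# Flatten the two levels into single names, e.g. before writing to a database
flat_df = pred_df.copy()
flat_df.columns = ["__".join(col) for col in pred_df.columns]

print(pred_df["probabilities"].head())
```

Because each model's probability names come from its own `classes_`, a model trained on a different number of classes changes only its own block of columns instead of shifting every name after it.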
I might not have written "sklearn model", since it is not really the scikit-learn people who developed the XGBoost library. Otherwise, a multi-index DataFrame is a good idea here. +1
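The point about XGBoost can be made concrete. The post-processing only needs `predict` and `predict_proba`, plus `classes_` if the column names are derived from it. So it works with any classifier that follows the scikit-learn estimator interface, not just scikit-learn's own; XGBoost's `XGBClassifier` wrapper also exposes these. The following is a minimal sketch that uses a `typing.Protocol` to describe that interface. `SklearnStyleClassifier`, `ThresholdModel` and `probability_frame` are hypothetical names, and `ThresholdModel` is a toy stand-in rather than a real library model.

```python
from typing import Protocol

import numpy as np
import pandas as pd


class SklearnStyleClassifier(Protocol):
    """Structural type: anything with the scikit-learn classifier members used here."""

    classes_: np.ndarray

    def predict(self, X) -> np.ndarray: ...

    def predict_proba(self, X) -> np.ndarray: ...


class ThresholdModel:
    """Hypothetical non-scikit-learn model: logistic score of the first feature."""

    classes_ = np.array([0, 1])

    def predict_proba(self, X) -> np.ndarray:
        p1 = 1.0 / (1.0 + np.exp(-np.asarray(X, dtype=float)[:, 0]))
        return np.column_stack((1.0 - p1, p1))

    def predict(self, X) -> np.ndarray:
        # Index of the most likely class, mapped back to the class label
        return self.classes_[self.predict_proba(X).argmax(axis=1)]


def probability_frame(name: str, model: SklearnStyleClassifier,
                      X: pd.DataFrame) -> pd.DataFrame:
    """Per-class probability columns named after the model's classes_."""
    cols = [f"{name}_prob_{c}" for c in model.classes_]
    return pd.DataFrame(model.predict_proba(X), index=X.index, columns=cols)


X = pd.DataFrame({"a": [-2.0, 0.5, 3.0], "b": [1.0, 1.0, 1.0]})
model = ThresholdModel()
probs = probability_frame("mx", model, X)
preds = model.predict(X)
```

Because `Protocol` matching is structural, `ThresholdModel` satisfies `SklearnStyleClassifier` without inheriting from it or from any scikit-learn base class.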