
@dyerrington
Last active September 5, 2020 00:48
I can't tell you how many times I've plotted a ROC curve for a multi-class problem from scratch. Too many times. I decided to make this gist to demonstrate how to implement a multi-class ROC (Receiver Operating Characteristic) plot in the simplest manner possible using Python.
## Import any sklearn models and collect predictions / probabilities beforehand.
## Assumption: pipe_model is already fitted, y_encoded holds integer labels 0..n-1,
## and y_hat_prob (e.g. from pipe_model.predict_proba) has one column per class in that order.
import matplotlib.pyplot as plt
from cycler import cycler
from sklearn.metrics import roc_curve

## Line color config -- rather than relying on a fixed palette, cycle through your own list of colors and line styles.
default_cycler = (cycler(color=['r', 'g', 'b', 'y']) +
                  cycler(linestyle=['-', '--', ':', '-.']))
plt.rc('axes', prop_cycle = default_cycler)

## Compute and plot a one-vs-rest ROC curve for each class
for index, class_name in enumerate(pipe_model.classes_):
    fpr, tpr, threshold = roc_curve(y_encoded, y_hat_prob[:, index], pos_label=index)
    plt.plot(fpr, tpr, label = f"Class - {class_name}")

plt.title('Multiclass ROC curve')
plt.xlabel('False Positive Rate')
plt.ylabel('True Positive Rate')
plt.legend(loc='best')
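
The snippet above depends on `pipe_model`, `y_encoded`, and `y_hat_prob` already existing. As a rough, self-contained sketch of the same one-vs-rest idea, the block below builds those pieces from synthetic data. The dataset, the `LogisticRegression` model, and the `per_class_roc` helper are all assumptions made for illustration, not part of the original gist. It also assumes the labels are the integers 0..n-1, so a class index can double as `pos_label`.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import auc, roc_curve
from sklearn.model_selection import train_test_split


def per_class_roc(y_true, y_prob, n_classes):
    """Hypothetical helper: return {class_index: (fpr, tpr, auc)} for one-vs-rest curves.

    Assumes labels are the integers 0..n_classes-1 and that column i of y_prob
    holds the predicted probability of class i.
    """
    curves = {}
    for class_index in range(n_classes):
        # Treat class_index as the positive class and every other class as negative
        fpr, tpr, _ = roc_curve(y_true, y_prob[:, class_index], pos_label=class_index)
        curves[class_index] = (fpr, tpr, auc(fpr, tpr))
    return curves


# Synthetic 3-class data (illustrative only, not the gist's dataset)
X, y = make_classification(
    n_samples=600, n_features=10, n_informative=5,
    n_classes=3, random_state=0,
)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, stratify=y, random_state=0
)

model = LogisticRegression(max_iter=1000).fit(X_train, y_train)
y_hat_prob = model.predict_proba(X_test)

curves = per_class_roc(y_test, y_hat_prob, n_classes=len(model.classes_))
for class_index, (fpr, tpr, area) in curves.items():
    print(f"class {model.classes_[class_index]}: AUC = {area:.3f}")
```

Each `(fpr, tpr)` pair can be passed straight to `plt.plot(fpr, tpr, label=...)` exactly as in the gist. Returning the AUC alongside it is just a convenient way to put a number in the legend.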
@dyerrington
Author

A slightly fancier version I use for plotting in-sample (original and training) and out-of-sample data side by side:

## Line color config
default_cycler = (cycler(color=['r', 'g', 'b', 'y']) + 
                  cycler(linestyle=['-', '--', ':', '-.']))

plt.rc('axes', prop_cycle = default_cycler)

fig, ax = plt.subplots(nrows = 1, ncols = 3, figsize = (15, 5))
ax = ax.ravel()

## Plotting with different datasets
datasets = [
    dict(
        name                = "original",
        y                   = y_encoded,
        y_hat               = pipe_model.predict(df['sentence']),
        y_hat_probabilities = pipe_model.predict_proba(df['sentence'])
    ),
    dict(
        name                = "train",
        y                   = y_train,
        y_hat               = pipe_model.predict(X_train['sentence']),
        y_hat_probabilities = pipe_model.predict_proba(X_train['sentence'])
    ),
    dict(
        name                = "test",
        y                   = y_test,
        y_hat               = pipe_model.predict(X_test['sentence']),
        y_hat_probabilities = pipe_model.predict_proba(X_test['sentence'])
    ),
]

for index, data in enumerate(datasets):
    ## One-vs-rest ROC curve per class; column class_index holds that class's probabilities
    for class_index, class_name in enumerate(encoder.classes_):
        fpr, tpr, threshold = roc_curve(
            data['y'], data['y_hat_probabilities'][:, class_index],
            pos_label = class_index
        )
        ax[index].plot(fpr, tpr, label = f"Class - {class_name}")

    ## Label each subplot once, after all of its class curves are drawn
    ax[index].set_title(f"Multiclass ROC curve - {data['name']}")
    ax[index].set_xlabel('False Positive Rate')
    ax[index].set_ylabel('True Positive Rate')
    ax[index].legend(loc='best')

[Image: the resulting ROC plots for the original, train, and test sets]

@dyerrington
Author

Also, encoder is an instance of sklearn.preprocessing.LabelEncoder.
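
For anyone wondering why `pos_label = class_index` lines up with `encoder.classes_`, here is a small sketch. The label strings are made up for illustration. `LabelEncoder` stores its classes in sorted order and encodes each label as its position in that array, so the loop index and the encoded integer are the same thing. The probability columns follow the fitted model's own `classes_` order, so this should match as long as the model was trained on the encoded labels.

```python
from sklearn.preprocessing import LabelEncoder

# Hypothetical string labels, for illustration only
labels = ["sports", "politics", "tech", "sports", "tech"]

encoder = LabelEncoder()
y_encoded = encoder.fit_transform(labels)

print(list(encoder.classes_))  # classes are stored in sorted order
print(list(y_encoded))         # each label is replaced by its index in classes_

# The position of a class in classes_ equals its encoded integer,
# which is why enumerate(encoder.classes_) yields a valid pos_label
for class_index, class_name in enumerate(encoder.classes_):
    assert encoder.transform([class_name])[0] == class_index
```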
