{
 "cells": [
  {
   "cell_type": "code",
   "execution_count": 1,
   "metadata": {},
   "outputs": [],
   "source": [
    "from datetime import datetime, timedelta,date\n",
    "import pandas as pd\n",
    "%matplotlib inline\n",
    "from sklearn.metrics import classification_report,confusion_matrix\n",
    "import matplotlib.pyplot as plt\n",
    "import numpy as np\n",
    "import seaborn as sns\n",
    "from __future__ import division\n",
    "from sklearn.cluster import KMeans\n",
    "\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 2,
   "metadata": {},
   "outputs": [],
   "source": [
    "import warnings\n",
    "warnings.filterwarnings(\"ignore\")"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 3,
   "metadata": {},
   "outputs": [],
   "source": [
    "import plotly.plotly as py\n",
    "import plotly.offline as pyoff\n",
    "import plotly.graph_objs as go"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 4,
   "metadata": {},
   "outputs": [],
   "source": [
    "from sklearn.svm import SVC\n",
    "from sklearn.multioutput import MultiOutputClassifier\n",
    "from sklearn.ensemble import GradientBoostingClassifier\n",
    "from sklearn.tree import DecisionTreeClassifier\n",
    "from sklearn.neighbors import KNeighborsClassifier\n",
    "from sklearn.naive_bayes import GaussianNB\n",
    "from sklearn.ensemble import RandomForestClassifier\n",
    "from sklearn.linear_model import LogisticRegression\n",
    "import xgboost as xgb\n",
    "from sklearn.model_selection import KFold, cross_val_score, train_test_split"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 5,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       " <script type=\"text/javascript\">\n",
       " window.PlotlyConfig = {MathJaxConfig: 'local'};\n",
       " if (window.MathJax) {MathJax.Hub.Config({SVG: {font: \"STIX-Web\"}});}\n",
       " if (typeof require !== 'undefined') {\n",
       " require.undef(\"plotly\");\n",
       " define('plotly', function(require, exports, module) {\n",
       " /**\n",
       "* plotly.js v1.47.3\n",
       "* Copyright 2012-2019, Plotly, Inc.\n",
       "* All rights reserved.\n",
       "* Licensed under the MIT license\n",
       "*/\n",
This is very detailed and informative.
Thank you for the great code and explanation. It would be really nice if we could explore more ways to increase the model accuracy, as my models are unfortunately not that accurate.
Just one question: how can 'Customer ID', which is actually categorical data, be part of the training dataset? A 'Customer ID' of '12747.0' carries no meaning as a training feature, since it could just as well be any other value, such as '435666666666', 'ABCD', or '536TGK5'.
And if you remove 'Customer ID' from training, how do you then use the test dataset to predict which customers will buy in the next week or so?
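One common way to handle this, sketched below, is to keep the ID as the DataFrame index instead of a feature column: the model never sees it, but every prediction stays attached to its customer. This is only a minimal illustration on synthetic data, not the notebook's actual pipeline. The column names (`Recency`, `Frequency`, `Revenue`, `WillBuy`) and the label rule are assumptions made up for the example, and `DecisionTreeClassifier` is used as a simple stand-in model.

```python
import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in for a customer-level table (all columns are hypothetical).
rng = np.random.default_rng(0)
n = 200
customers = pd.DataFrame({
    "CustomerID": np.arange(12000, 12000 + n),
    "Recency": rng.integers(0, 365, n),
    "Frequency": rng.integers(1, 50, n),
    "Revenue": rng.gamma(2.0, 200.0, n),
})

# Made-up label: recent, frequent customers are more likely to buy again.
score = -0.01 * customers["Recency"] + 0.05 * customers["Frequency"]
customers["WillBuy"] = (score + rng.normal(0, 0.5, n) > score.median()).astype(int)

# Move the ID into the index: it stays attached to each row but is not a feature.
customers = customers.set_index("CustomerID")
X = customers.drop(columns="WillBuy")
y = customers["WillBuy"]

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0, stratify=y
)

model = DecisionTreeClassifier(max_depth=3, random_state=0).fit(X_train, y_train)

# The test rows keep their CustomerID index, so predictions map back to customers.
predictions = pd.DataFrame(
    {
        "predicted": model.predict(X_test),
        "probability": model.predict_proba(X_test)[:, 1],
    },
    index=X_test.index,
)
print(predictions.head())
```

With this layout, looking up a specific customer is just `predictions.loc[some_id]`, and the ID never influences the fitted model.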
There's probably an overfitting problem, as the accuracy on the test set is far lower than on the training set:
Accuracy of XGB classifier on training set: 0.92
Accuracy of XGB classifier on test set: 0.62
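A rough way to check for this kind of gap is to train a flexible and a constrained model on the same split and compare training and test accuracy side by side. The sketch below is an assumption-laden illustration, not a reproduction of the numbers above: it uses synthetic noisy data from `make_classification`, swaps in scikit-learn's `GradientBoostingClassifier` for XGBoost, and the two parameter sets are arbitrary choices.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.model_selection import train_test_split

# Synthetic, deliberately noisy data (flip_y randomizes about 30% of the labels).
X, y = make_classification(
    n_samples=600, n_features=10, n_informative=3, flip_y=0.3, random_state=0
)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0
)


def train_and_score(**params):
    """Fit a gradient boosting model and return (train accuracy, test accuracy)."""
    model = GradientBoostingClassifier(random_state=0, **params)
    model.fit(X_train, y_train)
    return model.score(X_train, y_train), model.score(X_test, y_test)


# Hypothetical settings: many deep trees versus few shallow, subsampled trees.
settings = {
    "flexible": dict(n_estimators=300, max_depth=6),
    "constrained": dict(n_estimators=50, max_depth=2, subsample=0.8),
}

results = {name: train_and_score(**params) for name, params in settings.items()}

for name, (train_acc, test_acc) in results.items():
    gap = train_acc - test_acc
    print(f"{name}: train={train_acc:.2f} test={test_acc:.2f} gap={gap:.2f}")
```

A large gap for the flexible model points to memorized noise. Whether the constrained settings actually improve test accuracy depends on the data, so cross-validation (for example with `cross_val_score`) is a safer basis for choosing between them than a single split.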
Awesome