Last active
May 3, 2022 13:03
{ | |
"cells": [ | |
{ | |
"cell_type": "markdown", | |
"metadata": {}, | |
"source": [ | |
"# Goal\n", | |
"Reproduce locally the work done in the [M5 example](https://colab.research.google.com/drive/1pmp4rqiwiPL-ambxTrJGBiNMS-7vm3v6) Colab notebook, with two constraints:\n", | |
"- no S3 storage required\n", | |
"- no access to the AutoTS API required\n", | |
"\n", | |
"The result should be a way to compute a best-in-class M5 forecast locally using the [nixtla](https://github.com/Nixtla/nixtla) open-source libraries." | |
] | |
}, | |
{ | |
"cell_type": "markdown", | |
"metadata": {}, | |
"source": [ | |
"## Data\n", | |
"\n", | |
"- **target**: the time-series variable of interest. Must have three columns: a unique_id, a datestamp and a value (in this data: `item_id`, `timestamp` and `demand`).\n", | |
"- **static**: exogenous static features for each unique_id. Must have the unique_id and one column per feature.\n", | |
"- **temporal**: exogenous temporal features. Must have the unique_id, the datestamp and one column of values per feature.\n", | |
"- **calendar-holidays**: dictionary mapping each holiday name to the dates on which it occurs.\n", | |
"\n", | |
"The data is taken from https://github.com/Nixtla/nixtla/tree/main/sdk/python-autotimeseries/data" | |
] | |
}, | |
{ | |
"cell_type": "code", | |
"execution_count": 1, | |
"metadata": {}, | |
"outputs": [], | |
"source": [ | |
"data_dir = \"data\"\n", | |
"filename_target = f\"{data_dir}/target.parquet\"\n", | |
"filename_static = f\"{data_dir}/static.parquet\"\n", | |
"filename_temporal = f\"{data_dir}/temporal.parquet\"\n", | |
"filename_calendar_holidays = f\"{data_dir}/calendar-holidays.txt\"\n", | |
"\n", | |
"# outputs:\n", | |
"filename_calendar_features = f\"{data_dir}/calendar-features.parquet\"" | |
] | |
}, | |
{ | |
"cell_type": "code", | |
"execution_count": 2, | |
"metadata": {}, | |
"outputs": [], | |
"source": [ | |
"with open(filename_calendar_holidays) as f:\n", | |
" calendar_holidays_raw = f.read()" | |
] | |
}, | |
{ | |
"cell_type": "code", | |
"execution_count": 3, | |
"metadata": {}, | |
"outputs": [ | |
{ | |
"data": { | |
"text/plain": [ | |
"\"Chanukah_End=2011-12-28,2012-12-16,2013-12-05,2014-12-24,2015-12-14/Christmas=2011-12-25,2012-12-25,2013-12-25,2014-12-25,2015-12-25/Cinco_De_Mayo=2011-05-05,2012-05-05,2013-05-05,2014-05-05,2015-05-05,2016-05-05/ColumbusDay=2011-10-10,2012-10-08,2013-10-14,2014-10-13,2015-10-12/Easter=2011-04-24,2012-04-08,2013-03-31,2014-04-20,2015-04-05,2016-03-27/Eid_al-Fitr=2011-08-31,2012-08-19,2013-08-08,2014-07-29,2015-07-18/EidAlAdha=2011-11-07,2012-10-26,2013-10-15,2014-10-04,2015-09-24/Father's_day=2011-06-19,2012-06-17,2013-06-16,2014-06-15,2015-06-21,2016-06-19/Halloween=2011-10-31,2012-10-31,2013-10-31,2014-10-31,2015-10-31/IndependenceDay=2011-07-04,2012-07-04,2013-07-04,2014-07-04,2015-07-04/LaborDay=2011-09-05,2012-09-03,2013-09-02,2014-09-01,2015-09-07/LentStart=2011-03-09,2012-02-22,2013-02-13,2014-03-05,2015-02-18,2016-02-10/LentWeek2=2011-03-16,2012-02-29,2013-02-20,2014-03-12,2015-02-25,2016-02-17/MartinLutherKingDay=2012-01-16,2013-01-21,2014-01-20,2015-01-19,2016-01-18/MemorialDay=2011-05-30,2012-05-28,2013-05-27,2014-05-26,2015-05-25,2016-05-30/Mother's_day=2011-05-08,2012-05-13,2013-05-12,2014-05-11,2015-05-10,2016-05-08/NBAFinalsEnd=2011-06-12,2012-06-21,2013-06-20,2014-06-15,2015-06-16,2016-06-19/NBAFinalsStart=2011-05-31,2012-06-12,2013-06-06,2014-06-05,2015-06-04,2016-06-02/NewYear=2012-01-01,2013-01-01,2014-01-01,2015-01-01,2016-01-01/OrthodoxChristmas=2012-01-07,2013-01-07,2014-01-07,2015-01-07,2016-01-07/OrthodoxEaster=2011-04-24,2012-04-15,2013-05-05,2014-04-20,2015-04-12,2016-05-01/Pesach_End=2011-04-26,2012-04-14,2013-04-02,2014-04-22,2015-04-11,2016-04-30/PresidentsDay=2011-02-21,2012-02-20,2013-02-18,2014-02-17,2015-02-16,2016-02-15/Purim_End=2011-03-20,2012-03-08,2013-02-24,2014-03-16,2015-03-05,2016-03-24/Ramadan_starts=2011-08-01,2012-07-20,2013-07-09,2014-06-29,2015-06-18,2016-06-07/StPatricksDay=2011-03-17,2012-03-17,2013-03-17,2014-03-17,2015-03-17,2016-03-17/SuperBowl=2011-02-06,2012-02-05,2013-02-03,2014-02-02,2015-02-01,2016-02-07/Thanksgiving=2011-11-24,2012-11-22,2013-11-28,2014-11-27,2015-11-26/ValentinesDay=2011-02-14,2012-02-14,2013-02-14,2014-02-14,2015-02-14,2016-02-14/VeteransDay=2011-11-11,2012-11-11,2013-11-11,2014-11-11,2015-11-11\"" | |
] | |
}, | |
"execution_count": 3, | |
"metadata": {}, | |
"output_type": "execute_result" | |
} | |
], | |
"source": [ | |
"calendar_holidays_raw" | |
] | |
}, | |
{ | |
"cell_type": "code", | |
"execution_count": 4, | |
"metadata": {}, | |
"outputs": [], | |
"source": [ | |
"import pandas as pd" | |
] | |
}, | |
{ | |
"cell_type": "code", | |
"execution_count": 5, | |
"metadata": {}, | |
"outputs": [], | |
"source": [ | |
"target = pd.read_parquet(filename_target)" | |
] | |
}, | |
{ | |
"cell_type": "code", | |
"execution_count": 6, | |
"metadata": {}, | |
"outputs": [ | |
{ | |
"data": { | |
"text/html": [ | |
"<div>\n", | |
"<style scoped>\n", | |
" .dataframe tbody tr th:only-of-type {\n", | |
" vertical-align: middle;\n", | |
" }\n", | |
"\n", | |
" .dataframe tbody tr th {\n", | |
" vertical-align: top;\n", | |
" }\n", | |
"\n", | |
" .dataframe thead th {\n", | |
" text-align: right;\n", | |
" }\n", | |
"</style>\n", | |
"<table border=\"1\" class=\"dataframe\">\n", | |
" <thead>\n", | |
" <tr style=\"text-align: right;\">\n", | |
" <th></th>\n", | |
" <th>item_id</th>\n", | |
" <th>timestamp</th>\n", | |
" <th>demand</th>\n", | |
" </tr>\n", | |
" </thead>\n", | |
" <tbody>\n", | |
" <tr>\n", | |
" <th>0</th>\n", | |
" <td>FOODS_1_001_CA_1</td>\n", | |
" <td>2011-01-29</td>\n", | |
" <td>3.0</td>\n", | |
" </tr>\n", | |
" <tr>\n", | |
" <th>1</th>\n", | |
" <td>FOODS_1_001_CA_1</td>\n", | |
" <td>2011-01-30</td>\n", | |
" <td>0.0</td>\n", | |
" </tr>\n", | |
" <tr>\n", | |
" <th>2</th>\n", | |
" <td>FOODS_1_001_CA_1</td>\n", | |
" <td>2011-01-31</td>\n", | |
" <td>0.0</td>\n", | |
" </tr>\n", | |
" <tr>\n", | |
" <th>3</th>\n", | |
" <td>FOODS_1_001_CA_1</td>\n", | |
" <td>2011-02-01</td>\n", | |
" <td>1.0</td>\n", | |
" </tr>\n", | |
" <tr>\n", | |
" <th>4</th>\n", | |
" <td>FOODS_1_001_CA_1</td>\n", | |
" <td>2011-02-02</td>\n", | |
" <td>4.0</td>\n", | |
" </tr>\n", | |
" </tbody>\n", | |
"</table>\n", | |
"</div>" | |
], | |
"text/plain": [ | |
" item_id timestamp demand\n", | |
"0 FOODS_1_001_CA_1 2011-01-29 3.0\n", | |
"1 FOODS_1_001_CA_1 2011-01-30 0.0\n", | |
"2 FOODS_1_001_CA_1 2011-01-31 0.0\n", | |
"3 FOODS_1_001_CA_1 2011-02-01 1.0\n", | |
"4 FOODS_1_001_CA_1 2011-02-02 4.0" | |
] | |
}, | |
"execution_count": 6, | |
"metadata": {}, | |
"output_type": "execute_result" | |
} | |
], | |
"source": [ | |
"target.head()" | |
] | |
}, | |
{ | |
"cell_type": "code", | |
"execution_count": 7, | |
"metadata": {}, | |
"outputs": [ | |
{ | |
"name": "stdout", | |
"output_type": "stream", | |
"text": [ | |
"<class 'pandas.core.frame.DataFrame'>\n", | |
"RangeIndex: 46796220 entries, 0 to 46796219\n", | |
"Data columns (total 3 columns):\n", | |
" # Column Dtype \n", | |
"--- ------ ----- \n", | |
" 0 item_id category \n", | |
" 1 timestamp datetime64[ns]\n", | |
" 2 demand float32 \n", | |
"dtypes: category(1), datetime64[ns](1), float32(1)\n", | |
"memory usage: 626.3 MB\n" | |
] | |
} | |
], | |
"source": [ | |
"target.info()" | |
] | |
}, | |
{ | |
"cell_type": "code", | |
"execution_count": 8, | |
"metadata": {}, | |
"outputs": [ | |
{ | |
"data": { | |
"text/plain": [ | |
"30490" | |
] | |
}, | |
"execution_count": 8, | |
"metadata": {}, | |
"output_type": "execute_result" | |
} | |
], | |
"source": [ | |
"len(target.item_id.unique())" | |
] | |
}, | |
{ | |
"cell_type": "code", | |
"execution_count": 9, | |
"metadata": {}, | |
"outputs": [ | |
{ | |
"data": { | |
"text/html": [ | |
"<div>\n", | |
"<style scoped>\n", | |
" .dataframe tbody tr th:only-of-type {\n", | |
" vertical-align: middle;\n", | |
" }\n", | |
"\n", | |
" .dataframe tbody tr th {\n", | |
" vertical-align: top;\n", | |
" }\n", | |
"\n", | |
" .dataframe thead th {\n", | |
" text-align: right;\n", | |
" }\n", | |
"</style>\n", | |
"<table border=\"1\" class=\"dataframe\">\n", | |
" <thead>\n", | |
" <tr style=\"text-align: right;\">\n", | |
" <th></th>\n", | |
" <th>item_id</th>\n", | |
" <th>timestamp</th>\n", | |
" <th>snap_CA</th>\n", | |
" <th>snap_TX</th>\n", | |
" <th>snap_WI</th>\n", | |
" <th>sell_price</th>\n", | |
" </tr>\n", | |
" </thead>\n", | |
" <tbody>\n", | |
" <tr>\n", | |
" <th>0</th>\n", | |
" <td>FOODS_1_001_CA_1</td>\n", | |
" <td>2011-01-29</td>\n", | |
" <td>0</td>\n", | |
" <td>0</td>\n", | |
" <td>0</td>\n", | |
" <td>2.0</td>\n", | |
" </tr>\n", | |
" <tr>\n", | |
" <th>1</th>\n", | |
" <td>FOODS_1_001_CA_1</td>\n", | |
" <td>2011-01-30</td>\n", | |
" <td>0</td>\n", | |
" <td>0</td>\n", | |
" <td>0</td>\n", | |
" <td>2.0</td>\n", | |
" </tr>\n", | |
" <tr>\n", | |
" <th>2</th>\n", | |
" <td>FOODS_1_001_CA_1</td>\n", | |
" <td>2011-01-31</td>\n", | |
" <td>0</td>\n", | |
" <td>0</td>\n", | |
" <td>0</td>\n", | |
" <td>2.0</td>\n", | |
" </tr>\n", | |
" <tr>\n", | |
" <th>3</th>\n", | |
" <td>FOODS_1_001_CA_1</td>\n", | |
" <td>2011-02-01</td>\n", | |
" <td>1</td>\n", | |
" <td>1</td>\n", | |
" <td>0</td>\n", | |
" <td>2.0</td>\n", | |
" </tr>\n", | |
" <tr>\n", | |
" <th>4</th>\n", | |
" <td>FOODS_1_001_CA_1</td>\n", | |
" <td>2011-02-02</td>\n", | |
" <td>1</td>\n", | |
" <td>0</td>\n", | |
" <td>1</td>\n", | |
" <td>2.0</td>\n", | |
" </tr>\n", | |
" </tbody>\n", | |
"</table>\n", | |
"</div>" | |
], | |
"text/plain": [ | |
" item_id timestamp snap_CA snap_TX snap_WI sell_price\n", | |
"0 FOODS_1_001_CA_1 2011-01-29 0 0 0 2.0\n", | |
"1 FOODS_1_001_CA_1 2011-01-30 0 0 0 2.0\n", | |
"2 FOODS_1_001_CA_1 2011-01-31 0 0 0 2.0\n", | |
"3 FOODS_1_001_CA_1 2011-02-01 1 1 0 2.0\n", | |
"4 FOODS_1_001_CA_1 2011-02-02 1 0 1 2.0" | |
] | |
}, | |
"execution_count": 9, | |
"metadata": {}, | |
"output_type": "execute_result" | |
} | |
], | |
"source": [ | |
"temporal = pd.read_parquet(filename_temporal)\n", | |
"temporal.head()" | |
] | |
}, | |
{ | |
"cell_type": "code", | |
"execution_count": 10, | |
"metadata": {}, | |
"outputs": [ | |
{ | |
"name": "stdout", | |
"output_type": "stream", | |
"text": [ | |
"<class 'pandas.core.frame.DataFrame'>\n", | |
"RangeIndex: 47649940 entries, 0 to 47649939\n", | |
"Data columns (total 6 columns):\n", | |
" # Column Dtype \n", | |
"--- ------ ----- \n", | |
" 0 item_id category \n", | |
" 1 timestamp datetime64[ns]\n", | |
" 2 snap_CA uint8 \n", | |
" 3 snap_TX uint8 \n", | |
" 4 snap_WI uint8 \n", | |
" 5 sell_price float32 \n", | |
"dtypes: category(1), datetime64[ns](1), float32(1), uint8(3)\n", | |
"memory usage: 774.0 MB\n" | |
] | |
} | |
], | |
"source": [ | |
"temporal.info()" | |
] | |
}, | |
{ | |
"cell_type": "markdown", | |
"metadata": {}, | |
"source": [ | |
"## Calendar features\n", | |
"\n", | |
"The code that autotimeseries calls for `tsfeatures` is https://github.com/Nixtla/nixtla/blob/main/tsfeatures/features/make_features.py\n", | |
"which depends on ts_features (and also on mlforecast and tsfresh). _It is not actually used directly in the Colab notebook._\n", | |
"\n", | |
"The code that autotimeseries calls for `calendartsfeatures` is https://github.com/Nixtla/nixtla/blob/main/tsfeatures/calendar/make_holidays.py\n", | |
"which depends on holidays. I made a copy of this file in a `src` folder." | |
] | |
}, | |
{ | |
"cell_type": "code", | |
"execution_count": 11, | |
"metadata": {}, | |
"outputs": [], | |
"source": [ | |
"from src.make_holidays import CalendarFeatures" | |
] | |
}, | |
{ | |
"cell_type": "code", | |
"execution_count": 12, | |
"metadata": {}, | |
"outputs": [], | |
"source": [ | |
"from rich import inspect" | |
] | |
}, | |
{ | |
"cell_type": "code", | |
"execution_count": 13, | |
"metadata": {}, | |
"outputs": [ | |
{ | |
"data": { | |
"text/html": [ | |
"<pre style=\"white-space:pre;overflow-x:auto;line-height:normal;font-family:Menlo,'DejaVu Sans Mono',consolas,'Courier New',monospace\"><span style=\"color: #000080; text-decoration-color: #000080\">╭────────────────────── </span><span style=\"color: #000080; text-decoration-color: #000080; font-weight: bold\"><</span><span style=\"color: #ff00ff; text-decoration-color: #ff00ff; font-weight: bold\">class</span><span style=\"color: #000000; text-decoration-color: #000000\"> </span><span style=\"color: #008000; text-decoration-color: #008000\">'src.make_holidays.CalendarFeatures'</span><span style=\"color: #000080; text-decoration-color: #000080; font-weight: bold\">></span><span style=\"color: #000080; text-decoration-color: #000080\"> ───────────────────────╮</span>\n", | |
"<span style=\"color: #000080; text-decoration-color: #000080\">│</span> <span style=\"color: #00ffff; text-decoration-color: #00ffff; font-style: italic\">class </span><span style=\"color: #800000; text-decoration-color: #800000; font-weight: bold\">CalendarFeatures</span><span style=\"font-weight: bold\">(</span>filename: str, filename_output: str, country: str, events: <span style=\"color: #000080; text-decoration-color: #000080\">│</span>\n", | |
"<span style=\"color: #000080; text-decoration-color: #000080\">│</span> Dict<span style=\"font-weight: bold\">[</span>str, List<span style=\"font-weight: bold\">[</span>str<span style=\"font-weight: bold\">]]</span>, scale: bool, unique_id_column: str, ds_column: str, y_column: str<span style=\"font-weight: bold\">)</span> <span style=\"color: #000080; text-decoration-color: #000080\">│</span>\n", | |
"<span style=\"color: #000080; text-decoration-color: #000080\">│</span> -> <span style=\"color: #008000; text-decoration-color: #008000\">'CalendarFeatures'</span>: <span style=\"color: #000080; text-decoration-color: #000080\">│</span>\n", | |
"<span style=\"color: #000080; text-decoration-color: #000080\">│</span> <span style=\"color: #000080; text-decoration-color: #000080\">│</span>\n", | |
"<span style=\"color: #000080; text-decoration-color: #000080\">│</span> <span style=\"color: #008080; text-decoration-color: #008080\">Computes calendar features.</span> <span style=\"color: #000080; text-decoration-color: #000080\">│</span>\n", | |
"<span style=\"color: #000080; text-decoration-color: #000080\">│</span> <span style=\"color: #000080; text-decoration-color: #000080\">│</span>\n", | |
"<span style=\"color: #000080; text-decoration-color: #000080\">│</span> <span style=\"color: #008080; text-decoration-color: #008080; font-weight: bold\">27</span><span style=\"font-style: italic\"> attribute(s) not shown.</span> Run <span style=\"color: #800080; text-decoration-color: #800080; font-weight: bold\">inspect</span><span style=\"font-weight: bold\">(</span>inspect<span style=\"font-weight: bold\">)</span> for options. <span style=\"color: #000080; text-decoration-color: #000080\">│</span>\n", | |
"<span style=\"color: #000080; text-decoration-color: #000080\">╰───────────────────────────────────────────────────────────────────────────────────────────╯</span>\n", | |
"</pre>\n" | |
], | |
"text/plain": [ | |
"\u001b[34m╭─\u001b[0m\u001b[34m───────────────────── \u001b[0m\u001b[1;34m<\u001b[0m\u001b[1;95mclass\u001b[0m\u001b[39m \u001b[0m\u001b[32m'src.make_holidays.CalendarFeatures'\u001b[0m\u001b[1;34m>\u001b[0m\u001b[34m ──────────────────────\u001b[0m\u001b[34m─╮\u001b[0m\n", | |
"\u001b[34m│\u001b[0m \u001b[3;96mclass \u001b[0m\u001b[1;31mCalendarFeatures\u001b[0m\u001b[1m(\u001b[0mfilename: str, filename_output: str, country: str, events: \u001b[34m│\u001b[0m\n", | |
"\u001b[34m│\u001b[0m Dict\u001b[1m[\u001b[0mstr, List\u001b[1m[\u001b[0mstr\u001b[1m]\u001b[0m\u001b[1m]\u001b[0m, scale: bool, unique_id_column: str, ds_column: str, y_column: str\u001b[1m)\u001b[0m \u001b[34m│\u001b[0m\n", | |
"\u001b[34m│\u001b[0m -> \u001b[32m'CalendarFeatures'\u001b[0m: \u001b[34m│\u001b[0m\n", | |
"\u001b[34m│\u001b[0m \u001b[34m│\u001b[0m\n", | |
"\u001b[34m│\u001b[0m \u001b[36mComputes calendar features.\u001b[0m \u001b[34m│\u001b[0m\n", | |
"\u001b[34m│\u001b[0m \u001b[34m│\u001b[0m\n", | |
"\u001b[34m│\u001b[0m \u001b[1;36m27\u001b[0m\u001b[3m attribute(s) not shown.\u001b[0m Run \u001b[1;35minspect\u001b[0m\u001b[1m(\u001b[0minspect\u001b[1m)\u001b[0m for options. \u001b[34m│\u001b[0m\n", | |
"\u001b[34m╰───────────────────────────────────────────────────────────────────────────────────────────╯\u001b[0m\n" | |
] | |
}, | |
"metadata": {}, | |
"output_type": "display_data" | |
} | |
], | |
"source": [ | |
"inspect(CalendarFeatures)" | |
] | |
}, | |
{ | |
"cell_type": "markdown", | |
"metadata": {}, | |
"source": [ | |
"This is the call that we are trying to reproduce locally:\n", | |
"\n", | |
"```python\n", | |
"response_calendar = autotimeseries.calendartsfeatures(filename=filename_temporal,\n", | |
" country='USA',\n", | |
" events=filename_calendar_holidays,\n", | |
" **columns)\n", | |
"```\n", | |
"\n", | |
"On the server side, it is handled by this endpoint, from [autotimeseries.api.main#L73](https://github.com/Nixtla/nixtla/blob/74e4560f1bdb6bf64445f3c45005fe74c0a0a427/api/main.py#L73):\n", | |
"\n", | |
"```python\n", | |
"@app.post('/calendartsfeatures')\n", | |
"def compute_calendartsfeatures(s3_args: S3Args, args: CalendarTSFeaturesArgs):\n", | |
" \"\"\"Calculates features using sagemaker.\"\"\"\n", | |
" sagemaker_response = run_sagemaker(url=s3_args.s3_url,\n", | |
" dest_url=s3_args.s3_dest_url,\n", | |
" output_name=f'calendar-features.csv',\n", | |
" script='calendar/make_holidays.py',\n", | |
" arguments=parse_args(args))\n", | |
"\n", | |
" return sagemaker_response\n", | |
"```\n", | |
"\n", | |
"Note that `events` comes from the calendar-holidays file, but it must first be parsed into a `Dict[str, List[str]]` (holiday name to list of date strings)." | |
] | |
}, | |
{ | |
"cell_type": "code", | |
"execution_count": 14, | |
"metadata": {}, | |
"outputs": [ | |
{ | |
"data": { | |
"text/plain": [ | |
"{'Chanukah_End': ['2011-12-28',\n", | |
" '2012-12-16',\n", | |
" '2013-12-05',\n", | |
" '2014-12-24',\n", | |
" '2015-12-14'],\n", | |
" 'Christmas': ['2011-12-25',\n", | |
" '2012-12-25',\n", | |
" '2013-12-25',\n", | |
" '2014-12-25',\n", | |
" '2015-12-25'],\n", | |
" 'Cinco_De_Mayo': ['2011-05-05',\n", | |
" '2012-05-05',\n", | |
" '2013-05-05',\n", | |
" '2014-05-05',\n", | |
" '2015-05-05',\n", | |
" '2016-05-05'],\n", | |
" 'ColumbusDay': ['2011-10-10',\n", | |
" '2012-10-08',\n", | |
" '2013-10-14',\n", | |
" '2014-10-13',\n", | |
" '2015-10-12'],\n", | |
" 'Easter': ['2011-04-24',\n", | |
" '2012-04-08',\n", | |
" '2013-03-31',\n", | |
" '2014-04-20',\n", | |
" '2015-04-05',\n", | |
" '2016-03-27'],\n", | |
" 'Eid_al-Fitr': ['2011-08-31',\n", | |
" '2012-08-19',\n", | |
" '2013-08-08',\n", | |
" '2014-07-29',\n", | |
" '2015-07-18'],\n", | |
" 'EidAlAdha': ['2011-11-07',\n", | |
" '2012-10-26',\n", | |
" '2013-10-15',\n", | |
" '2014-10-04',\n", | |
" '2015-09-24'],\n", | |
" \"Father's_day\": ['2011-06-19',\n", | |
" '2012-06-17',\n", | |
" '2013-06-16',\n", | |
" '2014-06-15',\n", | |
" '2015-06-21',\n", | |
" '2016-06-19'],\n", | |
" 'Halloween': ['2011-10-31',\n", | |
" '2012-10-31',\n", | |
" '2013-10-31',\n", | |
" '2014-10-31',\n", | |
" '2015-10-31'],\n", | |
" 'IndependenceDay': ['2011-07-04',\n", | |
" '2012-07-04',\n", | |
" '2013-07-04',\n", | |
" '2014-07-04',\n", | |
" '2015-07-04'],\n", | |
" 'LaborDay': ['2011-09-05',\n", | |
" '2012-09-03',\n", | |
" '2013-09-02',\n", | |
" '2014-09-01',\n", | |
" '2015-09-07'],\n", | |
" 'LentStart': ['2011-03-09',\n", | |
" '2012-02-22',\n", | |
" '2013-02-13',\n", | |
" '2014-03-05',\n", | |
" '2015-02-18',\n", | |
" '2016-02-10'],\n", | |
" 'LentWeek2': ['2011-03-16',\n", | |
" '2012-02-29',\n", | |
" '2013-02-20',\n", | |
" '2014-03-12',\n", | |
" '2015-02-25',\n", | |
" '2016-02-17'],\n", | |
" 'MartinLutherKingDay': ['2012-01-16',\n", | |
" '2013-01-21',\n", | |
" '2014-01-20',\n", | |
" '2015-01-19',\n", | |
" '2016-01-18'],\n", | |
" 'MemorialDay': ['2011-05-30',\n", | |
" '2012-05-28',\n", | |
" '2013-05-27',\n", | |
" '2014-05-26',\n", | |
" '2015-05-25',\n", | |
" '2016-05-30'],\n", | |
" \"Mother's_day\": ['2011-05-08',\n", | |
" '2012-05-13',\n", | |
" '2013-05-12',\n", | |
" '2014-05-11',\n", | |
" '2015-05-10',\n", | |
" '2016-05-08'],\n", | |
" 'NBAFinalsEnd': ['2011-06-12',\n", | |
" '2012-06-21',\n", | |
" '2013-06-20',\n", | |
" '2014-06-15',\n", | |
" '2015-06-16',\n", | |
" '2016-06-19'],\n", | |
" 'NBAFinalsStart': ['2011-05-31',\n", | |
" '2012-06-12',\n", | |
" '2013-06-06',\n", | |
" '2014-06-05',\n", | |
" '2015-06-04',\n", | |
" '2016-06-02'],\n", | |
" 'NewYear': ['2012-01-01',\n", | |
" '2013-01-01',\n", | |
" '2014-01-01',\n", | |
" '2015-01-01',\n", | |
" '2016-01-01'],\n", | |
" 'OrthodoxChristmas': ['2012-01-07',\n", | |
" '2013-01-07',\n", | |
" '2014-01-07',\n", | |
" '2015-01-07',\n", | |
" '2016-01-07'],\n", | |
" 'OrthodoxEaster': ['2011-04-24',\n", | |
" '2012-04-15',\n", | |
" '2013-05-05',\n", | |
" '2014-04-20',\n", | |
" '2015-04-12',\n", | |
" '2016-05-01'],\n", | |
" 'Pesach_End': ['2011-04-26',\n", | |
" '2012-04-14',\n", | |
" '2013-04-02',\n", | |
" '2014-04-22',\n", | |
" '2015-04-11',\n", | |
" '2016-04-30'],\n", | |
" 'PresidentsDay': ['2011-02-21',\n", | |
" '2012-02-20',\n", | |
" '2013-02-18',\n", | |
" '2014-02-17',\n", | |
" '2015-02-16',\n", | |
" '2016-02-15'],\n", | |
" 'Purim_End': ['2011-03-20',\n", | |
" '2012-03-08',\n", | |
" '2013-02-24',\n", | |
" '2014-03-16',\n", | |
" '2015-03-05',\n", | |
" '2016-03-24'],\n", | |
" 'Ramadan_starts': ['2011-08-01',\n", | |
" '2012-07-20',\n", | |
" '2013-07-09',\n", | |
" '2014-06-29',\n", | |
" '2015-06-18',\n", | |
" '2016-06-07'],\n", | |
" 'StPatricksDay': ['2011-03-17',\n", | |
" '2012-03-17',\n", | |
" '2013-03-17',\n", | |
" '2014-03-17',\n", | |
" '2015-03-17',\n", | |
" '2016-03-17'],\n", | |
" 'SuperBowl': ['2011-02-06',\n", | |
" '2012-02-05',\n", | |
" '2013-02-03',\n", | |
" '2014-02-02',\n", | |
" '2015-02-01',\n", | |
" '2016-02-07'],\n", | |
" 'Thanksgiving': ['2011-11-24',\n", | |
" '2012-11-22',\n", | |
" '2013-11-28',\n", | |
" '2014-11-27',\n", | |
" '2015-11-26'],\n", | |
" 'ValentinesDay': ['2011-02-14',\n", | |
" '2012-02-14',\n", | |
" '2013-02-14',\n", | |
" '2014-02-14',\n", | |
" '2015-02-14',\n", | |
" '2016-02-14'],\n", | |
" 'VeteransDay': ['2011-11-11',\n", | |
" '2012-11-11',\n", | |
" '2013-11-11',\n", | |
" '2014-11-11',\n", | |
" '2015-11-11']}" | |
] | |
}, | |
"execution_count": 14, | |
"metadata": {}, | |
"output_type": "execute_result" | |
} | |
], | |
"source": [ | |
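"# Events are separated by \"/\"; each event has the form \"name=date1,date2,...\".\n", | |
"calendar_holidays = {\n", | |
" name: dates.split(\",\")\n", | |
" for name, dates in (event.split(\"=\", 1) for event in calendar_holidays_raw.split(\"/\"))\n", | |
"}\n", | |
"calendar_holidays" | |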
] | |
}, | |
{ | |
"cell_type": "markdown", | |
"metadata": {}, | |
"source": [ | |
"I had to add this to `make_holidays.py` (otherwise it cannot be imported as a library):\n", | |
"```python\n", | |
"import logging\n", | |
"logging.basicConfig(level=logging.INFO)\n", | |
"logger = logging.getLogger(__name__)\n", | |
"```\n", | |
"\n", | |
"I also had to change the references to the directory `/opt/ml/processing/output/` in `reader` and `writer`." | |
] | |
}, | |
{ | |
"cell_type": "code", | |
"execution_count": 15, | |
"metadata": {}, | |
"outputs": [ | |
{ | |
"name": "stderr", | |
"output_type": "stream", | |
"text": [ | |
"INFO:src.make_holidays:Reading file...\n", | |
"INFO:src.make_holidays:File read.\n" | |
] | |
} | |
], | |
"source": [ | |
"calendarfeatures = CalendarFeatures(\n", | |
" filename=filename_temporal,\n", | |
" filename_output=filename_calendar_features,\n", | |
" country=\"USA\",\n", | |
" events=calendar_holidays,\n", | |
" scale=False,\n", | |
" unique_id_column=\"item_id\",\n", | |
" ds_column=\"timestamp\",\n", | |
" y_column=\"\" # not used: its only occurrence (in renamer) was removed from the make_holidays code\n", | |
")" | |
] | |
}, | |
{ | |
"cell_type": "code", | |
"execution_count": 16, | |
"metadata": {}, | |
"outputs": [ | |
{ | |
"name": "stderr", | |
"output_type": "stream", | |
"text": [ | |
"INFO:src.make_holidays:Computing features...\n" | |
] | |
}, | |
{ | |
"ename": "UFuncTypeError", | |
"evalue": "Cannot cast ufunc 'greater' input 0 from dtype('<m8[ns]') to dtype('<m8') with casting rule 'same_kind'", | |
"output_type": "error", | |
"traceback": [ | |
"\u001b[1;31m---------------------------------------------------------------------------\u001b[0m", | |
"\u001b[1;31mUFuncTypeError\u001b[0m Traceback (most recent call last)", | |
"\u001b[1;32m<ipython-input-16-2c703a4c6e68>\u001b[0m in \u001b[0;36m<module>\u001b[1;34m\u001b[0m\n\u001b[1;32m----> 1\u001b[1;33m \u001b[0mcalendarfeatures\u001b[0m\u001b[1;33m.\u001b[0m\u001b[0mget_calendar_features\u001b[0m\u001b[1;33m(\u001b[0m\u001b[1;33m)\u001b[0m\u001b[1;33m\u001b[0m\u001b[1;33m\u001b[0m\u001b[0m\n\u001b[0m", | |
"\u001b[1;32mP:\\Development\\ppeterlongo\\m5\\nixtla\\src\\make_holidays.py\u001b[0m in \u001b[0;36mget_calendar_features\u001b[1;34m(self)\u001b[0m\n\u001b[0;32m 203\u001b[0m \u001b[0myear_list\u001b[0m\u001b[1;33m=\u001b[0m\u001b[0mlist\u001b[0m\u001b[1;33m(\u001b[0m\u001b[0mrange\u001b[0m\u001b[1;33m(\u001b[0m\u001b[0mmin_year\u001b[0m\u001b[1;33m,\u001b[0m \u001b[0mmax_year\u001b[0m\u001b[1;33m)\u001b[0m\u001b[1;33m)\u001b[0m\u001b[1;33m,\u001b[0m\u001b[1;33m\u001b[0m\u001b[1;33m\u001b[0m\u001b[0m\n\u001b[0;32m 204\u001b[0m \u001b[0mcountry\u001b[0m\u001b[1;33m=\u001b[0m\u001b[0mself\u001b[0m\u001b[1;33m.\u001b[0m\u001b[0mcountry\u001b[0m\u001b[1;33m,\u001b[0m\u001b[1;33m\u001b[0m\u001b[1;33m\u001b[0m\u001b[0m\n\u001b[1;32m--> 205\u001b[1;33m events=self.events)\n\u001b[0m\u001b[0;32m 206\u001b[0m \u001b[1;31m# hack, it should be an argument\u001b[0m\u001b[1;33m\u001b[0m\u001b[1;33m\u001b[0m\u001b[1;33m\u001b[0m\u001b[0m\n\u001b[0;32m 207\u001b[0m \u001b[0mholidays\u001b[0m \u001b[1;33m=\u001b[0m \u001b[1;33m(\u001b[0m\u001b[0mholidays\u001b[0m \u001b[1;33m==\u001b[0m \u001b[1;36m0\u001b[0m\u001b[1;33m)\u001b[0m\u001b[1;33m.\u001b[0m\u001b[0mastype\u001b[0m\u001b[1;33m(\u001b[0m\u001b[0mint\u001b[0m\u001b[1;33m)\u001b[0m\u001b[1;33m\u001b[0m\u001b[1;33m\u001b[0m\u001b[0m\n", | |
"\u001b[1;32mP:\\Development\\ppeterlongo\\m5\\nixtla\\src\\make_holidays.py\u001b[0m in \u001b[0;36mmake_holidays_distance_df\u001b[1;34m(dates, year_list, country, events)\u001b[0m\n\u001b[0;32m 131\u001b[0m \u001b[0mholiday_dates\u001b[0m \u001b[1;33m=\u001b[0m \u001b[0mholiday_dates\u001b[0m\u001b[1;33m.\u001b[0m\u001b[0mtolist\u001b[0m\u001b[1;33m(\u001b[0m\u001b[1;33m)\u001b[0m\u001b[1;33m\u001b[0m\u001b[1;33m\u001b[0m\u001b[0m\n\u001b[0;32m 132\u001b[0m \u001b[1;33m\u001b[0m\u001b[0m\n\u001b[1;32m--> 133\u001b[1;33m \u001b[0mdistance_dict\u001b[0m\u001b[1;33m[\u001b[0m\u001b[0mholiday\u001b[0m\u001b[1;33m]\u001b[0m \u001b[1;33m=\u001b[0m \u001b[0mdistance_to_holiday\u001b[0m\u001b[1;33m(\u001b[0m\u001b[0mholiday_dates\u001b[0m\u001b[1;33m,\u001b[0m \u001b[0mdates\u001b[0m\u001b[1;33m)\u001b[0m\u001b[1;33m\u001b[0m\u001b[1;33m\u001b[0m\u001b[0m\n\u001b[0m\u001b[0;32m 134\u001b[0m \u001b[1;33m\u001b[0m\u001b[0m\n\u001b[0;32m 135\u001b[0m \u001b[0mholidays_distance_df\u001b[0m \u001b[1;33m=\u001b[0m \u001b[0mpd\u001b[0m\u001b[1;33m.\u001b[0m\u001b[0mDataFrame\u001b[0m\u001b[1;33m(\u001b[0m\u001b[0mdistance_dict\u001b[0m\u001b[1;33m)\u001b[0m\u001b[1;33m\u001b[0m\u001b[1;33m\u001b[0m\u001b[0m\n", | |
"\u001b[1;32mP:\\Development\\ppeterlongo\\m5\\nixtla\\src\\make_holidays.py\u001b[0m in \u001b[0;36mdistance_to_holiday\u001b[1;34m(holiday_dates, dates)\u001b[0m\n\u001b[0;32m 101\u001b[0m \u001b[0mdistance\u001b[0m \u001b[1;33m=\u001b[0m \u001b[0mnp\u001b[0m\u001b[1;33m.\u001b[0m\u001b[0mabs\u001b[0m\u001b[1;33m(\u001b[0m\u001b[0mdistance\u001b[0m\u001b[1;33m)\u001b[0m\u001b[1;33m\u001b[0m\u001b[1;33m\u001b[0m\u001b[0m\n\u001b[0;32m 102\u001b[0m \u001b[0mdistance\u001b[0m \u001b[1;33m=\u001b[0m \u001b[0mnp\u001b[0m\u001b[1;33m.\u001b[0m\u001b[0mmin\u001b[0m\u001b[1;33m(\u001b[0m\u001b[0mdistance\u001b[0m\u001b[1;33m,\u001b[0m \u001b[0maxis\u001b[0m\u001b[1;33m=\u001b[0m\u001b[1;36m1\u001b[0m\u001b[1;33m)\u001b[0m\u001b[1;33m\u001b[0m\u001b[1;33m\u001b[0m\u001b[0m\n\u001b[1;32m--> 103\u001b[1;33m \u001b[0mdistance\u001b[0m\u001b[1;33m[\u001b[0m\u001b[0mdistance\u001b[0m \u001b[1;33m>\u001b[0m \u001b[1;36m183\u001b[0m\u001b[1;33m]\u001b[0m \u001b[1;33m=\u001b[0m \u001b[1;36m365\u001b[0m \u001b[1;33m-\u001b[0m \u001b[0mdistance\u001b[0m\u001b[1;33m[\u001b[0m\u001b[0mdistance\u001b[0m \u001b[1;33m>\u001b[0m \u001b[1;36m183\u001b[0m\u001b[1;33m]\u001b[0m\u001b[1;33m\u001b[0m\u001b[1;33m\u001b[0m\u001b[0m\n\u001b[0m\u001b[0;32m 104\u001b[0m \u001b[1;31m# Convert to minutes\u001b[0m\u001b[1;33m\u001b[0m\u001b[1;33m\u001b[0m\u001b[1;33m\u001b[0m\u001b[0m\n\u001b[0;32m 105\u001b[0m \u001b[0mdistance\u001b[0m \u001b[1;33m=\u001b[0m \u001b[0mdistance\u001b[0m\u001b[1;33m.\u001b[0m\u001b[0mastype\u001b[0m\u001b[1;33m(\u001b[0m\u001b[0mfloat\u001b[0m\u001b[1;33m)\u001b[0m\u001b[1;33m\u001b[0m\u001b[1;33m\u001b[0m\u001b[0m\n", | |
"\u001b[1;31mUFuncTypeError\u001b[0m: Cannot cast ufunc 'greater' input 0 from dtype('<m8[ns]') to dtype('<m8') with casting rule 'same_kind'" | |
] | |
} | |
], | |
"source": [ | |
"calendarfeatures.get_calendar_features()" | |
] | |
}, | |
{ | |
"cell_type": "markdown", | |
"metadata": {}, | |
"source": [ | |
"Known issue (blocker): https://github.com/Nixtla/nixtla/issues/15" | |
] | |
}, | |
{ | |
"cell_type": "code", | |
"execution_count": null, | |
"metadata": {}, | |
"outputs": [], | |
"source": [] | |
} | |
], | |
"metadata": { | |
"kernelspec": { | |
"display_name": "Python [conda env:m5]", | |
"language": "python", | |
"name": "conda-env-m5-py" | |
} | |
}, | |
"nbformat": 4, | |
"nbformat_minor": 4 | |
} |
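
A note on the `UFuncTypeError` in the last executed cell: the traceback points at the line `distance[distance > 183] = 365 - distance[distance > 183]`, where `distance` appears to be a `timedelta64[ns]` array that is compared with the plain integer 183. The sketch below is one possible workaround, not the code from `make_holidays.py`: the function name `distance_to_holiday_days` is hypothetical, the rest of the original function is not visible in the traceback, and the inputs are assumed to be ISO date strings or `datetime64` arrays. The idea is to convert the differences to integer days before doing the comparison.

```python
import numpy as np


def distance_to_holiday_days(holiday_dates, dates):
    """Days from each date to the nearest holiday occurrence (hypothetical helper)."""
    # Use day-resolution datetimes so that differences are timedelta64[D].
    dates_d = np.asarray(dates, dtype="datetime64[D]")
    holidays_d = np.asarray(holiday_dates, dtype="datetime64[D]")

    # Pairwise differences, shape (n_dates, n_holidays).
    diff = dates_d[:, None] - holidays_d[None, :]

    # Convert to plain integer days *before* comparing with 183,
    # which avoids comparing a timedelta64 array with an int.
    distance = np.abs(diff.astype("int64")).min(axis=1)

    # Same wrap-around rule as the line shown in the traceback.
    far = distance > 183
    distance[far] = 365 - distance[far]
    return distance.astype(float)


christmas = ["2011-12-25", "2012-12-25"]
print(distance_to_holiday_days(christmas, ["2011-12-24", "2011-12-25", "2011-06-20"]))
```

Whether this matches the intended semantics of the upstream function (for example the unit hinted at by its `# Convert to minutes` comment) would need to be checked against the linked issue.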