Created May 10, 2020 15:35
{
"cells": [
{
"metadata": {},
"cell_type": "markdown",
"source": "# Exact decoding of times using timedelta arithmetic in cftime\n\nInteger arithmetic in cftime is microsecond-exact; this notebook illustrates how that could be leveraged to decode times exactly when the input array to `num2date` has an integer dtype or can be safely cast to one. This method is about 8x slower than the current imprecise method used in `cftime.num2date` (890 ms versus 113 ms for 10,000 values in the timings below), but the benefit of exact results may outweigh the performance cost."
},
{
"metadata": {
"trusted": true
},
"cell_type": "code",
"source": "import datetime\n\nimport cftime\nimport numpy as np",
"execution_count": 1,
"outputs": []
},
{
"metadata": {
"trusted": true
},
"cell_type": "code",
"source": "# These dictionaries are just for illustration; I realize that\n# cftime supports other spellings of calendar types and units,\n# so the logic here would need to be adjusted accordingly.\nDATE_TYPES = {\n    \"proleptic_gregorian\": cftime.DatetimeProlepticGregorian,\n    \"noleap\": cftime.DatetimeNoLeap,\n    \"allleap\": cftime.DatetimeAllLeap,\n    \"julian\": cftime.DatetimeJulian,\n    \"360_day\": cftime.Datetime360Day,\n    \"gregorian\": cftime.DatetimeGregorian\n}\n\n\n# Number of microseconds in one of each supported unit.\nUNIT_CONVERSION_FACTORS = {\n    \"microseconds\": 1,\n    \"milliseconds\": 1000,\n    \"seconds\": 1000000,\n    \"minutes\": 60 * 1000000,\n    \"hours\": 3600 * 1000000,\n    \"days\": 86400 * 1000000,\n}\n\n\ndef to_calendar_specific_datetime(date, calendar):\n    \"\"\"Rebuild `date` as the cftime date type for `calendar`.\"\"\"\n    # The argument is named `date` so it does not shadow the datetime module.\n    return DATE_TYPES[calendar](\n        date.year,\n        date.month,\n        date.day,\n        date.hour,\n        date.minute,\n        date.second,\n        date.microsecond\n    )",
"execution_count": 2,
"outputs": []
},
{
"metadata": {
"trusted": true
},
"cell_type": "code",
"source": "def cast_to_int_if_safe(num):\n    \"\"\"Copied from xarray.coding.times.py\"\"\"\n    int_num = np.array(num, dtype=np.int64)\n    if (num == int_num).all():\n        num = int_num\n    return num\n\n\ndef num2date_exact(times, units, calendar):\n    \"\"\"Exact datetime decoding for integer times. This currently only\n    supports use_only_cftime_datetimes=True, but could straightforwardly\n    be made more general.\n    \"\"\"\n    times = cast_to_int_if_safe(times)\n    if not np.issubdtype(times.dtype, np.integer):\n        raise ValueError(\"times must have integer dtype or be able to be safely cast to an integer dtype.\")\n\n    # Private cftime helpers: split the units string and parse its reference date.\n    unit, _ = cftime._cftime._datesplit(units)\n    basedate = cftime._cftime._dateparse(units)\n    basedate = to_calendar_specific_datetime(basedate, calendar)\n\n    if unit not in UNIT_CONVERSION_FACTORS:\n        raise ValueError(\"Unsupported units.\")\n\n    # Scale to integer microseconds, then use timedelta64 only as a route\n    # to an object array of datetime.timedelta values.\n    factor = UNIT_CONVERSION_FACTORS[unit]\n    times = times * factor\n    timedeltas = times.astype(\"timedelta64[us]\").astype(datetime.timedelta)\n    return basedate + timedeltas",
"execution_count": 3,
"outputs": []
},
{
"metadata": {
"trusted": true
},
"cell_type": "code",
"source": "UNITS = \"microseconds since 1900-01-01\"\nCALENDAR = \"proleptic_gregorian\"\ntimes = np.random.randint(0, 86400000000 * 1000000, size=(10000, ))",
"execution_count": 4,
"outputs": []
},
{
"metadata": {
"trusted": true
},
"cell_type": "code",
"source": "%timeit cftime.num2date(times, UNITS, CALENDAR)",
"execution_count": 5,
"outputs": [
{
"output_type": "stream",
"text": "113 ms ± 1.98 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)\n",
"name": "stdout"
}
]
},
{
"metadata": {
"trusted": true
},
"cell_type": "code",
"source": "%timeit num2date_exact(times, UNITS, CALENDAR)",
"execution_count": 6,
"outputs": [
{
"output_type": "stream",
"text": "890 ms ± 5.64 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)\n",
"name": "stdout"
}
]
},
{
"metadata": {
"trusted": true
},
"cell_type": "code",
"source": "imprecise = cftime.num2date(times, UNITS, CALENDAR)\nexact = num2date_exact(times, UNITS, CALENDAR)",
"execution_count": 7,
"outputs": []
},
{
"metadata": {
"trusted": true
},
"cell_type": "code",
"source": "imprecise",
"execution_count": 8,
"outputs": [
{
"output_type": "execute_result",
"execution_count": 8,
"data": {
"text/plain": "array([cftime.DatetimeProlepticGregorian(3439-05-15 06:08:18.173117),\n       cftime.DatetimeProlepticGregorian(4494-01-30 17:57:21.564247),\n       cftime.DatetimeProlepticGregorian(2903-02-24 05:10:24.012762), ...,\n       cftime.DatetimeProlepticGregorian(2984-05-04 23:04:46.247022),\n       cftime.DatetimeProlepticGregorian(3444-12-29 16:19:06.306945),\n       cftime.DatetimeProlepticGregorian(3315-11-19 03:12:16.589285)],\n      dtype=object)"
},
"metadata": {}
}
]
},
{
"metadata": {
"trusted": true
},
"cell_type": "code",
"source": "exact",
"execution_count": 9,
"outputs": [
{
"output_type": "execute_result",
"execution_count": 9,
"data": {
"text/plain": "array([cftime.DatetimeProlepticGregorian(3439-05-15 06:08:18.173132),\n       cftime.DatetimeProlepticGregorian(4494-01-30 17:57:21.564271),\n       cftime.DatetimeProlepticGregorian(2903-02-24 05:10:24.012762), ...,\n       cftime.DatetimeProlepticGregorian(2984-05-04 23:04:46.247007),\n       cftime.DatetimeProlepticGregorian(3444-12-29 16:19:06.306935),\n       cftime.DatetimeProlepticGregorian(3315-11-19 03:12:16.589290)],\n      dtype=object)"
},
"metadata": {}
}
]
},
{
"metadata": {
"trusted": true
},
"cell_type": "code",
"source": "",
"execution_count": null,
"outputs": []
}
],
"metadata": {
"kernelspec": {
"name": "python3",
"display_name": "Python 3",
"language": "python"
},
"language_info": {
"name": "python",
"version": "3.7.3",
"mimetype": "text/x-python",
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"pygments_lexer": "ipython3",
"nbconvert_exporter": "python",
"file_extension": ".py"
}
},
"nbformat": 4,
"nbformat_minor": 4
}
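As a rough illustration of why integer arithmetic matters for the notebook's exactness claim, the sketch below uses only the standard library's `datetime` to show a microsecond count that survives `datetime.timedelta` arithmetic exactly but not a round trip through float64. It is not what `cftime.num2date` does internally; the 1900-01-01 base date is borrowed from the notebook, and the function names are made up for this example.

```python
import datetime

BASE = datetime.datetime(1900, 1, 1)  # same reference date as UNITS in the notebook


def decode_exact(n_us):
    """Decode an integer microsecond offset using integer arithmetic only."""
    return BASE + datetime.timedelta(microseconds=n_us)


def decode_via_float(n_us):
    """Same decoding, but the offset passes through a float64 first.

    Hypothetical stand-in for any float-based path, not cftime's algorithm.
    """
    return BASE + datetime.timedelta(microseconds=int(float(n_us)))


# 2**53 + 1 is the smallest positive integer that float64 cannot represent.
n_us = 2**53 + 1
exact = decode_exact(n_us)
lossy = decode_via_float(n_us)
error = exact - lossy  # one microsecond is lost in the float round trip
```

The notebook's random offsets range up to 8.64e16 microseconds, well past 2**53 (about 9.0e15), so some loss of precision is to be expected in any decoding path that holds such offsets as floats.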
Thanks for having a look @jswhit. I only use `numpy.timedelta64` here as a conduit for converting an array of integers to an array of `datetime.timedelta` objects, and then add those `datetime.timedelta` objects to the `basedate` (which could be of any calendar type). For that reason I think it should be calendar-agnostic.
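To make that concrete, here is a minimal sketch of the conversion step using only NumPy and the standard library. A plain `datetime.datetime` stands in for `basedate` so the snippet runs without cftime; in the notebook the same `datetime.timedelta` objects are added to a cftime date instead. The variable names are illustrative.

```python
import datetime

import numpy as np

# Integer offsets in microseconds: 0, 1 microsecond, and exactly one day.
offsets = np.array([0, 1, 86_400_000_000], dtype=np.int64)

# timedelta64 is only a conduit here: casting to object turns each element
# into a plain datetime.timedelta, which carries no calendar information.
deltas = offsets.astype("timedelta64[us]").astype(object)

# Stand-in for basedate (assumption: any date type that supports
# `date + datetime.timedelta` would work the same way).
base = datetime.datetime(1900, 1, 1)
decoded = [base + delta for delta in deltas]
```

The calendar only enters at the final addition, through the `__add__` of whatever date type `base` happens to be.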
Since this uses numpy datetime64, does it only work for the proleptic gregorian calendar?