@spencerkclark
Created May 10, 2020 15:35
{
"cells": [
{
"metadata": {},
"cell_type": "markdown",
"source": "# Exact decoding of times using timedelta arithmetic in cftime\n\nInteger arithmetic in cftime is microsecond-exact; this notebook illustrates how that could be leveraged to decode times exactly in cases where the input array to `num2date` is of integer dtype or can be safely cast to an integer dtype. This method is about 8x slower than the current imprecise method used in `cftime.num2date` (roughly 890 ms versus 113 ms for 10,000 values in the timings below), but the benefit of exact results may outweigh the performance cost."
},
{
"metadata": {
"trusted": true
},
"cell_type": "code",
"source": "import datetime\n\nimport cftime\nimport numpy as np",
"execution_count": 1,
"outputs": []
},
{
"metadata": {
"trusted": true
},
"cell_type": "code",
"source": "# These dictionaries are just for illustration; I realize that\n# cftime supports other spellings of calendar types and units,\n# so the logic here would need to be adjusted accordingly.\nDATE_TYPES = {\n    \"proleptic_gregorian\": cftime.DatetimeProlepticGregorian,\n    \"noleap\": cftime.DatetimeNoLeap,\n    \"allleap\": cftime.DatetimeAllLeap,\n    \"julian\": cftime.DatetimeJulian,\n    \"360_day\": cftime.Datetime360Day,\n    \"gregorian\": cftime.DatetimeGregorian\n}\n\n\n# Number of microseconds in one of each supported unit.\nUNIT_CONVERSION_FACTORS = {\n    \"microseconds\": 1,\n    \"milliseconds\": 1000,\n    \"seconds\": 1000000,\n    \"minutes\": 60 * 1000000,\n    \"hours\": 3600 * 1000000,\n    \"days\": 86400 * 1000000,\n}\n\n\ndef to_calendar_specific_datetime(date, calendar):\n    # The argument is named `date` so it does not shadow the datetime module.\n    return DATE_TYPES[calendar](\n        date.year,\n        date.month,\n        date.day,\n        date.hour,\n        date.minute,\n        date.second,\n        date.microsecond\n    )",
"execution_count": 2,
"outputs": []
},
{
"metadata": {
"trusted": true
},
"cell_type": "code",
"source": "def cast_to_int_if_safe(num):\n    \"\"\"Copied from xarray.coding.times.py\"\"\"\n    int_num = np.array(num, dtype=np.int64)\n    if (num == int_num).all():\n        num = int_num\n    return num\n\n\ndef num2date_exact(times, units, calendar):\n    \"\"\"Exact datetime decoding for integer times. This currently only\n    supports use_only_cftime_datetimes=True, but could straightforwardly\n    be made more general.\n    \"\"\"\n    times = cast_to_int_if_safe(times)\n    if not np.issubdtype(times.dtype, np.integer):\n        raise ValueError(\n            \"times must have integer dtype or be able to be safely \"\n            \"cast to an integer dtype.\"\n        )\n\n    unit, _ = cftime._cftime._datesplit(units)\n    basedate = cftime._cftime._dateparse(units)\n    basedate = to_calendar_specific_datetime(basedate, calendar)\n\n    if unit not in UNIT_CONVERSION_FACTORS:\n        raise ValueError(\"Unsupported units.\")\n\n    # Convert to integer microseconds, then to datetime.timedelta objects.\n    factor = UNIT_CONVERSION_FACTORS[unit]\n    times = times * factor\n    timedeltas = times.astype(\"timedelta64[us]\").astype(datetime.timedelta)\n    return basedate + timedeltas",
"execution_count": 3,
"outputs": []
},
{
"metadata": {
"trusted": true
},
"cell_type": "code",
"source": "UNITS = \"microseconds since 1900-01-01\"\nCALENDAR = \"proleptic_gregorian\"\ntimes = np.random.randint(0, 86400000000 * 1000000, size=(10000, ))",
"execution_count": 4,
"outputs": []
},
{
"metadata": {
"trusted": true
},
"cell_type": "code",
"source": "%timeit cftime.num2date(times, UNITS, CALENDAR)",
"execution_count": 5,
"outputs": [
{
"output_type": "stream",
"text": "113 ms ± 1.98 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)\n",
"name": "stdout"
}
]
},
{
"metadata": {
"trusted": true
},
"cell_type": "code",
"source": "%timeit num2date_exact(times, UNITS, CALENDAR)",
"execution_count": 6,
"outputs": [
{
"output_type": "stream",
"text": "890 ms ± 5.64 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)\n",
"name": "stdout"
}
]
},
{
"metadata": {
"trusted": true
},
"cell_type": "code",
"source": "imprecise = cftime.num2date(times, UNITS, CALENDAR)\nexact = num2date_exact(times, UNITS, CALENDAR)",
"execution_count": 7,
"outputs": []
},
{
"metadata": {
"trusted": true
},
"cell_type": "code",
"source": "imprecise",
"execution_count": 8,
"outputs": [
{
"output_type": "execute_result",
"execution_count": 8,
"data": {
"text/plain": "array([cftime.DatetimeProlepticGregorian(3439-05-15 06:08:18.173117),\n cftime.DatetimeProlepticGregorian(4494-01-30 17:57:21.564247),\n cftime.DatetimeProlepticGregorian(2903-02-24 05:10:24.012762), ...,\n cftime.DatetimeProlepticGregorian(2984-05-04 23:04:46.247022),\n cftime.DatetimeProlepticGregorian(3444-12-29 16:19:06.306945),\n cftime.DatetimeProlepticGregorian(3315-11-19 03:12:16.589285)],\n dtype=object)"
},
"metadata": {}
}
]
},
{
"metadata": {
"trusted": true
},
"cell_type": "code",
"source": "exact",
"execution_count": 9,
"outputs": [
{
"output_type": "execute_result",
"execution_count": 9,
"data": {
"text/plain": "array([cftime.DatetimeProlepticGregorian(3439-05-15 06:08:18.173132),\n cftime.DatetimeProlepticGregorian(4494-01-30 17:57:21.564271),\n cftime.DatetimeProlepticGregorian(2903-02-24 05:10:24.012762), ...,\n cftime.DatetimeProlepticGregorian(2984-05-04 23:04:46.247007),\n cftime.DatetimeProlepticGregorian(3444-12-29 16:19:06.306935),\n cftime.DatetimeProlepticGregorian(3315-11-19 03:12:16.589290)],\n dtype=object)"
},
"metadata": {}
}
]
},
{
"metadata": {
"trusted": true
},
"cell_type": "code",
"source": "",
"execution_count": null,
"outputs": []
}
],
"metadata": {
"kernelspec": {
"name": "python3",
"display_name": "Python 3",
"language": "python"
},
"language_info": {
"name": "python",
"version": "3.7.3",
"mimetype": "text/x-python",
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"pygments_lexer": "ipython3",
"nbconvert_exporter": "python",
"file_extension": ".py"
}
},
"nbformat": 4,
"nbformat_minor": 4
}
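The `imprecise` and `exact` outputs in the notebook differ by a few tens of microseconds. One plausible explanation, offered here as an assumption rather than a statement about cftime's internals, is floating-point rounding: float64 cannot represent every integer above 2**53, and the microsecond offsets used in the notebook go well beyond that. The sketch below uses only the standard library; the 2.4-million-day magnitude is a hypothetical, Julian-day-like number chosen for illustration.

```python
import math

# Above this value float64 can no longer represent every integer exactly.
FLOAT64_EXACT_LIMIT = 2**53

# Upper bound used for the random microsecond offsets in the notebook.
max_offset_us = 86400000000 * 1000000


def survives_float_round_trip(n):
    """Return True if the integer n is unchanged by a trip through float64."""
    return int(float(n)) == n


# Hypothetical illustration: the gap between adjacent float64 values near a
# day count of 2.4 million, expressed in microseconds.
US_PER_DAY = 86400 * 1000000
spacing_us = math.ulp(2.4e6) * US_PER_DAY

print(max_offset_us > FLOAT64_EXACT_LIMIT)
print(survives_float_round_trip(FLOAT64_EXACT_LIMIT + 1))
print(round(spacing_us, 1))
```

If a decoder did pass through a float day count of that magnitude, errors of tens of microseconds would be expected, which is the same order as the differences seen above. Integer microsecond arithmetic avoids the issue entirely.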
@jswhit

jswhit commented May 11, 2020

Since this uses numpy datetime64, does it only work for the proleptic gregorian calendar?

@spencerkclark
Author

Thanks for having a look @jswhit -- I only use numpy.timedelta64 here as a conduit to converting an array of integers to an array of datetime.timedelta objects, and then add those datetime.timedelta objects to the basedate (which could be of any calendar type). For that reason I think it should be calendar-agnostic.
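A minimal sketch of that conduit, using only numpy and the standard library: the integers pass through `timedelta64[us]` purely to become `datetime.timedelta` objects, and no `datetime64` value is ever created. The offsets are made-up values, and a plain `datetime.datetime` stands in for the cftime base date so the snippet runs without cftime; with a cftime date of any calendar, the addition step would be the same.

```python
import datetime

import numpy as np

# Illustrative integer offsets in microseconds (not taken from the notebook).
offsets_us = np.array([0, 1, 86400 * 1000000 + 5], dtype=np.int64)

# Step 1: view the integers as microsecond timedeltas, then convert to an
# object array. For microsecond resolution numpy yields datetime.timedelta
# instances. (The notebook passes datetime.timedelta as the target type;
# object is used here as the explicit equivalent.)
timedeltas = offsets_us.astype("timedelta64[us]").astype(object)

# Step 2: add each timedelta to the base date. The stand-in base date is a
# standard-library datetime; the calendar logic lives entirely in the date
# type's own addition, not in numpy.
base = datetime.datetime(1900, 1, 1)
decoded = [base + td for td in timedeltas]

for value in decoded:
    print(value.isoformat())
```

Because numpy only ever sees durations, the calendar never enters the numpy side of the computation.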
