vaclavdekanovsky · December 19, 2020 22:09
diff --git a/DataFormat_Parameter.ipynb b/DataFormat_Parameter.ipynb
 {
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# Dateformat parameter of the Julia CSV parser\n",
    "Written in [Julia](https://julialang.org/). See [CSV.jl](https://csv.juliadata.org/stable/) and [DataFrames.jl](https://dataframes.juliadata.org/stable/) for more details"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 1,
   "metadata": {},
   "outputs": [],
   "source": [
    "using CSV, DataFrames"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "All examples are based on string input, which is passed to Julia's CSV reader through `IOBuffer`.\n",
    "\n",
    "Passing a single string sets the `dateformat` for all date columns."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 2,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<table class=\"data-frame\"><thead><tr><th></th><th>c1</th><th>c2</th><th>c3</th><th>c4</th><th>d1</th></tr><tr><th></th><th>String</th><th>Int64</th><th>String</th><th>Float64</th><th>Date…</th></tr></thead><tbody><p>2 rows × 5 columns</p><tr><th>1</th><td>XY</td><td>2</td><td>c</td><td>1.5</td><td>2020-01-05</td></tr><tr><th>2</th><td>AB</td><td>16</td><td>x</td><td>2.33</td><td>2021-01-05</td></tr></tbody></table>"
      ],
      "text/latex": [
       "\\begin{tabular}{r|ccccc}\n",
       "\t& c1 & c2 & c3 & c4 & d1\\\\\n",
       "\t\\hline\n",
       "\t& String & Int64 & String & Float64 & Date…\\\\\n",
       "\t\\hline\n",
       "\t1 & XY & 2 & c & 1.5 & 2020-01-05 \\\\\n",
       "\t2 & AB & 16 & x & 2.33 & 2021-01-05 \\\\\n",
       "\\end{tabular}\n"
      ],
      "text/plain": [
       "2×5 DataFrame\n",
       "│ Row │ c1     │ c2    │ c3     │ c4      │ d1         │\n",
       "│     │ \u001b[90mString\u001b[39m │ \u001b[90mInt64\u001b[39m │ \u001b[90mString\u001b[39m │ \u001b[90mFloat64\u001b[39m │ \u001b[90mDates.Date\u001b[39m │\n",
       "├─────┼────────┼───────┼────────┼─────────┼────────────┤\n",
       "│ 1   │ XY     │ 2     │ c      │ 1.5     │ 2020-01-05 │\n",
       "│ 2   │ AB     │ 16    │ x      │ 2.33    │ 2021-01-05 │"
      ]
     },
     "execution_count": 2,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "data = \"\"\"c1|c2|c3|c4|d1\n",
    "\"XY\"|2|c|1.5|2020-01-05\n",
    "\"AB\"|16|x|2.33|2021-01-05\n",
    "\"\"\"\n",
    "\n",
    "CSV.read(IOBuffer(data), DataFrame; \n",
    "    dateformat=\"yyyy-mm-dd\")"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "If your columns use more than one date format, you can declare them as dates through the `types` argument and rely on the default parser, but it will probably recognize only some of the formats. You also need `using Dates` so that you can refer to the `Date` type. Where parsing fails, the parser emits a warning and returns `missing`, Julia's counterpart to a `NaN` or null placeholder."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 3,
   "metadata": {},
   "outputs": [
    {
     "name": "stderr",
     "output_type": "stream",
     "text": [
      "┌ Warning: thread = 1 warning: error parsing Date around row = 2, col = 6: \"01/12/20\n",
      "│ \", error=INVALID: OK | NEWLINE | INVALID_DELIMITER \n",
      "└ @ CSV /home/vaclav/.julia/packages/CSV/la2cd/src/file.jl:606\n",
      "┌ Warning: thread = 1 warning: error parsing Date around row = 3, col = 6: \"15/10/20\n",
      "│ \", error=INVALID: OK | NEWLINE | EOF | INVALID_DELIMITER \n",
      "└ @ CSV /home/vaclav/.julia/packages/CSV/la2cd/src/file.jl:606\n"
     ]
    },
    {
     "data": {
      "text/html": [
       "<table class=\"data-frame\"><thead><tr><th></th><th>c1</th><th>c2</th><th>c3</th><th>c4</th><th>d1</th><th>d2</th></tr><tr><th></th><th>String</th><th>Int64</th><th>String</th><th>Float64</th><th>Date</th><th>Date?</th></tr></thead><tbody><p>2 rows × 6 columns</p><tr><th>1</th><td>XY</td><td>2</td><td>c</td><td>1.5</td><td>2020-01-05</td><td><em>missing</em></td></tr><tr><th>2</th><td>AB</td><td>16</td><td>x</td><td>2.33</td><td>2021-01-05</td><td><em>missing</em></td></tr></tbody></table>"
      ],
      "text/latex": [
       "\\begin{tabular}{r|cccccc}\n",
       "\t& c1 & c2 & c3 & c4 & d1 & d2\\\\\n",
       "\t\\hline\n",
       "\t& String & Int64 & String & Float64 & Date & Date?\\\\\n",
       "\t\\hline\n",
       "\t1 & XY & 2 & c & 1.5 & 2020-01-05 & \\emph{missing} \\\\\n",
       "\t2 & AB & 16 & x & 2.33 & 2021-01-05 & \\emph{missing} \\\\\n",
       "\\end{tabular}\n"
      ],
      "text/plain": [
       "2×6 DataFrame\n",
       "│ Row │ c1     │ c2    │ c3     │ c4      │ d1         │ d2      │\n",
       "│     │ \u001b[90mString\u001b[39m │ \u001b[90mInt64\u001b[39m │ \u001b[90mString\u001b[39m │ \u001b[90mFloat64\u001b[39m │ \u001b[90mDate\u001b[39m       │ \u001b[90mDate?\u001b[39m   │\n",
       "├─────┼────────┼───────┼────────┼─────────┼────────────┼─────────┤\n",
       "│ 1   │ XY     │ 2     │ c      │ 1.5     │ 2020-01-05 │ \u001b[90mmissing\u001b[39m │\n",
       "│ 2   │ AB     │ 16    │ x      │ 2.33    │ 2021-01-05 │ \u001b[90mmissing\u001b[39m │"
      ]
     },
     "execution_count": 3,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "using Dates\n",
    "\n",
    "data = \"\"\"c1|c2|c3|c4|d1|d2\n",
    "\"XY\"|2|c|1.5|2020-01-05|01/12/20\n",
    "\"AB\"|16|x|2.33|2021-01-05|15/10/20\n",
    "\"\"\"\n",
    "\n",
    "# declare both columns as Date and leave the format to the default parser\n",
    "CSV.read(IOBuffer(data), DataFrame; \n",
    "    types=Dict(\"d1\"=>Date, \"d2\"=>Date), \n",
    ")"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "By passing a `Dict` to `dateformats`, you can specify a different format for each column. You don't have to set the types explicitly; a column that has a date format is parsed as `Date`."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 4,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<table class=\"data-frame\"><thead><tr><th></th><th>c1</th><th>c2</th><th>c3</th><th>c4</th><th>d1</th><th>d2</th></tr><tr><th></th><th>String</th><th>Int64</th><th>String</th><th>Float64</th><th>Date</th><th>Date</th></tr></thead><tbody><p>2 rows × 6 columns</p><tr><th>1</th><td>XY</td><td>2</td><td>c</td><td>1.5</td><td>2020-01-05</td><td>0020-12-01</td></tr><tr><th>2</th><td>AB</td><td>16</td><td>x</td><td>2.33</td><td>2021-01-05</td><td>0020-10-15</td></tr></tbody></table>"
      ],
      "text/latex": [
       "\\begin{tabular}{r|cccccc}\n",
       "\t& c1 & c2 & c3 & c4 & d1 & d2\\\\\n",
       "\t\\hline\n",
       "\t& String & Int64 & String & Float64 & Date & Date\\\\\n",
       "\t\\hline\n",
       "\t1 & XY & 2 & c & 1.5 & 2020-01-05 & 0020-12-01 \\\\\n",
       "\t2 & AB & 16 & x & 2.33 & 2021-01-05 & 0020-10-15 \\\\\n",
       "\\end{tabular}\n"
      ],
      "text/plain": [
       "2×6 DataFrame\n",
       "│ Row │ c1     │ c2    │ c3     │ c4      │ d1         │ d2         │\n",
       "│     │ \u001b[90mString\u001b[39m │ \u001b[90mInt64\u001b[39m │ \u001b[90mString\u001b[39m │ \u001b[90mFloat64\u001b[39m │ \u001b[90mDate\u001b[39m       │ \u001b[90mDate\u001b[39m       │\n",
       "├─────┼────────┼───────┼────────┼─────────┼────────────┼────────────┤\n",
       "│ 1   │ XY     │ 2     │ c      │ 1.5     │ 2020-01-05 │ 0020-12-01 │\n",
       "│ 2   │ AB     │ 16    │ x      │ 2.33    │ 2021-01-05 │ 0020-10-15 │"
      ]
     },
     "execution_count": 4,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "data = \"\"\"c1|c2|c3|c4|d1|d2\n",
    "\"XY\"|2|c|1.5|2020-01-05|01/12/20\n",
    "\"AB\"|16|x|2.33|2021-01-05|15/10/20\n",
    "\"\"\"\n",
    "\n",
    "# give each date column its own format; the Date type is inferred from it\n",
    "df = CSV.read(IOBuffer(data), DataFrame; \n",
    "    dateformats=Dict(\n",
    "        \"d1\"=>\"yyyy-mm-dd\",\n",
    "        \"d2\"=>\"dd/mm/yy\"\n",
    "    )\n",
    ")\n",
    "df"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Note that when the year is given only as `20`, the parser takes it literally as the year 20 (`0020`). You have to add 2000 years to get the intended value."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 5,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<table class=\"data-frame\"><thead><tr><th></th><th>c1</th><th>c2</th><th>c3</th><th>c4</th><th>d1</th><th>d2</th></tr><tr><th></th><th>String</th><th>Int64</th><th>String</th><th>Float64</th><th>Date</th><th>Date</th></tr></thead><tbody><p>2 rows × 6 columns</p><tr><th>1</th><td>XY</td><td>2</td><td>c</td><td>1.5</td><td>2020-01-05</td><td>2020-12-01</td></tr><tr><th>2</th><td>AB</td><td>16</td><td>x</td><td>2.33</td><td>2021-01-05</td><td>2020-10-15</td></tr></tbody></table>"
      ],
      "text/latex": [
       "\\begin{tabular}{r|cccccc}\n",
       "\t& c1 & c2 & c3 & c4 & d1 & d2\\\\\n",
       "\t\\hline\n",
       "\t& String & Int64 & String & Float64 & Date & Date\\\\\n",
       "\t\\hline\n",
       "\t1 & XY & 2 & c & 1.5 & 2020-01-05 & 2020-12-01 \\\\\n",
       "\t2 & AB & 16 & x & 2.33 & 2021-01-05 & 2020-10-15 \\\\\n",
       "\\end{tabular}\n"
      ],
      "text/plain": [
       "2×6 DataFrame\n",
       "│ Row │ c1     │ c2    │ c3     │ c4      │ d1         │ d2         │\n",
       "│     │ \u001b[90mString\u001b[39m │ \u001b[90mInt64\u001b[39m │ \u001b[90mString\u001b[39m │ \u001b[90mFloat64\u001b[39m │ \u001b[90mDate\u001b[39m       │ \u001b[90mDate\u001b[39m       │\n",
       "├─────┼────────┼───────┼────────┼─────────┼────────────┼────────────┤\n",
       "│ 1   │ XY     │ 2     │ c      │ 1.5     │ 2020-01-05 │ 2020-12-01 │\n",
       "│ 2   │ AB     │ 16    │ x      │ 2.33    │ 2021-01-05 │ 2020-10-15 │"
      ]
     },
     "execution_count": 5,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "# add 2000 years to column d2, which currently contains 0020-MM-DD\n",
    "# be careful to run this only once: with either df[:, :d2] or df[!, :d2] the column in df is modified, so every run adds another 2000 years\n",
    "df[!, :d2] += Dates.Year(2000)\n",
    "df"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Parsing DateTime\n",
    "See the [Dates](https://docs.julialang.org/en/v1/stdlib/Dates/#Dates.format-Tuple{TimeType,AbstractString}) module documentation for the meaning of the format codes."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 6,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<table class=\"data-frame\"><thead><tr><th></th><th>c1</th><th>c2</th><th>c3</th><th>c4</th><th>d1</th><th>d2</th><th>time</th></tr><tr><th></th><th>String</th><th>Int64</th><th>String</th><th>Float64</th><th>Date</th><th>Date</th><th>DateTime</th></tr></thead><tbody><p>2 rows × 7 columns</p><tr><th>1</th><td>XY</td><td>2</td><td>c</td><td>1.5</td><td>2020-01-05</td><td>0020-12-01</td><td>2020-01-15T10:55:03</td></tr><tr><th>2</th><td>AB</td><td>16</td><td>x</td><td>2.33</td><td>2021-01-05</td><td>0020-10-15</td><td>2020-01-15T23:08:59</td></tr></tbody></table>"
      ],
      "text/latex": [
       "\\begin{tabular}{r|ccccccc}\n",
       "\t& c1 & c2 & c3 & c4 & d1 & d2 & time\\\\\n",
       "\t\\hline\n",
       "\t& String & Int64 & String & Float64 & Date & Date & DateTime\\\\\n",
       "\t\\hline\n",
       "\t1 & XY & 2 & c & 1.5 & 2020-01-05 & 0020-12-01 & 2020-01-15T10:55:03 \\\\\n",
       "\t2 & AB & 16 & x & 2.33 & 2021-01-05 & 0020-10-15 & 2020-01-15T23:08:59 \\\\\n",
       "\\end{tabular}\n"
      ],
      "text/plain": [
       "2×7 DataFrame. Omitted printing of 1 columns\n",
       "│ Row │ c1     │ c2    │ c3     │ c4      │ d1         │ d2         │\n",
       "│     │ \u001b[90mString\u001b[39m │ \u001b[90mInt64\u001b[39m │ \u001b[90mString\u001b[39m │ \u001b[90mFloat64\u001b[39m │ \u001b[90mDate\u001b[39m       │ \u001b[90mDate\u001b[39m       │\n",
       "├─────┼────────┼───────┼────────┼─────────┼────────────┼────────────┤\n",
       "│ 1   │ XY     │ 2     │ c      │ 1.5     │ 2020-01-05 │ 0020-12-01 │\n",
       "│ 2   │ AB     │ 16    │ x      │ 2.33    │ 2021-01-05 │ 0020-10-15 │"
      ]
     },
     "execution_count": 6,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "data = \"\"\"c1|c2|c3|c4|d1|d2|time\n",
    "\"XY\"|2|c|1.5|2020-01-05|01/12/20|2020Jan15T10:55:03\n",
    "\"AB\"|16|x|2.33|2021-01-05|15/10/20|2020Jan15T23:08:59\n",
    "\"\"\"\n",
    "\n",
    "# per-column formats; the time column uses a DateFormat with a month abbreviation (uuu)\n",
    "df = CSV.read(IOBuffer(data), DataFrame; \n",
    "    dateformats=Dict(\n",
    "        \"d1\"=>\"yyyy-mm-dd\",\n",
    "        \"d2\"=>\"dd/mm/yy\",\n",
    "        \"time\"=>DateFormat(\"yyyyuuuddTHH:MM:SS\")\n",
    "    )\n",
    ")\n",
    "df"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Details about date parsing"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 10,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "2020-01-01T00:00:00"
      ]
     },
     "execution_count": 10,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "# The DateFormat constructor lets you specify the locale (language). Only English is available by default\n",
    "DateTime(\"2020Jan\", Dates.DateFormat(\"yyyyuuu\", \"english\"))"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 11,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "Base.KeySet for a Dict{String,Dates.DateLocale} with 1 entry. Keys:\n",
       "  \"english\""
      ]
     },
     "execution_count": 11,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "keys(Dates.LOCALES)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 12,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "Dict{String,Int64} with 24 entries:\n",
       "  \"Aug\" => 8\n",
       "  \"May\" => 5\n",
       "  \"may\" => 5\n",
       "  \"Jul\" => 7\n",
       "  \"Dec\" => 12\n",
       "  \"Apr\" => 4\n",
       "  \"nov\" => 11\n",
       "  \"jul\" => 7\n",
       "  \"Oct\" => 10\n",
       "  \"apr\" => 4\n",
       "  \"Feb\" => 2\n",
       "  \"feb\" => 2\n",
       "  \"Mar\" => 3\n",
       "  \"oct\" => 10\n",
       "  \"mar\" => 3\n",
       "  \"Sep\" => 9\n",
       "  \"Jun\" => 6\n",
       "  \"dec\" => 12\n",
       "  \"Jan\" => 1\n",
       "  \"aug\" => 8\n",
       "  \"jan\" => 1\n",
       "  \"jun\" => 6\n",
       "  \"Nov\" => 11\n",
       "  \"sep\" => 9"
      ]
     },
     "execution_count": 12,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "Dates.LOCALES[\"english\"].month_abbr_value"
   ]
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "Julia 1.4.1",
   "language": "julia",
   "name": "julia-1.4"
  },
  "language_info": {
   "file_extension": ".jl",
   "mimetype": "application/julia",
   "name": "julia",
   "version": "1.4.1"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 4
 }
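The notebook warns that the `+= Dates.Year(2000)` cell must be run only once. A guarded variant avoids that pitfall by shifting only dates whose year is still below a cutoff. The sketch below is an illustration, not part of the notebook: `fix_century`, `pivot` and `offset` are hypothetical names that exist in neither `Dates` nor CSV.jl, and it works on a plain vector of dates instead of the DataFrame so that it runs on its own.

```julia
using Dates

# Hypothetical helper (not part of Dates or CSV.jl): shift a date into the
# intended century only when its year is below `pivot`; leave it unchanged otherwise.
fix_century(d::Date; pivot::Int=100, offset::Int=2000) =
    year(d) < pivot ? d + Year(offset) : d

# Two-digit years are parsed literally, as in the d2 column of the notebook.
raw = ["01/12/20", "15/10/20"]
parsed = Date.(raw, dateformat"dd/mm/yy")

# The first pass moves year 20 to year 2020.
fixed = fix_century.(parsed)

# A second pass changes nothing, because the years are no longer below `pivot`.
fixed_again = fix_century.(fixed)
```

With a DataFrame, the same helper could be applied as `df.d2 = fix_century.(df.d2)`, assuming `df` is the frame read in the notebook. The `pivot` default of 100 is an assumption that suits data with no genuine first-century dates.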