Created
October 18, 2014 17:34
writing netcdf ipython notebook
{ | |
"metadata": { | |
"celltoolbar": "Slideshow", | |
"name": "", | |
"signature": "sha256:4ced4abfca9bc693e4f73811b47cc04d94e0f737b84412e31c5bc850513988ea" | |
}, | |
"nbformat": 3, | |
"nbformat_minor": 0, | |
"worksheets": [ | |
{ | |
"cells": [ | |
{ | |
"cell_type": "markdown", | |
"metadata": { | |
"slideshow": { | |
"slide_type": "slide" | |
} | |
}, | |
"source": [ | |
"# Writing netCDF data\n", | |
"\n", | |
"**Important Note**: when running this notebook interactively in a browser, you probably will not be able to execute individual cells out of order without getting an error. Instead, choose \"Run All\" from the Cell menu after you modify a cell." | |
] | |
}, | |
{ | |
"cell_type": "code", | |
"collapsed": false, | |
"input": [ | |
"import netCDF4 # Note: python is case-sensitive!\n", | |
"import numpy as np" | |
], | |
"language": "python", | |
"metadata": { | |
"slideshow": { | |
"slide_type": "fragment" | |
} | |
}, | |
"outputs": [], | |
"prompt_number": 1 | |
}, | |
{ | |
"cell_type": "markdown", | |
"metadata": { | |
"slideshow": { | |
"slide_type": "slide" | |
} | |
}, | |
"source": [ | |
"## Opening a file, creating a new Dataset\n", | |
"\n", | |
"Let's create a new, empty netCDF file named 'data/new.nc', opened for writing.\n", | |
"\n", | |
"Be careful, opening a file with 'w' will clobber any existing data (unless `clobber=False` is used, in which case an exception is raised if the file already exists).\n", | |
"\n", | |
"- `mode='r'` is the default.\n", | |
"- `mode='a'` opens an existing file and allows for appending (does not clobber existing data)\n", | |
"- `format` can be one of `NETCDF3_CLASSIC`, `NETCDF3_64BIT`, `NETCDF4_CLASSIC` or `NETCDF4` (default). `NETCDF4_CLASSIC` uses HDF5 for the underlying storage layer (as does `NETCDF4`) but enforces the classic netCDF 3 data model so data can be read with older clients. " | |
] | |
}, | |
{ | |
"cell_type": "code", | |
"collapsed": false, | |
"input": [ | |
"try: ncfile.close() # just to be safe, make sure dataset is not already open.\n", | |
"except: pass\n", | |
"ncfile = netCDF4.Dataset('data/new.nc',mode='w',format='NETCDF4') \n", | |
"print ncfile" | |
], | |
"language": "python", | |
"metadata": { | |
"slideshow": { | |
"slide_type": "fragment" | |
} | |
}, | |
"outputs": [ | |
{ | |
"output_type": "stream", | |
"stream": "stdout", | |
"text": [ | |
"<type 'netCDF4.Dataset'>\n", | |
"root group (NETCDF4 data model, file format HDF5):\n", | |
" dimensions(sizes): \n", | |
" variables(dimensions): \n", | |
" groups: \n", | |
"\n" | |
] | |
} | |
], | |
"prompt_number": 2 | |
}, | |
{ | |
"cell_type": "markdown", | |
"metadata": { | |
"slideshow": { | |
"slide_type": "slide" | |
} | |
}, | |
"source": [ | |
"## Creating dimensions\n", | |
"\n", | |
"The **ncfile** object we created is a container for _dimensions_, _variables_, and _attributes_. First, let's create some dimensions using the [`createDimension`](http://unidata.github.io/netcdf4-python/netCDF4.Dataset-class.html#createDimension) method. \n", | |
"\n", | |
"- Every dimension has a name and a length. \n", | |
"- The name is a string that is used to specify the dimension to be used when creating a variable, and as a key to access the dimension object in the `ncfile.dimensions` dictionary.\n", | |
"\n", | |
"Setting the dimension length to `0` or `None` makes it unlimited, so it can grow. \n", | |
"\n", | |
"- For `NETCDF4` files, any variable's dimension can be unlimited. \n", | |
"- For `NETCDF4_CLASSIC` and `NETCDF3*` files, only one per variable can be unlimited, and it must be the leftmost (fastest varying) dimension." | |
] | |
}, | |
{ | |
"cell_type": "code", | |
"collapsed": false, | |
"input": [ | |
"lat_dim = ncfile.createDimension('lat', 73) # latitude axis\n", | |
"lon_dim = ncfile.createDimension('lon', 144) # longitude axis\n", | |
"time_dim = ncfile.createDimension('time', None) # unlimited axis (can be appended to).\n", | |
"print lat_dim\n", | |
"print lon_dim\n", | |
"print time_dim\n", | |
"print ncfile.dimensions" | |
], | |
"language": "python", | |
"metadata": { | |
"slideshow": { | |
"slide_type": "fragment" | |
} | |
}, | |
"outputs": [ | |
{ | |
"output_type": "stream", | |
"stream": "stdout", | |
"text": [ | |
"<type 'netCDF4.Dimension'>: name = 'lat', size = 73\n", | |
"\n", | |
"<type 'netCDF4.Dimension'>: name = 'lon', size = 144\n", | |
"\n", | |
"<type 'netCDF4.Dimension'> (unlimited): name = 'time', size = 0\n", | |
"\n", | |
"OrderedDict([('lat', <type 'netCDF4.Dimension'>: name = 'lat', size = 73\n", | |
"), ('lon', <type 'netCDF4.Dimension'>: name = 'lon', size = 144\n", | |
"), ('time', <type 'netCDF4.Dimension'> (unlimited): name = 'time', size = 0\n", | |
")])\n" | |
] | |
} | |
], | |
"prompt_number": 3 | |
}, | |
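{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "slide"
}
},
"source": [
"The properties described above can be checked on the dimension objects themselves. The following is a minimal sketch, not part of the original example: it works on a scratch file in a temporary directory (the file name and the `demo` names are made up for illustration) so that `ncfile` is left untouched."
]
},
{
"cell_type": "code",
"collapsed": false,
"input": [
"import os\n",
"import tempfile\n",
"import netCDF4\n",
"\n",
"# scratch file in a temporary directory (hypothetical name, illustration only)\n",
"demo_path = os.path.join(tempfile.mkdtemp(), 'demo_dims.nc')\n",
"demo = netCDF4.Dataset(demo_path, mode='w')\n",
"demo.createDimension('lat', 73)\n",
"demo.createDimension('time', None)  # None makes the dimension unlimited\n",
"for name, dim in demo.dimensions.items():\n",
"    # len() gives the current size, isunlimited() tells whether it can grow\n",
"    print('%s: size=%d, unlimited=%s' % (name, len(dim), dim.isunlimited()))\n",
"demo.close()"
],
"language": "python",
"metadata": {
"slideshow": {
"slide_type": "fragment"
}
},
"outputs": []
},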
{ | |
"cell_type": "markdown", | |
"metadata": { | |
"slideshow": { | |
"slide_type": "slide" | |
} | |
}, | |
"source": [ | |
"## Creating attributes\n", | |
"\n", | |
"netCDF attributes can be created just like you would for any python object. \n", | |
"- Best to adhere to established conventions (like the [CF](http://cfconventions.org/) conventions)\n", | |
"- We won't try to adhere to any specific convention here though." | |
] | |
}, | |
{ | |
"cell_type": "code", | |
"collapsed": false, | |
"input": [ | |
"ncfile.title='My model data'\n", | |
"print ncfile.title" | |
], | |
"language": "python", | |
"metadata": { | |
"slideshow": { | |
"slide_type": "fragment" | |
} | |
}, | |
"outputs": [ | |
{ | |
"output_type": "stream", | |
"stream": "stdout", | |
"text": [ | |
"My model data\n" | |
] | |
} | |
], | |
"prompt_number": 4 | |
}, | |
{ | |
"cell_type": "markdown", | |
"metadata": { | |
"slideshow": { | |
"slide_type": "fragment" | |
} | |
}, | |
"source": [ | |
"Try adding some more attributes..." | |
] | |
}, | |
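{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "slide"
}
},
"source": [
"As a hedged sketch of one way to add and list attributes: besides plain attribute assignment, the `setncattr`, `getncattr` and `ncattrs` methods can be used. The scratch file, the `demo` names and the attribute values below are made up for illustration."
]
},
{
"cell_type": "code",
"collapsed": false,
"input": [
"import os\n",
"import tempfile\n",
"import netCDF4\n",
"\n",
"# scratch file in a temporary directory (hypothetical name, illustration only)\n",
"demo_path = os.path.join(tempfile.mkdtemp(), 'demo_attrs.nc')\n",
"demo = netCDF4.Dataset(demo_path, mode='w')\n",
"demo.title = 'My model data'                   # attribute assignment, as above\n",
"demo.setncattr('institution', 'Example Lab')   # explicit call, made-up value\n",
"for attname in demo.ncattrs():                 # names of all global attributes\n",
"    print('%s = %s' % (attname, demo.getncattr(attname)))\n",
"demo.close()"
],
"language": "python",
"metadata": {
"slideshow": {
"slide_type": "fragment"
}
},
"outputs": []
},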
{ | |
"cell_type": "markdown", | |
"metadata": { | |
"slideshow": { | |
"slide_type": "slide" | |
} | |
}, | |
"source": [ | |
"## Creating variables\n", | |
"\n", | |
"Now let's add some variables and store some data in them. \n", | |
"\n", | |
"- A variable has a name, a type, a shape, and some data values. \n", | |
"- The shape of a variable is specified by a tuple of dimension names. \n", | |
"- A variable should also have some named attributes, such as 'units', that describe the data.\n", | |
"\n", | |
"The [`createVariable`](http://unidata.github.io/netcdf4-python/netCDF4.Dataset-class.html#createVariable) method takes 3 mandatory args.\n", | |
"\n", | |
"- the 1st argument is the variable name (a string). This is used as the key to access the variable object from the `variables` dictionary.\n", | |
"- the 2nd argument is the datatype (most numpy datatypes supported). \n", | |
"- the third argument is a tuple containing the dimension names (the dimensions must be created first). Unless this is a `NETCDF4` file, any unlimited dimension must be the leftmost one.\n", | |
"- there are lots of optional arguments (many of which are only relevant when `format='NETCDF'`) to control compression, chunking, fill_value, etc.\n" | |
] | |
}, | |
{ | |
"cell_type": "code", | |
"collapsed": false, | |
"input": [ | |
"# Define two variables with the same names as dimensions,\n", | |
"# a conventional way to define \"coordinate variables\".\n", | |
"lat = ncfile.createVariable('lat', np.float32, ('lat',))\n", | |
"lat.units = 'degrees_north'\n", | |
"lat.long_name = 'latitude'\n", | |
"lon = ncfile.createVariable('lon', np.float32, ('lon',))\n", | |
"lon.units = 'degrees_east'\n", | |
"lon.long_name = 'longitude'\n", | |
"time = ncfile.createVariable('time', np.float64, ('time',))\n", | |
"time.units = 'hours since 1800-01-01'\n", | |
"time.long_name = 'time'\n", | |
"# Define a 3D variable to hold the data\n", | |
"temp = ncfile.createVariable('temp',np.float64,('time','lat','lon')) # note: unlimited dimension is leftmost\n", | |
"temp.units = 'K' # degrees Kelvin\n", | |
"temp.standard_name = 'air_temperature' # this is a CF standard name\n", | |
"print temp" | |
], | |
"language": "python", | |
"metadata": { | |
"slideshow": { | |
"slide_type": "fragment" | |
} | |
}, | |
"outputs": [ | |
{ | |
"output_type": "stream", | |
"stream": "stdout", | |
"text": [ | |
"<type 'netCDF4.Variable'>\n", | |
"float64 temp(time, lat, lon)\n", | |
" units: K\n", | |
" standard_name: air_temperature\n", | |
"unlimited dimensions: time\n", | |
"current shape = (0, 73, 144)\n", | |
"filling on, default _FillValue of 9.96920996839e+36 used\n", | |
"\n" | |
] | |
} | |
], | |
"prompt_number": 5 | |
}, | |
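{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "slide"
}
},
"source": [
"A minimal sketch of some of the optional `createVariable` arguments mentioned above (`zlib`, `complevel`, `chunksizes`, `fill_value`). The particular values, the scratch file and the `demo` names are assumptions chosen for illustration, not recommendations."
]
},
{
"cell_type": "code",
"collapsed": false,
"input": [
"import os\n",
"import tempfile\n",
"import netCDF4\n",
"import numpy as np\n",
"\n",
"# scratch file in a temporary directory (hypothetical name, illustration only)\n",
"demo_path = os.path.join(tempfile.mkdtemp(), 'demo_var.nc')\n",
"demo = netCDF4.Dataset(demo_path, mode='w', format='NETCDF4')\n",
"demo.createDimension('time', None)\n",
"demo.createDimension('lat', 73)\n",
"demo.createDimension('lon', 144)\n",
"# zlib/complevel control compression, chunksizes sets the chunk shape,\n",
"# fill_value is the value reported where nothing has been written yet\n",
"demo_temp = demo.createVariable('temp', np.float32, ('time', 'lat', 'lon'),\n",
"                                zlib=True, complevel=4,\n",
"                                chunksizes=(1, 73, 144), fill_value=-9999.0)\n",
"print(demo_temp.filters())    # compression settings in effect\n",
"print(demo_temp.chunking())   # chunk size along each dimension\n",
"demo.close()"
],
"language": "python",
"metadata": {
"slideshow": {
"slide_type": "fragment"
}
},
"outputs": []
},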
{ | |
"cell_type": "markdown", | |
"metadata": { | |
"slideshow": { | |
"slide_type": "slide" | |
} | |
}, | |
"source": [ | |
"## Pre-defined variable attributes (read only)\n", | |
"\n", | |
"The netCDF4 module provides some useful pre-defined Python attributes for netCDF variables, such as dimensions, shape, dtype, ndim. \n", | |
"\n", | |
"Note: since no data has been written yet, the length of the 'time' dimension is 0." | |
] | |
}, | |
{ | |
"cell_type": "code", | |
"collapsed": false, | |
"input": [ | |
"print \"-- Some pre-defined attributes for variable temp:\"\n", | |
"print \"temp.dimensions:\", temp.dimensions\n", | |
"print \"temp.shape:\", temp.shape\n", | |
"print \"temp.dtype:\", temp.dtype\n", | |
"print \"temp.ndim:\", temp.ndim" | |
], | |
"language": "python", | |
"metadata": { | |
"slideshow": { | |
"slide_type": "fragment" | |
} | |
}, | |
"outputs": [ | |
{ | |
"output_type": "stream", | |
"stream": "stdout", | |
"text": [ | |
"-- Some pre-defined attributes for variable temp:\n", | |
"temp.dimensions: (u'time', u'lat', u'lon')\n", | |
"temp.shape: (0, 73, 144)\n", | |
"temp.dtype: float64\n", | |
"temp.ndim: 3\n" | |
] | |
} | |
], | |
"prompt_number": 6 | |
}, | |
{ | |
"cell_type": "markdown", | |
"metadata": { | |
"slideshow": { | |
"slide_type": "slide" | |
} | |
}, | |
"source": [ | |
"## Writing data\n", | |
"\n", | |
"To write data a netCDF variable object, just treat it like a numpy array and assign values to a slice." | |
] | |
}, | |
{ | |
"cell_type": "code", | |
"collapsed": false, | |
"input": [ | |
"nlats = len(lat_dim); nlons = len(lon_dim); ntimes = 3\n", | |
"# Write latitudes, longitudes.\n", | |
"# Note: the \":\" is necessary in these \"write\" statements\n", | |
"lat[:] = -90. + (180./nlats)*np.arange(nlats) # south pole to north pole\n", | |
"lon[:] = (180./nlats)*np.arange(nlons) # Greenwich meridian eastward\n", | |
"# create a 3D array of random numbers\n", | |
"data_arr = np.random.uniform(low=280,high=330,size=(ntimes,nlats,nlons))\n", | |
"# Write the data. This writes the whole 3D netCDF variable all at once.\n", | |
"temp[:,:,:] = data_arr # Appends data along unlimited dimension\n", | |
"print \"-- Wrote data, temp.shape is now \", temp.shape\n", | |
"# read data back from variable (by slicing it), print min and max\n", | |
"print \"-- Min/Max values:\", temp[:,:,:].min(), temp[:,:,:].max()" | |
], | |
"language": "python", | |
"metadata": { | |
"slideshow": { | |
"slide_type": "fragment" | |
} | |
}, | |
"outputs": [ | |
{ | |
"output_type": "stream", | |
"stream": "stdout", | |
"text": [ | |
"-- Wrote data, temp.shape is now (3, 73, 144)\n", | |
"-- Min/Max values: 280.001651325 329.999261968\n" | |
] | |
} | |
], | |
"prompt_number": 7 | |
}, | |
{ | |
"cell_type": "markdown", | |
"metadata": { | |
"slideshow": { | |
"slide_type": "fragment" | |
} | |
}, | |
"source": [ | |
"- You can just treat a netCDF Variable object like a numpy array and assign values to it.\n", | |
"- Variables automatically grow along unlimited dimensions (unlike numpy arrays)\n", | |
"- The above writes the whole 3D variable all at once, but you can write it a slice at a time instead.\n", | |
"\n", | |
"Let's add another time slice....\n" | |
] | |
}, | |
{ | |
"cell_type": "code", | |
"collapsed": false, | |
"input": [ | |
"# create a 2D array of random numbers\n", | |
"data_slice = np.random.uniform(low=280,high=330,size=(nlats,nlons))\n", | |
"temp[3,:,:] = data_slice # Appends the 4th time slice\n", | |
"print \"-- Wrote more data, temp.shape is now \", temp.shape" | |
], | |
"language": "python", | |
"metadata": { | |
"slideshow": { | |
"slide_type": "slide" | |
} | |
}, | |
"outputs": [ | |
{ | |
"output_type": "stream", | |
"stream": "stdout", | |
"text": [ | |
"-- Wrote more data, temp.shape is now (4, 73, 144)\n" | |
] | |
} | |
], | |
"prompt_number": 8 | |
}, | |
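{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "slide"
}
},
"source": [
"Writing one slice at a time can also be done in a loop. The sketch below is an illustration only: it uses a scratch file and made-up `demo` names so that `temp` is not modified, and it prints the shape after each step to show the variable growing along the unlimited dimension."
]
},
{
"cell_type": "code",
"collapsed": false,
"input": [
"import os\n",
"import tempfile\n",
"import netCDF4\n",
"import numpy as np\n",
"\n",
"# scratch file in a temporary directory (hypothetical name, illustration only)\n",
"demo_path = os.path.join(tempfile.mkdtemp(), 'demo_append.nc')\n",
"demo = netCDF4.Dataset(demo_path, mode='w')\n",
"demo.createDimension('time', None)\n",
"demo.createDimension('lat', 73)\n",
"demo.createDimension('lon', 144)\n",
"demo_temp = demo.createVariable('temp', np.float64, ('time', 'lat', 'lon'))\n",
"for n in range(3):\n",
"    # assigning to index n along the unlimited dimension extends the variable\n",
"    demo_temp[n, :, :] = np.random.uniform(low=280, high=330, size=(73, 144))\n",
"    print('after step %d the shape is %s' % (n, demo_temp.shape))\n",
"demo.close()"
],
"language": "python",
"metadata": {
"slideshow": {
"slide_type": "fragment"
}
},
"outputs": []
},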
{ | |
"cell_type": "markdown", | |
"metadata": { | |
"slideshow": { | |
"slide_type": "fragment" | |
} | |
}, | |
"source": [ | |
"Note that we have not yet written any data to the time variable. It automatically grew as we appended data along the time dimension to the variable `temp`, but the data is missing." | |
] | |
}, | |
{ | |
"cell_type": "code", | |
"collapsed": false, | |
"input": [ | |
"print time\n", | |
"print time[:] # dashes indicate masked values (where data has not yet been written)" | |
], | |
"language": "python", | |
"metadata": { | |
"slideshow": { | |
"slide_type": "fragment" | |
} | |
}, | |
"outputs": [ | |
{ | |
"output_type": "stream", | |
"stream": "stdout", | |
"text": [ | |
"<type 'netCDF4.Variable'>\n", | |
"float64 time(time)\n", | |
" units: hours since 1800-01-01\n", | |
" long_name: time\n", | |
"unlimited dimensions: time\n", | |
"current shape = (4,)\n", | |
"filling on, default _FillValue of 9.96920996839e+36 used\n", | |
"\n", | |
"[-- -- -- --]\n" | |
] | |
} | |
], | |
"prompt_number": 9 | |
}, | |
{ | |
"cell_type": "markdown", | |
"metadata": { | |
"slideshow": { | |
"slide_type": "slide" | |
} | |
}, | |
"source": [ | |
"Let's add write some data into the time variable. \n", | |
"\n", | |
"- Given a set of datetime instances, use date2num to convert to numeric time values and then write that data to the variable." | |
] | |
}, | |
{ | |
"cell_type": "code", | |
"collapsed": false, | |
"input": [ | |
"from datetime import datetime\n", | |
"from netCDF4 import date2num,num2date\n", | |
"# 1st 4 days of October.\n", | |
"dates = [datetime(2014,10,1,0),datetime(2014,10,2,0),datetime(2014,10,3,0),datetime(2014,10,4,0)]\n", | |
"print dates\n", | |
"times = date2num(dates, time.units)\n", | |
"print times, time.units # numeric values\n", | |
"time[:] = times\n", | |
"# read time data back, convert to datetime instances, check values.\n", | |
"print num2date(time[:],time.units)" | |
], | |
"language": "python", | |
"metadata": { | |
"slideshow": { | |
"slide_type": "fragment" | |
} | |
}, | |
"outputs": [ | |
{ | |
"output_type": "stream", | |
"stream": "stdout", | |
"text": [ | |
"[datetime.datetime(2014, 10, 1, 0, 0), datetime.datetime(2014, 10, 2, 0, 0), datetime.datetime(2014, 10, 3, 0, 0), datetime.datetime(2014, 10, 4, 0, 0)]\n", | |
"[ 1882440. 1882464. 1882488. 1882512.] hours since 1800-01-01\n", | |
"[datetime.datetime(2014, 10, 1, 0, 0) datetime.datetime(2014, 10, 2, 0, 0)\n", | |
" datetime.datetime(2014, 10, 3, 0, 0) datetime.datetime(2014, 10, 4, 0, 0)]\n" | |
] | |
} | |
], | |
"prompt_number": 10 | |
}, | |
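{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "slide"
}
},
"source": [
"To see what the numeric time values mean, the elapsed hours can be recomputed with the standard library alone. This is a sketch under the assumption that the units are `hours since 1800-01-01` and that all dates fall in the Gregorian calendar; the names `epoch`, `demo_dates` and `demo_hours` are made up for illustration."
]
},
{
"cell_type": "code",
"collapsed": false,
"input": [
"from datetime import datetime\n",
"\n",
"# reference time taken from the units string 'hours since 1800-01-01'\n",
"epoch = datetime(1800, 1, 1)\n",
"demo_dates = [datetime(2014, 10, day) for day in range(1, 5)]\n",
"# elapsed hours since the reference time, using plain datetime arithmetic\n",
"demo_hours = [(d - epoch).total_seconds() / 3600.0 for d in demo_dates]\n",
"print(demo_hours)"
],
"language": "python",
"metadata": {
"slideshow": {
"slide_type": "fragment"
}
},
"outputs": []
},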
{ | |
"cell_type": "markdown", | |
"metadata": { | |
"slideshow": { | |
"slide_type": "slide" | |
} | |
}, | |
"source": [ | |
"## Closing a netCDF file\n", | |
"\n", | |
"It's **important** to close a netCDF file you opened for writing:\n", | |
"\n", | |
"- flushes buffers to make sure all data gets written\n", | |
"- releases memory resources used by open netCDF files\n", | |
"- lets you start over if you get an error, by recreating file from scratch" | |
] | |
}, | |
{ | |
"cell_type": "code", | |
"collapsed": true, | |
"input": [ | |
"# first print the Dataset object to see what we've got\n", | |
"print ncfile\n", | |
"# close the Dataset.\n", | |
"ncfile.close(); print 'Dataset is closed!'" | |
], | |
"language": "python", | |
"metadata": { | |
"slideshow": { | |
"slide_type": "fragment" | |
} | |
}, | |
"outputs": [ | |
{ | |
"output_type": "stream", | |
"stream": "stdout", | |
"text": [ | |
"<type 'netCDF4.Dataset'>\n", | |
"root group (NETCDF4 data model, file format HDF5):\n", | |
" title: My model data\n", | |
" dimensions(sizes): lat(73), lon(144), time(4)\n", | |
" variables(dimensions): float32 \u001b[4mlat\u001b[0m(lat), float32 \u001b[4mlon\u001b[0m(lon), float64 \u001b[4mtime\u001b[0m(time), float64 \u001b[4mtemp\u001b[0m(time,lat,lon)\n", | |
" groups: \n", | |
"\n", | |
"Dataset is closed!\n" | |
] | |
} | |
], | |
"prompt_number": 11 | |
}, | |
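{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "slide"
}
},
"source": [
"One way to avoid forgetting the `close()` call is a `with` block. This is a sketch that assumes the installed netCDF4 release supports using `Dataset` as a context manager; the scratch file and the `demo` names are made up for illustration."
]
},
{
"cell_type": "code",
"collapsed": false,
"input": [
"import os\n",
"import tempfile\n",
"import netCDF4\n",
"\n",
"# scratch file in a temporary directory (hypothetical name, illustration only)\n",
"demo_path = os.path.join(tempfile.mkdtemp(), 'demo_close.nc')\n",
"# the dataset is closed when the block exits, even if an error occurs inside it\n",
"with netCDF4.Dataset(demo_path, mode='w') as demo:\n",
"    demo.title = 'closed automatically'\n",
"    print(demo.title)\n",
"# at this point the dataset has already been closed"
],
"language": "python",
"metadata": {
"slideshow": {
"slide_type": "fragment"
}
},
"outputs": []
},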
{ | |
"cell_type": "markdown", | |
"metadata": { | |
"slideshow": { | |
"slide_type": "slide" | |
} | |
}, | |
"source": [ | |
"##Exercise\n", | |
"\n", | |
"Read SREF 24-h forecast precip probability (exercise from **reading_netCDF** notebook) write to a file (with compression).\n", | |
"\n", | |
"- create a new Dataset.\n", | |
"- first create dimensions.\n", | |
"- create and fill coordinate variables to go with dimensions.\n", | |
"- create precip probability variable, write data to it.\n", | |
"- add attributes\n", | |
"- close the Dataset." | |
] | |
}, | |
{ | |
"cell_type": "markdown", | |
"metadata": { | |
"slideshow": { | |
"slide_type": "slide" | |
} | |
}, | |
"source": [ | |
"# Advanced features\n", | |
"\n", | |
"So far we've only exercised features associated with the old netCDF version 3 data model. netCDF version 4 adds a lot of new functionality that comes with the more flexible HDF5 storage layer. \n", | |
"\n", | |
"Let's create a new file with `format='NETCDF4'` so we can try out some of these features." | |
] | |
}, | |
{ | |
"cell_type": "code", | |
"collapsed": false, | |
"input": [ | |
"ncfile = netCDF4.Dataset('data/new2.nc','w',format='NETCDF4')\n", | |
"print ncfile" | |
], | |
"language": "python", | |
"metadata": { | |
"slideshow": { | |
"slide_type": "fragment" | |
} | |
}, | |
"outputs": [ | |
{ | |
"output_type": "stream", | |
"stream": "stdout", | |
"text": [ | |
"<type 'netCDF4.Dataset'>\n", | |
"root group (NETCDF4 data model, file format HDF5):\n", | |
" dimensions(sizes): \n", | |
" variables(dimensions): \n", | |
" groups: \n", | |
"\n" | |
] | |
} | |
], | |
"prompt_number": 12 | |
}, | |
{ | |
"cell_type": "markdown", | |
"metadata": { | |
"slideshow": { | |
"slide_type": "slide" | |
} | |
}, | |
"source": [ | |
"## Creating Groups\n", | |
"\n", | |
"netCDF version 4 added support for organizing data in hierarchical groups.\n", | |
"- analagous to directories in a filesystem. \n", | |
"- Groups serve as containers for variables, dimensions and attributes, as well as other groups. \n", | |
"- A `netCDF4.Dataset` defines creates a special group, called the 'root group', which is similar to the root directory in a unix filesystem. \n", | |
"\n", | |
"- groups are created using the [`createGroup`](http://unidata.github.io/netcdf4-python/netCDF4.Dataset-class.html#createGroup) method.\n", | |
"- takes a single argument (a string, which is the name of the Group instance). This string is used as a key to access the group instances in the `groups` dictionary.\n", | |
"\n", | |
"Here we create two groups to hold data for two different model runs." | |
] | |
}, | |
{ | |
"cell_type": "code", | |
"collapsed": false, | |
"input": [ | |
"grp1 = ncfile.createGroup('model_run1')\n", | |
"grp2 = ncfile.createGroup('model_run2')\n", | |
"print ncfile\n", | |
"print grp1\n", | |
"print grp2" | |
], | |
"language": "python", | |
"metadata": { | |
"slideshow": { | |
"slide_type": "fragment" | |
} | |
}, | |
"outputs": [ | |
{ | |
"output_type": "stream", | |
"stream": "stdout", | |
"text": [ | |
"<type 'netCDF4.Dataset'>\n", | |
"root group (NETCDF4 data model, file format HDF5):\n", | |
" dimensions(sizes): \n", | |
" variables(dimensions): \n", | |
" groups: model_run1, model_run2\n", | |
"\n", | |
"<type 'netCDF4.Group'>\n", | |
"group /model_run1:\n", | |
" dimensions(sizes): \n", | |
" variables(dimensions): \n", | |
" groups: \n", | |
"\n", | |
"<type 'netCDF4.Group'>\n", | |
"group /model_run2:\n", | |
" dimensions(sizes): \n", | |
" variables(dimensions): \n", | |
" groups: \n", | |
"\n" | |
] | |
} | |
], | |
"prompt_number": 13 | |
}, | |
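{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "slide"
}
},
"source": [
"The `groups` dictionary mentioned above can be used to look groups up by name. A minimal sketch on a scratch file follows; the file name and the `demo` names are made up for illustration."
]
},
{
"cell_type": "code",
"collapsed": false,
"input": [
"import os\n",
"import tempfile\n",
"import netCDF4\n",
"\n",
"# scratch file in a temporary directory (hypothetical name, illustration only)\n",
"demo_path = os.path.join(tempfile.mkdtemp(), 'demo_groups.nc')\n",
"demo = netCDF4.Dataset(demo_path, mode='w', format='NETCDF4')\n",
"demo.createGroup('model_run1')\n",
"demo.createGroup('model_run2')\n",
"for name in demo.groups:\n",
"    grp = demo.groups[name]   # look the group up by its name\n",
"    print('%s -> %s' % (name, grp.path))\n",
"demo.close()"
],
"language": "python",
"metadata": {
"slideshow": {
"slide_type": "fragment"
}
},
"outputs": []
},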
{ | |
"cell_type": "markdown", | |
"metadata": { | |
"slideshow": { | |
"slide_type": "slide" | |
} | |
}, | |
"source": [ | |
"Create some dimensions in the root group." | |
] | |
}, | |
{ | |
"cell_type": "code", | |
"collapsed": false, | |
"input": [ | |
"lat_dim = ncfile.createDimension('lat', 73) # latitude axis\n", | |
"lon_dim = ncfile.createDimension('lon', 144) # longitude axis\n", | |
"time_dim = ncfile.createDimension('time', None) # unlimited axis (can be appended to)." | |
], | |
"language": "python", | |
"metadata": { | |
"slideshow": { | |
"slide_type": "fragment" | |
} | |
}, | |
"outputs": [], | |
"prompt_number": 14 | |
}, | |
{ | |
"cell_type": "markdown", | |
"metadata": { | |
"slideshow": { | |
"slide_type": "fragment" | |
} | |
}, | |
"source": [ | |
"Now create a variable in grp1 and grp2. The library will search recursively upwards in the group tree to find the dimensions (which in this case are defined one level up).\n", | |
"\n", | |
"- These variables are create with **zlib compression**, another nifty feature of netCDF 4. \n", | |
"- The data are automatically compressed when data is written to the file, and uncompressed when the data is read. \n", | |
"- This can really save disk space, especially when used in conjunction with the [**least_significant_digit**](http://unidata.github.io/netcdf4-python/netCDF4.Dataset-class.html#createVariable) keyword argument, which causes the data to be quantized (truncated) before compression. This makes the compression lossy, but more efficient." | |
] | |
}, | |
{ | |
"cell_type": "code", | |
"collapsed": false, | |
"input": [ | |
"temp1 = grp1.createVariable('temp',np.float64,('time','lat','lon'),zlib=True)\n", | |
"temp2 = grp2.createVariable('temp',np.float64,('time','lat','lon'),zlib=True)\n", | |
"print ncfile\n", | |
"print grp1\n", | |
"print grp2" | |
], | |
"language": "python", | |
"metadata": { | |
"slideshow": { | |
"slide_type": "fragment" | |
} | |
}, | |
"outputs": [ | |
{ | |
"output_type": "stream", | |
"stream": "stdout", | |
"text": [ | |
"<type 'netCDF4.Dataset'>\n", | |
"root group (NETCDF4 data model, file format HDF5):\n", | |
" dimensions(sizes): lat(73), lon(144), time(0)\n", | |
" variables(dimensions): \n", | |
" groups: model_run1, model_run2\n", | |
"\n", | |
"<type 'netCDF4.Group'>\n", | |
"group /model_run1:\n", | |
" dimensions(sizes): \n", | |
" variables(dimensions): float64 \u001b[4mtemp\u001b[0m(time,lat,lon)\n", | |
" groups: \n", | |
"\n", | |
"<type 'netCDF4.Group'>\n", | |
"group /model_run2:\n", | |
" dimensions(sizes): \n", | |
" variables(dimensions): float64 \u001b[4mtemp\u001b[0m(time,lat,lon)\n", | |
" groups: \n", | |
"\n" | |
] | |
} | |
], | |
"prompt_number": 15 | |
}, | |
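{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "slide"
}
},
"source": [
"The effect of `least_significant_digit` can be seen by writing some values and reading them back. This is a hedged sketch: the scratch file, the `demo` names and the choice of one retained decimal digit are assumptions made for illustration."
]
},
{
"cell_type": "code",
"collapsed": false,
"input": [
"import os\n",
"import tempfile\n",
"import netCDF4\n",
"import numpy as np\n",
"\n",
"# scratch file in a temporary directory (hypothetical name, illustration only)\n",
"demo_path = os.path.join(tempfile.mkdtemp(), 'demo_lsd.nc')\n",
"demo = netCDF4.Dataset(demo_path, mode='w', format='NETCDF4')\n",
"demo.createDimension('x', 1000)\n",
"# keep roughly one decimal digit of precision before compressing\n",
"demo_q = demo.createVariable('q', np.float64, ('x',), zlib=True,\n",
"                             least_significant_digit=1)\n",
"demo_values = np.random.uniform(low=280, high=330, size=1000)\n",
"demo_q[:] = demo_values\n",
"# largest difference between what was written and what is read back\n",
"print(np.abs(demo_q[:] - demo_values).max())\n",
"demo.close()"
],
"language": "python",
"metadata": {
"slideshow": {
"slide_type": "fragment"
}
},
"outputs": []
},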
{ | |
"cell_type": "markdown", | |
"metadata": { | |
"slideshow": { | |
"slide_type": "slide" | |
} | |
}, | |
"source": [ | |
"##Creating a variable with a compound data type\n", | |
"\n", | |
"- Compound data types map directly to numpy structured (a.k.a 'record' arrays). \n", | |
"- Structured arrays are akin to C structs, or derived types in Fortran. \n", | |
"- They allow for the construction of table-like structures composed of combinations of other data types, including other compound types. \n", | |
"- Might be useful for representing multiple parameter values at each point on a grid, or at each time and space location for scattered (point) data. \n", | |
"\n", | |
"Here we create a variable with a compound data type to represent complex data (there is no native complex data type in netCDF). \n", | |
"\n", | |
"- The compound data type is created with the [`createCompoundType`](http://unidata.github.io/netcdf4-python/netCDF4.Dataset-class.html#createCompoundType) method." | |
] | |
}, | |
{ | |
"cell_type": "code", | |
"collapsed": false, | |
"input": [ | |
"# create complex128 numpy structured data type\n", | |
"complex128 = np.dtype([('real',np.float64),('imag',np.float64)])\n", | |
"# using this numpy dtype, create a netCDF compound data type object\n", | |
"# the string name can be used as a key to access the datatype from the cmptypes dictionary.\n", | |
"complex128_t = ncfile.createCompoundType(complex128,'complex128')\n", | |
"# create a variable with this data type, write some data to it.\n", | |
"cmplxvar = grp1.createVariable('cmplx_var',complex128_t,('time','lat','lon'))\n", | |
"# write some data to this variable\n", | |
"# first create some complex random data\n", | |
"nlats = len(lat_dim); nlons = len(lon_dim)\n", | |
"data_arr_cmplx = np.random.uniform(size=(nlats,nlons))+1.j*np.random.uniform(size=(nlats,nlons))\n", | |
"# write this complex data to a numpy complex128 structured array\n", | |
"data_arr = np.empty((nlats,nlons),complex128)\n", | |
"data_arr['real'] = data_arr_cmplx.real; data_arr['imag'] = data_arr_cmplx.imag\n", | |
"cmplxvar[0] = data_arr # write the data to the variable (appending to time dimension)\n", | |
"print cmplxvar\n", | |
"data_out = cmplxvar[0] # read data back from variable\n", | |
"print data_out.dtype, data_out.shape, data_out[0,0]" | |
], | |
"language": "python", | |
"metadata": { | |
"slideshow": { | |
"slide_type": "fragment" | |
} | |
}, | |
"outputs": [ | |
{ | |
"output_type": "stream", | |
"stream": "stdout", | |
"text": [ | |
"<type 'netCDF4.Variable'>\n", | |
"compound cmplx_var(time, lat, lon)\n", | |
"compound data type: [('real', '<f8'), ('imag', '<f8')]\n", | |
"path = /model_run1\n", | |
"unlimited dimensions: time\n", | |
"current shape = (1, 73, 144)\n", | |
"\n", | |
"[('real', '<f8'), ('imag', '<f8')] (73, 144) (0.11203931478357165, 0.8210570135673454)\n" | |
] | |
} | |
], | |
"prompt_number": 16 | |
}, | |
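{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "slide"
}
},
"source": [
"Data read back with this compound type is a structured array, not a native complex array. One way to convert it is sketched below with numpy only; the small array and the `demo` names are made up for illustration."
]
},
{
"cell_type": "code",
"collapsed": false,
"input": [
"import numpy as np\n",
"\n",
"# same field layout as the compound type defined above\n",
"demo_dtype = np.dtype([('real', np.float64), ('imag', np.float64)])\n",
"demo_struct = np.empty((2, 3), demo_dtype)\n",
"demo_struct['real'] = 1.0\n",
"demo_struct['imag'] = -2.0\n",
"# combine the two fields into a native numpy complex array\n",
"demo_cmplx = demo_struct['real'] + 1j * demo_struct['imag']\n",
"print(demo_cmplx.dtype)\n",
"print(demo_cmplx[0, 0])"
],
"language": "python",
"metadata": {
"slideshow": {
"slide_type": "fragment"
}
},
"outputs": []
},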
{ | |
"cell_type": "markdown", | |
"metadata": { | |
"slideshow": { | |
"slide_type": "slide" | |
} | |
}, | |
"source": [ | |
"##Creating a variable with a variable-length (vlen) data type\n", | |
"\n", | |
"netCDF 4 has support for variable-length or \"ragged\" arrays. These are arrays of variable length sequences having the same type. \n", | |
"\n", | |
"- To create a variable-length data type, use the [`createVLType`](http://unidata.github.io/netcdf4-python/netCDF4.Dataset-class.html#createVLType) method.\n", | |
"- The numpy datatype of the variable-length sequences and the name of the new datatype must be specified. " | |
] | |
}, | |
{ | |
"cell_type": "code", | |
"collapsed": false, | |
"input": [ | |
"vlen_t = ncfile.createVLType(np.int64, 'phony_vlen')" | |
], | |
"language": "python", | |
"metadata": { | |
"slideshow": { | |
"slide_type": "fragment" | |
} | |
}, | |
"outputs": [], | |
"prompt_number": 17 | |
}, | |
{ | |
"cell_type": "markdown", | |
"metadata": { | |
"slideshow": { | |
"slide_type": "fragment" | |
} | |
}, | |
"source": [ | |
"A new variable can then be created using this datatype." | |
] | |
}, | |
{ | |
"cell_type": "code", | |
"collapsed": false, | |
"input": [ | |
"vlvar = grp2.createVariable('phony_vlen_var', vlen_t, ('time','lat','lon'))" | |
], | |
"language": "python", | |
"metadata": { | |
"slideshow": { | |
"slide_type": "fragment" | |
} | |
}, | |
"outputs": [], | |
"prompt_number": 18 | |
}, | |
{ | |
"cell_type": "markdown", | |
"metadata": { | |
"slideshow": { | |
"slide_type": "slide" | |
} | |
}, | |
"source": [ | |
"Since there is no native vlen datatype in numpy, vlen arrays are represented in python as object arrays (arrays of dtype `object`). \n", | |
"\n", | |
"- These are arrays whose elements are Python object pointers, and can contain any type of python object. \n", | |
"- For this application, they must contain 1-D numpy arrays all of the same type but of varying length. \n", | |
"- Fill with 1-D random numpy int64 arrays of random length between 1 and 10." | |
] | |
}, | |
{ | |
"cell_type": "code", | |
"collapsed": false, | |
"input": [ | |
"vlen_data = np.empty((nlats,nlons),object)\n", | |
"for i in range(nlons):\n", | |
" for j in range(nlats):\n", | |
" size = np.random.randint(1,10,size=1)\n", | |
" vlen_data[j,i] = np.random.randint(0,10,size=size)\n", | |
"vlvar[0] = vlen_data # append along unlimited dimension (time)\n", | |
"print vlvar\n", | |
"print 'data =\\n',vlvar[:]" | |
], | |
"language": "python", | |
"metadata": { | |
"slideshow": { | |
"slide_type": "fragment" | |
} | |
}, | |
"outputs": [ | |
{ | |
"output_type": "stream", | |
"stream": "stdout", | |
"text": [ | |
"<type 'netCDF4.Variable'>\n", | |
"vlen phony_vlen_var(time, lat, lon)\n", | |
"vlen data type: int64\n", | |
"path = /model_run2\n", | |
"unlimited dimensions: time\n", | |
"current shape = (1, 73, 144)\n", | |
"\n", | |
"data =\n", | |
"[[[array([0, 6, 6, 0, 6, 5, 5, 8]) array([2, 0, 2]) array([5, 1, 4, 1])\n", | |
" ..., array([6, 9, 8, 7, 4, 0, 8, 9]) array([3, 7, 5, 9, 3, 4, 2, 2, 7])\n", | |
" array([4, 1, 1, 0, 4, 3, 5, 5, 8])]\n", | |
" [array([7, 4, 8, 0, 6, 2, 3]) array([9, 4, 0, 3, 5, 2, 0, 0, 2])\n", | |
" array([6, 0]) ..., array([9, 8, 5, 8])\n", | |
" array([0, 5, 6, 1, 1, 5, 1, 8, 3]) array([2, 6, 1, 6, 1, 5, 5, 1])]\n", | |
" [array([4, 6]) array([5, 7, 5, 4, 4]) array([5, 4, 2, 6, 6, 1, 6, 7, 3])\n", | |
" ..., array([1]) array([4, 0, 3, 9, 3, 6])\n", | |
" array([4, 3, 9, 0, 6, 3, 4, 2])]\n", | |
" ..., \n", | |
" [array([1, 4, 4, 0, 2, 0, 5, 9, 1]) array([2, 9, 7, 2, 2])\n", | |
" array([0, 9, 3, 6, 0, 2, 1, 6]) ..., array([0, 9, 4, 5, 8, 0, 1, 7, 7])\n", | |
" array([8, 0, 1]) array([6, 5])]\n", | |
" [array([0, 8, 0, 0, 0, 0]) array([9, 8, 7, 0]) array([8, 9, 7, 4, 7])\n", | |
" ..., array([8, 5, 8, 0, 4]) array([7, 1, 3, 5, 7, 8, 4, 4])\n", | |
" array([4, 7, 4, 2])]\n", | |
" [array([1, 5, 8, 1, 5, 6, 9]) array([5, 7])\n", | |
" array([7, 7, 1, 6, 1, 1, 8, 2]) ..., array([6, 4, 5, 0, 0, 9, 9, 5])\n", | |
" array([8, 5]) array([6, 6, 5])]]]\n" | |
] | |
} | |
], | |
"prompt_number": 19 | |
}, | |
{ | |
"cell_type": "markdown", | |
"metadata": { | |
"slideshow": { | |
"slide_type": "slide" | |
} | |
}, | |
"source": [ | |
"Close the Dataset and examine the contents with ncdump." | |
] | |
}, | |
{ | |
"cell_type": "code", | |
"collapsed": false, | |
"input": [ | |
"ncfile.close()\n", | |
"!ncdump -h data/new2.nc" | |
], | |
"language": "python", | |
"metadata": { | |
"slideshow": { | |
"slide_type": "fragment" | |
} | |
}, | |
"outputs": [ | |
{ | |
"output_type": "stream", | |
"stream": "stdout", | |
"text": [ | |
"netcdf new2 {\r\n", | |
"types:\r\n", | |
" compound complex128 {\r\n", | |
" double real ;\r\n", | |
" double imag ;\r\n", | |
" }; // complex128\r\n", | |
" int64(*) phony_vlen ;\r\n", | |
"dimensions:\r\n", | |
"\tlat = 73 ;\r\n", | |
"\tlon = 144 ;\r\n", | |
"\ttime = UNLIMITED ; // (1 currently)\r\n", | |
"\r\n", | |
"group: model_run1 {\r\n", | |
" variables:\r\n", | |
" \tdouble temp(time, lat, lon) ;\r\n", | |
" \tcomplex128 cmplx_var(time, lat, lon) ;\r\n", | |
" } // group model_run1\r\n", | |
"\r\n", | |
"group: model_run2 {\r\n", | |
" variables:\r\n", | |
" \tdouble temp(time, lat, lon) ;\r\n", | |
" \tphony_vlen phony_vlen_var(time, lat, lon) ;\r\n", | |
" } // group model_run2\r\n", | |
"}\r\n" | |
] | |
} | |
], | |
"prompt_number": 20 | |
}, | |
{ | |
"cell_type": "markdown", | |
"metadata": { | |
"slideshow": { | |
"slide_type": "slide" | |
} | |
}, | |
"source": [ | |
"##Other interesting and useful projects using netcdf4-python\n", | |
"\n", | |
"- [Xray](http://xray.readthedocs.org/en/stable/): N-dimensional variant of the core [pandas](http://pandas.pydata.org) data structure that can operate on netcdf variables.\n", | |
"- [Iris](http://scitools.org.uk/iris/): a data model to create a data abstraction layer which isolates analysis and visualisation code from data format specifics. Uses netcdf4-python to access netcdf data (can also handle GRIB).\n", | |
"- [Biggus](https://github.com/SciTools/biggus): Virtual large arrays (from netcdf variables) with lazy evaluation.\n", | |
"- [cf-python](http://cfpython.bitbucket.org/): Implements the [CF](http://cfconventions.org) data model for the reading, writing and processing of data and metadata. " | |
] | |
} | |
], | |
"metadata": {} | |
} | |
] | |
} |