@rivermont
Last active October 6, 2018 20:33

The Size of TIGER

There is a LOT of TIGER data, and most of it has still not even been glanced at. On top of that, each TIGER road comes with a bunch of metadata tags.

[Image: way 16543325]

Taginfo has the following statistics on common TIGER tags (as of Oct 6 2018):

  • 13,078,000 tiger:cfcc
  • 12,871,000 tiger:county
  • 11,874,000 tiger:reviewed (98% no)
  • 8,021,000 tiger:name_base
  • 6,880,000 tiger:name_type
  • 4,700,000 tiger:tlid
  • 4,700,000 tiger:source
  • 4,020,000 tiger:upload_uuid
  • 4,000,000 tiger:zip_left
  • 3,600,000 tiger:zip_right
  • 3,250,000 tiger:separated (99% no)
  • 1,275,000 tiger:name_direction_prefix
  • 1,127,140 tiger:name_base_1
  • 450,000 tiger:name_direction_suffix
  • 370,000 tiger:name_type_1
  • ~1,020,000 uses of other TIGER tags with more than 20,000 uses each

In the OSM XML format, each tag is structured like so:

<tag k="KEY" v="VALUE"/>

where KEY and VALUE are, of course, the key/value pair for the tag. For example, a simple highway tagged with highway=residential + name=Cole Mill Road + surface=asphalt is:

<way>
  <tag k="highway" v="residential"/>
  <tag k="name" v="Cole Mill Road"/>
  <tag k="surface" v="asphalt"/>
</way>

Excluding all other metadata and node references needed to make an actual way, this comes out to 119 bytes: 34, 34, and 30 bytes for the three tags respectively, plus the <way> wrapper, indentation, and newlines.
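
The byte counts above can be checked with a few lines of Python. This is just a sketch: the helper name `tag_bytes` is made up for illustration, and it assumes UTF-8 encoding, two-space indentation, single `\n` newlines, and no characters that would need XML escaping.

```python
def tag_bytes(key: str, value: str) -> int:
    """Bytes used by one <tag .../> element, without indentation or newline."""
    return len(f'<tag k="{key}" v="{value}"/>'.encode("utf-8"))


tags = [
    ("highway", "residential"),
    ("name", "Cole Mill Road"),
    ("surface", "asphalt"),
]

# Size of each tag element on its own.
sizes = [tag_bytes(k, v) for k, v in tags]

# Size of the whole example way, including indentation and newlines
# (no trailing newline after </way>).
way = "<way>\n" + "".join(f'  <tag k="{k}" v="{v}"/>\n' for k, v in tags) + "</way>"
way_size = len(way.encode("utf-8"))

print(sizes, way_size)
```

Every tag carries 16 bytes of fixed markup (`<tag k="" v=""/>`), so the size of a tag is simply 16 plus the lengths of its key and value.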

Size calculation

Applying this to all of the above TIGER tags (bytes per tag multiplied by the number of uses), total sizes come out as follows.

  • tiger:cfcc (3-byte value): 380 MB
  • tiger:county (assuming an average value of 12 bytes): 514 MB
  • tiger:reviewed=no: 380 MB
  • tiger:name_base (assuming an average value of 12 bytes): 345 MB
  • tiger:name_type (2-byte value): 227 MB
  • tiger:tlid (values of around 200 bytes): 1.13 GB
  • tiger:source (30-byte value): 272 MB
  • tiger:upload_uuid (51-byte value): 334 MB
  • tiger:zip_left (5-byte value): 140 MB
  • tiger:zip_right (5-byte value): 126 MB
  • tiger:separated=no: 104 MB
  • tiger:name_direction_prefix (1-byte value): 56 MB
  • tiger:name_base_1 (assuming an average value of 13 bytes): 53 MB
  • tiger:name_direction_suffix (assuming an average value of 2 bytes): 20 MB
  • tiger:name_type_1 (2-byte value): 13 MB
  • ~40 MB for the other tags
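
For anyone who wants to redo the arithmetic, here is a rough sketch of the calculation. It takes the usage counts from the Taginfo list and the value lengths assumed above, treats every tiger:reviewed and tiger:separated value as "no", counts only the bare tag element (no indentation or newline), and uses 1 MB = 1,000,000 bytes. Those choices are assumptions, so the results land within a few percent of the listed figures rather than matching all of them exactly.

```python
# Fixed markup per tag: <tag k="" v=""/>
TAG_OVERHEAD = len('<tag k="" v=""/>')  # 16 bytes

# (key, usage count from Taginfo, assumed value length in bytes)
TIGER_TAGS = [
    ("tiger:cfcc", 13_078_000, 3),
    ("tiger:county", 12_871_000, 12),
    ("tiger:reviewed", 11_874_000, 2),   # treated as always "no"
    ("tiger:name_base", 8_021_000, 12),
    ("tiger:name_type", 6_880_000, 2),
    ("tiger:tlid", 4_700_000, 200),
    ("tiger:source", 4_700_000, 30),
    ("tiger:upload_uuid", 4_020_000, 51),
    ("tiger:zip_left", 4_000_000, 5),
    ("tiger:zip_right", 3_600_000, 5),
    ("tiger:separated", 3_250_000, 2),   # treated as always "no"
    ("tiger:name_direction_prefix", 1_275_000, 1),
    ("tiger:name_base_1", 1_127_140, 13),
    ("tiger:name_direction_suffix", 450_000, 2),
    ("tiger:name_type_1", 370_000, 2),
]


def total_megabytes(key: str, count: int, value_len: int) -> float:
    """Total size in MB (10^6 bytes) of all uses of one tag."""
    return (TAG_OVERHEAD + len(key) + value_len) * count / 1_000_000


totals = {key: total_megabytes(key, count, vlen) for key, count, vlen in TIGER_TAGS}
grand_total_mb = sum(totals.values())  # excludes the ~40 MB of "other" tags

for key, mb in totals.items():
    print(f"{key}: {mb:,.0f} MB")
print(f"total: {grand_total_mb:,.0f} MB")
```

Changing a value length in `TIGER_TAGS` shows how sensitive each total is to the guess; tiger:tlid dominates because of its long values.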

Conclusion

That all adds up to over 4 GB of data, just from extraneous import tags (I'll bet NHD imports are even larger ... oh boy).
With the current planet.osm (uncompressed) sitting at around 960 GB, all these TIGER tags make up a whopping 0.42% of all OpenStreetMap data! Wow such large.
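
The percentage is a one-line division. A quick sketch, using the rounded "over 4 GB" figure as a lower bound and the roughly 960 GB planet size quoted above:

```python
tiger_tags_gb = 4.0   # lower bound: "over 4 GB" of TIGER tags
planet_gb = 960.0     # approximate uncompressed planet.osm size

# Share of the planet file taken up by TIGER tags, in percent.
share_percent = tiger_tags_gb / planet_gb * 100

print(f"{share_percent:.2f}%")
```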

This doesn't really conclude much, but it was a fun experiment. I had expected the number to be much larger, but even the vastness of TIGER doesn't compare to the rest of the world.


Still, most TIGER data is misaligned, low-resolution, incorrectly classified, inconsistent, and straight-up wrong.
If you'd like to help cut down on TIGER data, I've created a MapRoulette challenge for my local area, and progress is being made!

TIGER Gore

[Image: bad TIGER roads 1]

[Image: bad TIGER roads 2]
