There is a LOT of TIGER data, most of which still hasn't even been glanced at. And each TIGER road comes with a bunch of metadata tags.
Taginfo has the following statistics on common TIGER tags (as of Oct 6 2018):
- 13,078,000 tiger:cfcc
- 12,871,000 tiger:county
- 11,874,000 tiger:reviewed (98% no)
- 8,021,000 tiger:name_base
- 6,880,000 tiger:name_type
- 4,700,000 tiger:tlid
- 4,700,000 tiger:source
- 4,020,000 tiger:upload_uuid
- 4,000,000 tiger:zip_left
- 3,600,000 tiger:zip_right
- 3,250,000 tiger:separated (99% no)
- 1,275,000 tiger:name_direction_prefix
- 1,127,140 tiger:name_base_1
- 450,000 tiger:name_direction_suffix
- 370,000 tiger:name_type_1
- ~1,020,000 other tags with >20,000 uses
In the OSM XML format, each tag is structured like so:
<tag k="KEY" v="VALUE"/>
where KEY and VALUE are, of course, the tag's key and value. For example, a simple highway tagged with highway=residential, name=Cole Mill Road and surface=asphalt looks like this:
<way>
<tag k="highway" v="residential"/>
<tag k="name" v="Cole Mill Road"/>
<tag k="surface" v="asphalt"/>
</way>
Excluding all other metadata and the node references needed to make an actual way, this comes out to 119 bytes, with 34, 34, and 30 bytes for the three tags respectively.
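Those byte counts are easy to check: every tag line is 16 bytes of fixed markup (`<tag k="`, `" v="`, `"/>`) plus the lengths of its key and value. The small Python sketch below renders tags and measures them. The helper names are made up for illustration, and the two-space indentation in `way_bytes` is an assumption about how the example way was laid out; it is simply one layout that reproduces the 119-byte figure.

```python
from xml.sax.saxutils import escape

# escape() already handles &, < and >; also escape the double quote,
# because the attribute values below are wrapped in "...".
_ATTR_ENTITIES = {'"': "&quot;"}


def tag_xml(key: str, value: str) -> str:
    """Render one OSM tag as a <tag k="..." v="..."/> element."""
    return f'<tag k="{escape(key, _ATTR_ENTITIES)}" v="{escape(value, _ATTR_ENTITIES)}"/>'


def tag_bytes(key: str, value: str) -> int:
    """UTF-8 size of one rendered tag (16 + len(key) + len(value) for plain ASCII)."""
    return len(tag_xml(key, value).encode("utf-8"))


def way_bytes(tags: dict[str, str], indent: str = "  ") -> int:
    """UTF-8 size of a bare <way> holding only these tags, one element per line.

    The indent and newline layout is an assumption, not part of the OSM format.
    """
    lines = ["<way>"] + [indent + tag_xml(k, v) for k, v in tags.items()] + ["</way>"]
    return len("\n".join(lines).encode("utf-8"))


example = {"highway": "residential", "name": "Cole Mill Road", "surface": "asphalt"}
print([tag_bytes(k, v) for k, v in example.items()])  # [34, 34, 30]
print(way_bytes(example))  # 119
```

Values containing `&`, `<`, `>` or `"` grow when escaped, so a plain "key length + value length" estimate slightly undercounts those, but TIGER values are almost all plain ASCII.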
Every tag costs 16 bytes of fixed markup plus the lengths of its key and value, so multiplying that per-tag size by the usage counts above gives the following totals:
- tiger:cfcc (3-byte value): 380 MB
- tiger:county (assuming an average value of 12 bytes): 514 MB
- tiger:reviewed=no: 380 MB
- tiger:name_base (assuming an average value of 12 bytes): 345 MB
- tiger:name_type (2-byte value): 227 MB
- tiger:tlid (values of around 200 bytes): 1.13 GB
- tiger:source (30-byte value): 272 MB
- tiger:upload_uuid (51-byte value): 334 MB
- tiger:zip_left (5-byte value): 140 MB
- tiger:zip_right (5-byte value): 126 MB
- tiger:separated=no: 104 MB
- tiger:name_direction_prefix (1-byte value): 56 MB
- tiger:name_base_1 (assuming an average value of 13 bytes): 53 MB
- tiger:name_direction_suffix (assuming an average value of 2 bytes): 20 MB
- tiger:name_type_1 (2-byte value): 13 MB
- other tags: ~40 MB
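To double-check the arithmetic, the sketch below recomputes each row as count × (16 + key length + assumed average value length), using the taginfo counts and value lengths listed above and taking the ~40 MB of other tags as given. Sizes use decimal units (1 MB = 10^6 bytes). Because the exact averages behind each row aren't stated, a few rows land a little off the figures above (a flat 200-byte tiger:tlid value gives about 1.06 GB rather than 1.13 GB, for instance), but the total still comes out just over 4 GB.

```python
# (key, taginfo count as of Oct 6 2018, assumed average value length in bytes)
TIGER_TAGS = [
    ("tiger:cfcc", 13_078_000, 3),
    ("tiger:county", 12_871_000, 12),
    ("tiger:reviewed", 11_874_000, 2),  # value "no"
    ("tiger:name_base", 8_021_000, 12),
    ("tiger:name_type", 6_880_000, 2),
    ("tiger:tlid", 4_700_000, 200),
    ("tiger:source", 4_700_000, 30),
    ("tiger:upload_uuid", 4_020_000, 51),
    ("tiger:zip_left", 4_000_000, 5),
    ("tiger:zip_right", 3_600_000, 5),
    ("tiger:separated", 3_250_000, 2),  # value "no"
    ("tiger:name_direction_prefix", 1_275_000, 1),
    ("tiger:name_base_1", 1_127_140, 13),
    ("tiger:name_direction_suffix", 450_000, 2),
    ("tiger:name_type_1", 370_000, 2),
]
OTHER_TAGS_BYTES = 40_000_000  # the "~40 MB other tags" row, taken as given
MARKUP_BYTES = len('<tag k="" v=""/>')  # 16 bytes of fixed markup per tag


def estimated_bytes(key: str, count: int, avg_value_len: int) -> int:
    """Total bytes for `count` copies of a tag with this key and value length."""
    return count * (MARKUP_BYTES + len(key) + avg_value_len)


sizes = {key: estimated_bytes(key, n, v) for key, n, v in TIGER_TAGS}
total = sum(sizes.values()) + OTHER_TAGS_BYTES

for key, size in sizes.items():
    print(f"{key:<28} {size / 1e6:8.0f} MB")
print(f"{'total':<28} {total / 1e9:8.2f} GB")
```

Swapping in different average value lengths (say, from a real sample of TIGER ways) only requires editing the third column.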
That all adds up to over 4 GB of data, just from extraneous import tags (I'll bet NHD imports are even larger ... oh boy).
With the current planet.osm (uncompressed) sitting at around 960 GB, all these TIGER tags make up a whopping 0.42% of all OpenStreetMap data! Wow such large.
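Spelled out, that share is just the tag total divided by the planet size, using the rounded 4 GB figure:

```latex
\frac{\text{TIGER tag bytes}}{\text{planet.osm bytes}}
  \approx \frac{4\ \text{GB}}{960\ \text{GB}}
  = \frac{1}{240}
  \approx 0.42\%
```

Put another way, roughly one byte in every 240 of the uncompressed planet is a TIGER import tag.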
This doesn't really prove much, but it was a fun experiment. I had expected the number to be much larger, but even the vastness of TIGER doesn't compare to the rest of the world.
Still, most TIGER data is misaligned, low-resolution, incorrectly classified, inconsistent and straight up wrong.
If you'd like to help cut down on TIGER data, I've created a MapRoulette challenge for my local area, and progress is being made!