I couldn't find a complete specification for FIDs, but GDAL's vector data model docs say:
"The feature id (FID) of a feature is intended to be a unique identifier for the feature within the layer it is a member of. Freestanding features, or features not yet written to a layer may have a null (OGRNullFID) feature id. The feature ids are modeled in OGR as a 64-bit integer; however, this is not sufficiently expressive to model the natural feature ids in some formats. For instance, the GML feature id is a string."
This suggests that every feature read from a layer will have a defined FID, and that any feature's FID is either defined or null (though it's not clear that this is guaranteed).
Format drivers may or may not enforce further properties of the FID. For instance, shapefile FIDs start at 0, while GeoPackage FIDs start at 1. Every time a shapefile changes, its FIDs are renumbered sequentially from 0, with the modified features moved to the end. I haven't researched other formats, but they almost certainly behave differently.
Therefore, if we're working with a GDAL layer read from an unknown vector format, we should not assume much about its FIDs.
Seemingly guaranteed (for data read from an unknown format):
- the function `feature.GetFID` exists
  - it will return an integer
  - the integer will be unique among features in the layer
- the function `layer.GetFeature(fid)` will return the feature with the given FID, if it exists
  - if that FID doesn't exist, the lookup fails: it raises an error or returns `None`, depending on the driver and on whether GDAL exceptions are enabled
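To make the list above concrete, here is a minimal sketch that checks those properties for one layer. The helper names `collect_fids` and `check_fid_basics` are made up for illustration, and the code assumes `layer` behaves like an `ogr.Layer` from the GDAL Python bindings (iterable, with `ResetReading` and `GetFeature`). It checks the properties; it does not prove that any driver guarantees them.

```python
def collect_fids(layer):
    """Return the FID of every feature in ``layer``, in iteration order."""
    layer.ResetReading()  # make sure we start from the first feature
    return [feature.GetFID() for feature in layer]


def check_fid_basics(layer):
    """Raise ValueError if the layer breaks one of the expectations above."""
    fids = collect_fids(layer)

    # Every FID should be an integer.
    if not all(isinstance(fid, int) for fid in fids):
        raise ValueError("found a non-integer FID")

    # FIDs should be unique within the layer.
    if len(set(fids)) != len(fids):
        raise ValueError("FIDs are not unique within the layer")

    # Looking a feature up by its own FID should return that feature.
    # A missing FID may come back as None instead of raising, so test both.
    for fid in fids:
        feature = layer.GetFeature(fid)
        if feature is None or feature.GetFID() != fid:
            raise ValueError(f"lookup by FID {fid} did not round-trip")

    return fids
```

The FIDs are collected before any `GetFeature` call because random access in the middle of sequential reading may disturb the read cursor in some drivers.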
Not guaranteed (for data read from an unknown format):
- the FID 0 or 1 (or any other number) will exist
- FIDs in the range `[0, layer.GetFeatureCount()]` will exist
  - FIDs will be consecutive
- FIDs will be preserved if the dataset is copied to another format
- FIDs will be preserved if features are added or deleted
- FIDs will be preserved if other attributes are changed
- etc
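In practice this means code should inspect the FIDs a layer actually has instead of generating them with something like `range(layer.GetFeatureCount())`. The sketch below summarizes a list of FIDs so these assumptions can be tested explicitly; `describe_fids` is a hypothetical helper, and the FIDs are assumed to have been gathered by iterating the layer and calling `GetFID` on each feature.

```python
def describe_fids(fids):
    """Summarize a collection of integer FIDs.

    Returns a dict with the count, the smallest and largest FID, and
    whether the FIDs form one unbroken run of consecutive integers.
    """
    ordered = sorted(fids)
    if not ordered:
        return {"count": 0, "min": None, "max": None, "consecutive": True}

    # An unbroken run starting at the minimum has exactly these values.
    expected = list(range(ordered[0], ordered[0] + len(ordered)))
    return {
        "count": len(ordered),
        "min": ordered[0],
        "max": ordered[-1],
        "consecutive": ordered == expected,
    }
```

A caller that really needs zero-based consecutive FIDs can then check `min == 0` and `consecutive` and fail loudly, instead of silently reading the wrong features.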
FIDs are valuable when used correctly. Just make sure that FIDs are only used as identifiers within a single unmodified layer.
A few models assume that FIDs match across two identical output vectors created in the same format. This seems to hold up, but may not be guaranteed and seems vulnerable to error. For instance, if attributes are changed in one vector, causing the FIDs to re-order, are we silently mixing up data?
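One way to avoid relying on matching FIDs is to pair features through an explicit key attribute that both vectors carry. The following is only a sketch of that idea, not what any existing model does: the function names are invented, `key_field` stands for whatever unique attribute the vectors share, and the layers are assumed to behave like `ogr.Layer` objects whose features support `GetField` and `GetFID`.

```python
def index_by_key(layer, key_field):
    """Map each feature's value of ``key_field`` to that feature's FID."""
    fid_by_key = {}
    for feature in layer:
        key = feature.GetField(key_field)
        if key in fid_by_key:
            # A key that repeats cannot identify a feature unambiguously.
            raise ValueError(f"duplicate key {key!r} in field {key_field!r}")
        fid_by_key[key] = feature.GetFID()
    return fid_by_key


def match_fids(layer_a, layer_b, key_field):
    """Return {fid_in_a: fid_in_b} for features that share a key value."""
    a_fids = index_by_key(layer_a, key_field)
    b_fids = index_by_key(layer_b, key_field)
    return {a_fids[key]: b_fids[key] for key in a_fids if key in b_fids}
```

If the FIDs of the two vectors really do line up, the result maps every FID to itself. If one vector was renumbered, the mapping still pairs the right features.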
Common use cases for FIDs:
- key in the dictionary returned by `pygeoprocessing.zonal_statistics`
- geometry ID in an `rtree` spatial index
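As a sketch of the first use case: `pygeoprocessing.zonal_statistics` returns a dictionary keyed by FID whose values are dictionaries of statistics such as `'sum'` and `'count'`. The helpers below are hypothetical and only consume a dictionary of that shape. The lookup step assumes the layer is the same one the statistics were computed on and has not been modified since.

```python
def mean_by_fid(stats_by_fid):
    """Compute the mean per FID from 'sum' and 'count' entries.

    FIDs whose count is zero (no valid pixels) map to None.
    """
    means = {}
    for fid, stats in stats_by_fid.items():
        count = stats["count"]
        means[fid] = stats["sum"] / count if count else None
    return means


def features_with_values(layer, values_by_fid):
    """Yield (feature, value) pairs by looking each FID up in ``layer``."""
    for fid, value in values_by_fid.items():
        feature = layer.GetFeature(fid)
        if feature is None:
            # The FID is unknown to this layer, so the key is stale.
            raise KeyError(f"FID {fid} not found; was the layer modified?")
        yield feature, value
```

Both steps stay within one layer, which is the situation where FIDs are safe to use as keys.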
A different unique key attribute should be created and used for post-processing, e.g. in GIS software. Other output data should not reference a vector's FIDs: while that does work, it's prone to user error.
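A rough sketch of creating such a key attribute follows. The name `add_key_field` is hypothetical. The caller is assumed to pass a layer opened for update and a field definition such as `ogr.FieldDefn("key_id", ogr.OFTInteger)`, where `key_id` is just an example name. Whether writing features while iterating is safe may depend on the driver, so treat this as a starting point only.

```python
def add_key_field(layer, field_defn, start=1):
    """Add an integer field and fill it with consecutive key values.

    ``layer`` is assumed to be an ogr.Layer opened in update mode and
    ``field_defn`` an ogr.FieldDefn built by the caller.
    Returns {fid: key} as it was at the time of writing.
    """
    layer.CreateField(field_defn)
    field_name = field_defn.GetName()

    key_by_fid = {}
    layer.ResetReading()
    for key, feature in enumerate(layer, start=start):
        feature.SetField(field_name, key)
        layer.SetFeature(feature)  # write the modified feature back
        key_by_fid[feature.GetFID()] = key
    return key_by_fid
```

Once written, the key travels with the feature as an ordinary attribute, so it should survive being copied to another format or edited in GIS software in a way that the FID might not.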