Created
June 12, 2013 19:29
Convert a MongoDB ObjectID to a valid, semantically similar UUID.
"""
Convert a MongoDB ObjectID to a version-1 UUID.

Python 2.7+ required for datetime.timedelta.total_seconds().

ObjectID:
    - UNIX timestamp (32 bits)
    - Machine identifier (24 bits)
    - Process ID (16 bits)
    - Counter (24 bits)

UUID 1:
    - Timestamp (60 bits)
    - Clock sequence (14 bits)
    - Node identifier (48 bits)
"""

import datetime
import uuid

from bson import objectid


class _UTC(datetime.tzinfo):
    """Minimal UTC tzinfo, since Python 2 has no datetime.timezone."""
    ZERO = datetime.timedelta(0)

    def utcoffset(self, dt):
        return self.ZERO

    def tzname(self, dt):
        return 'UTC'

    def dst(self, dt):
        return self.ZERO


UTC = _UTC()

# UUID timestamps count 100 ns ticks since the start of the Gregorian calendar.
UUID_1_EPOCH = datetime.datetime(1582, 10, 15, tzinfo=UTC)
UUID_TICKS_PER_SECOND = 10000000
# Top two bits "10" of the 16-bit clock sequence field mark the RFC 4122 variant.
UUID_VARIANT_1 = 0b1000000000000000


def _unix_time_to_uuid_time(dt):
    return int((dt - UUID_1_EPOCH).total_seconds() * UUID_TICKS_PER_SECOND)


def objectid_to_uuid(oid):
    oid_time = oid.generation_time.astimezone(UTC)
    oid_hex = str(oid)

    # Hex digits 8..17 hold machine ID + process ID; the rest is the counter.
    machine_pid_hex = oid_hex[8:18]
    counter = int(oid_hex[18:], 16)

    # Leading '1' is the version nibble, followed by the 60-bit timestamp.
    timestamp_hex = '1%015x' % (_unix_time_to_uuid_time(oid_time))
    # Only the low 14 bits of the 24-bit counter fit in the clock sequence.
    clock_hex = '%04x' % (UUID_VARIANT_1 | (counter & 0x3fff))
    node_hex = '%012x' % int(machine_pid_hex, 16)

    # UUID text order: time_low, time_mid, time_hi_and_version, clock, node.
    converted_uuid = uuid.UUID(
        '%s-%s-%s-%s-%s' % (
            timestamp_hex[-8:],
            timestamp_hex[4:8],
            timestamp_hex[:4],
            clock_hex,
            node_hex
        )
    )

    assert converted_uuid.variant == uuid.RFC_4122
    assert converted_uuid.version == 1

    return converted_uuid


if __name__ == '__main__':
    oid = objectid.ObjectId()
    print(oid)
    oid_as_uuid = objectid_to_uuid(oid)
    print('{%s}' % oid_as_uuid)
Just a quick heads up that there are 3 fairly minor issues with this implementation.
Issue 1
In practice this isn't going to be a problem unless you're creating objects on a computer whose system clock is set later than 2491 AD, so I think it's reasonable to ignore it for now.
Issue 2
The RFC indicates that when the node is not a MAC address, the multicast bit must be set: http://www.ietf.org/rfc/rfc4122.txt
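As a rough illustration of that requirement: the multicast bit is the least significant bit of the first octet of the 48-bit node field, which is bit 40 when the node is treated as an integer. The sketch below sets it on a 40-bit machine/PID value. The helper name `node_with_multicast_bit` and the sample value are made up for illustration and are not part of the gist.

```python
# Hypothetical helper, not part of the gist above.
MULTICAST_BIT = 1 << 40  # least significant bit of the node's first octet


def node_with_multicast_bit(machine_pid):
    """Return a 48-bit node value holding the 40-bit machine/PID value
    with the multicast bit set."""
    if not 0 <= machine_pid < (1 << 40):
        raise ValueError('machine_pid must fit in 40 bits')
    return machine_pid | MULTICAST_BIT


node = node_with_multicast_bit(0xa1b2c3d4e5)  # sample value, chosen arbitrarily
print('%012x' % node)  # 01a1b2c3d4e5
```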
Issue 3
This depends on how your MongoDB driver uses the "counter" field. Most implementations simply increment it, in which case you'd need to insert more than 16,384 (2^14) MongoDB documents in a single second to hit a collision. That does assume the driver simply increments the number, though; if it doesn't, you may run into issues sooner.
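To make the collision concrete, here is a small sketch of what the `counter & 0x3fff` mask in the gist does to two counters that are exactly 2^14 apart. The function name and the sample counter values are illustrative only.

```python
CLOCK_SEQ_MASK = 0x3fff  # 14 bits, the same mask the gist applies to the counter


def truncated_counter(counter):
    """Keep only the bits of a 24-bit counter that fit in the clock sequence."""
    return counter & CLOCK_SEQ_MASK


first = 0x000001             # arbitrary sample counter
second = first + (1 << 14)   # 16,384 increments later

# Both counters map to the same 14-bit clock sequence.
print(truncated_counter(first) == truncated_counter(second))  # True
```

With the same second, machine and process ID, those two ObjectIDs would therefore convert to the same UUID.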
Solution
There's room to insert an additional 4 bits of the timestamp without any black magic; they fit right in the field. They are only omitted as a result of the chosen implementation.
Easy enough: just set the multicast bit.
This is where things get a bit tricky. The counter is 24 bits but the clock sequence only holds 14, so we need to find space for 10 additional bits in our UUID, and we've just consumed 1 more bit above (the multicast bit).
We still have the 7 most significant bits of the node field free, so we can place 7 bits there.
MongoDB's timestamp only has a resolution of one second, whereas UUID v1 has a resolution of 100 nanoseconds. One second is 10,000,000 ticks, and 10,000,000 = 2^7 × 78,125, so every whole-second timestamp is a multiple of 128. That leaves 7 (least significant) bits free in the timestamp, and the remaining 3 bits can be placed there.
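A quick way to check the free-bits claim is to test the low bits of a few whole-second tick counts directly. This is only a sanity-check sketch; `low_bits_free` is a made-up helper name.

```python
UUID_TICKS_PER_SECOND = 10000000  # 100 ns ticks per second, as in the gist


def low_bits_free(whole_seconds, bits=7):
    """True if the lowest `bits` bits of the tick count are all zero."""
    ticks = whole_seconds * UUID_TICKS_PER_SECOND
    return ticks & ((1 << bits) - 1) == 0


print(UUID_TICKS_PER_SECOND == (1 << 7) * 78125)    # True
print(all(low_bits_free(s) for s in range(1000)))   # True
print(low_bits_free(1, bits=8))                     # False: the 8th bit is in use
```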
Of course, the semantics are somewhat altered if you place those last 3 bits in UUID's time field, so this may not be desirable in all circumstances. However, if you want completely lossless (and therefore reversible) conversion, then this is at least one possible solution.
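Putting the pieces together, a lossless round trip might look something like the sketch below. It works on the 24-character ObjectID hex string so it doesn't need `bson`. Which counter bits go where is my own assumption (low 14 bits in the clock sequence, next 7 in the top of the node field, top 3 in the low bits of the timestamp); the function names are hypothetical, and it assumes the ObjectID layout described in the gist's docstring.

```python
import uuid

# 100 ns ticks between the UUID epoch (1582-10-15) and the UNIX epoch.
UUID_EPOCH_OFFSET_TICKS = 0x01b21dd213814000
TICKS_PER_SECOND = 10000000
MULTICAST_BIT = 1 << 40


def pack_objectid(oid_hex):
    """Pack a 24-hex-digit ObjectID into a version-1 UUID without losing bits."""
    seconds = int(oid_hex[0:8], 16)
    machine_pid = int(oid_hex[8:18], 16)
    counter = int(oid_hex[18:24], 16)

    # Assumed split of the 24-bit counter: 14 + 7 + 3 bits.
    clock_seq = counter & 0x3fff
    node_extra = (counter >> 14) & 0x7f
    time_extra = counter >> 21

    # Whole-second tick counts have their low 7 bits clear, so OR is safe.
    timestamp = (seconds * TICKS_PER_SECOND + UUID_EPOCH_OFFSET_TICKS) | time_extra
    node = (node_extra << 41) | MULTICAST_BIT | machine_pid

    return uuid.UUID(fields=(
        timestamp & 0xffffffff,                  # time_low
        (timestamp >> 32) & 0xffff,              # time_mid
        ((timestamp >> 48) & 0x0fff) | 0x1000,   # time_hi_version (version 1)
        (clock_seq >> 8) | 0x80,                 # clock_seq_hi_variant (RFC 4122)
        clock_seq & 0xff,                        # clock_seq_low
        node,
    ))


def unpack_objectid(u):
    """Reverse pack_objectid and return the original ObjectID hex string."""
    time_extra = u.time & 0x7
    ticks = u.time - time_extra - UUID_EPOCH_OFFSET_TICKS
    seconds = ticks // TICKS_PER_SECOND
    machine_pid = u.node & (MULTICAST_BIT - 1)   # low 40 bits
    node_extra = u.node >> 41
    counter = (time_extra << 21) | (node_extra << 14) | u.clock_seq
    return '%08x%010x%06x' % (seconds, machine_pid, counter)


sample = '51b8c93d8f3c2a1b4cffffff'  # made-up ObjectID with a maxed-out counter
packed = pack_objectid(sample)
print(packed)
print(unpack_objectid(packed) == sample)  # True
```

Note that a UUID built this way no longer carries a plain whole-second timestamp, which is exactly the semantic trade-off mentioned above.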