MongoDB, Ruby and ObjectId collisions
This is a (hopefully) short piece on debugging ObjectId collisions in MongoDB, in an environment involving Ruby and containers (GCP Cloud Run Jobs -- basically Kubernetes jobs).
MongoDB is a document database: JSON documents can be stored in it, and each document has its own unique ObjectId. (This is a very simplified summary, more info here.)
An ObjectId is a 12-byte value that uniquely identifies a document. It is generated on the client, before a new document is inserted in the db. While the format is standard, each client library has to implement it.
The ObjectId format is as follows:
- 4-byte timestamp, in seconds since the Unix epoch.
- 5-byte random value, generated once per process. This random value is unique to the machine and process.
- 3-byte counter, initialized to a random value and incremented every time an ObjectId is generated.
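To make the layout concrete, here is a minimal Ruby sketch of a generator that follows the three fields above. It is only an illustration: the class name `SketchObjectId` and its methods are made up for this post, and this is not how any particular driver implements it.

```ruby
require "securerandom"

# Hypothetical illustration of the ObjectId layout, not a real driver's code.
class SketchObjectId
  # 5 random bytes, picked once when the process loads this class.
  PROCESS_RANDOM = SecureRandom.random_bytes(5)
  COUNTER_MAX = 0xFFFFFF # largest value that fits in 3 bytes

  # The counter starts at a random value and is shared by all threads.
  @counter = SecureRandom.random_number(COUNTER_MAX + 1)
  @lock = Mutex.new

  def self.next_counter
    @lock.synchronize do
      @counter = (@counter + 1) & COUNTER_MAX # wraps around after 2**24 values
    end
  end

  # Returns the 12 bytes as a 24-character hex string.
  def self.generate(time = Time.now)
    timestamp = [time.to_i].pack("N")          # 4 bytes, big-endian seconds
    counter   = [next_counter].pack("N")[1, 3] # low 3 bytes, big-endian
    (timestamp + PROCESS_RANDOM + counter).unpack1("H*")
  end
end

puts SketchObjectId.generate
```

Two IDs generated in the same second by the same process share their first 18 hex characters and differ only in the counter part.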
I like these IDs: they're sortable-ish across machines, the 5 bytes of random "process id" should be enough to avoid collisions, they're fast to generate, and the 3-byte counter gives each process about 16.7M (2^24) of them per second before they repeat.
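A quick way to check those numbers, and to pull an ID apart when debugging, is sketched below. The helper `split_object_id` and the example ID are made up for illustration; the ID is not taken from a real database.

```ruby
# Sizes of the counter and "process id" spaces, from the byte widths above.
COUNTER_VALUES = 2**24 # 16_777_216 IDs per second per process before wrapping
PROCESS_VALUES = 2**40 # 1_099_511_627_776 possible 5-byte random values

# Hypothetical helper: split a 24-character hex ObjectId into its three fields.
def split_object_id(hex)
  {
    timestamp: Time.at(hex[0, 8].to_i(16)).utc, # first 4 bytes
    process_random: hex[8, 10],                 # next 5 bytes, kept as hex
    counter: hex[18, 6].to_i(16)                # last 3 bytes
  }
end

# Made-up example ID: timestamp 6553f100, random 0123456789, counter abcdef.
p split_object_id("6553f1000123456789abcdef")
```

Because the timestamp comes first and is big-endian, sorting the hex strings sorts IDs by creation second, which is where the "sortable-ish" comes from.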