On Generating Unique IDs, Part 2
In Part 1, we talked about how UUID assumptions can crack at scale. The probabilistic guarantees that feel comforting in theory can break down in practice—race conditions in PRNGs, entropy exhaustion, clock skew, container cloning. The math assumes perfect randomness, perfect isolation, perfect timing. None of these exist in real deployments.
So what do you do when “good enough” stops being good enough?
You stop betting on probability and start building for certainty.
The Coordination Reality
Here’s the fundamental truth: if you need strict uniqueness guarantees at scale, you usually need coordination.
You can coordinate up front (assign machine IDs once), coordinate at runtime (distributed locks), or coordinate eventually (collision detection and resolution). But one way or another, you’re coordinating.
The question isn’t whether to coordinate. It’s where to put that coordination cost.
Coordination Systems: etcd and Zookeeper
Before diving into specific ID generation approaches, it’s worth understanding the coordination systems that make them possible.
Apache ZooKeeper is a centralized service for maintaining configuration information, naming, and providing distributed synchronization and group services. It presents a simple interface to clients: a hierarchical, filesystem-like API of znodes (data nodes) that can be created, read, updated, and deleted. ZooKeeper guarantees that writes are atomic and totally ordered, and it handles leader election and consensus internally.
etcd is a distributed, reliable key-value store written in Go that provides a reliable way to store data across a cluster of machines. It uses the Raft consensus algorithm to achieve strong consistency and provides watch APIs for monitoring changes to keys.
Both systems solve the same fundamental problem: how do multiple machines agree on state in a distributed environment? For ID generation, they’re used to assign unique worker/machine IDs, coordinate leader election, manage configuration and state, and provide distributed locking.
You’re not avoiding coordination. You’re just making it explicit and bounded.
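Both patterns boil down to an atomic create-if-absent. Here is a minimal sketch of claiming a worker ID, assuming an in-memory stand-in for the store (a real deployment would use an etcd transaction or a ZooKeeper ephemeral znode; `FakeCoordinationStore` and `claim_worker_id` are illustrative names, not any real API):

```python
import threading

class FakeCoordinationStore:
    """In-memory stand-in for etcd/ZooKeeper: atomic create-if-absent."""
    def __init__(self):
        self._lock = threading.Lock()
        self._data = {}

    def create(self, key, value):
        """Atomically create a key; fail if it already exists
        (the shape of an etcd txn or a ZooKeeper znode create)."""
        with self._lock:
            if key in self._data:
                return False
            self._data[key] = value
            return True

def claim_worker_id(store, hostname, max_workers=1024):
    """Scan for the first unclaimed worker ID. This coordination happens
    once, at startup; afterwards the machine generates IDs on its own."""
    for worker_id in range(max_workers):
        if store.create(f"/workers/{worker_id}", hostname):
            return worker_id
    raise RuntimeError("all worker IDs are claimed")

store = FakeCoordinationStore()
a = claim_worker_id(store, "host-a")  # first claimer gets 0
b = claim_worker_id(store, "host-b")  # second claimer gets 1
```

The point of the sketch is the failure mode: two machines racing for the same ID cannot both succeed, because the store decides atomically.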
Twitter’s Snowflake
Twitter’s Snowflake, released in 2010, generates 64-bit IDs with the following structure:
0 | 0000000000 0000000000 0000000000 0000000000 0 | 0000000000 | 000000000000
↑   └───────────────── 41 bits ─────────────────┘   └ 10 bits ┘  └─ 12 bits ─┘
│                 timestamp (ms)                     machine ID     sequence
└─ unused (sign bit, always 0)
41 bits of timestamp: Milliseconds since a custom epoch. Gives you about 69 years of IDs from the epoch.
10 bits of machine ID: 5 bits for datacenter, 5 bits for worker. Supports 32 datacenters × 32 workers = 1024 unique generators.
12 bits of sequence: Rolling counter that resets every millisecond. Each machine can generate 4096 IDs per millisecond.
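The layout translates almost directly into shifts and masks. Here is a minimal sketch of a Snowflake-style generator, assuming the widely cited value of Twitter's custom epoch; the class name and the spin-wait on sequence exhaustion are illustrative, not Twitter's actual code:

```python
import threading
import time

EPOCH_MS = 1288834974657  # Twitter's custom Snowflake epoch (Nov 2010)

class Snowflake:
    """Sketch of a Snowflake-style generator: 41-bit timestamp,
    10-bit machine ID, 12-bit per-millisecond sequence."""
    def __init__(self, machine_id):
        assert 0 <= machine_id < 1024          # must fit in 10 bits
        self.machine_id = machine_id
        self.last_ms = -1
        self.sequence = 0
        self.lock = threading.Lock()

    def next_id(self):
        with self.lock:
            now = int(time.time() * 1000)
            if now < self.last_ms:
                # Clock moved backward: refuse rather than risk duplicates.
                raise RuntimeError("clock moved backwards")
            if now == self.last_ms:
                self.sequence = (self.sequence + 1) & 0xFFF  # 12 bits
                if self.sequence == 0:
                    # 4096 IDs exhausted this ms: spin to the next one.
                    while now <= self.last_ms:
                        now = int(time.time() * 1000)
            else:
                self.sequence = 0
            self.last_ms = now
            return ((now - EPOCH_MS) << 22) | (self.machine_id << 12) | self.sequence

gen = Snowflake(machine_id=7)
first, second = gen.next_id(), gen.next_id()
assert second > first                  # roughly time-ordered
assert ((first >> 12) & 0x3FF) == 7    # machine ID embedded in the bits
```

Note that the whole uniqueness argument lives in the bit layout: as long as no two generators share a machine ID and no single generator reuses a (millisecond, sequence) pair, collisions are impossible by construction.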
This design gives you:
- Chronological ordering: IDs roughly sort by creation time (within the same millisecond and machine)
- Uniqueness by construction: The machine ID portion guarantees uniqueness across generators—no runtime coordination needed
- High throughput: 4096 IDs/ms per machine. With 1024 machines, that’s ~4 million IDs per millisecond globally
The coordination happens once: when a machine starts up, it claims its unique worker ID via ZooKeeper. Once the worker ID is assigned, the machine generates IDs independently. No further coordination required.
The trade-offs:
- Clock skew between machines can cause ID ordering issues
- If a machine’s clock moves backward (say, after an NTP adjustment), you risk generating duplicate IDs; the generator has to detect the regression and refuse to issue IDs until the clock catches up
- You need to manage and track machine IDs
Instagram’s Approach
Instagram went in a different direction. Instead of structuring IDs with embedded metadata like Snowflake, they generate 64-bit integers using a two-step approach:
- Pre-allocate ID ranges to each server via a central service
- Generate IDs locally from the allocated range
+---------------------+ +----------------------+ +---------------------+
| Web Server 1 | | Web Server 2 | | Web Server 3 |
| ID Range: 1-1000 | | ID Range: 1001-2000 | | ID Range: 2001-3000 |
| | | | | |
| local counter = 1 | | local counter = 1001 | | local counter = 2001|
| next ID: 1 | | next ID: 1001 | | next ID: 2001 |
+---------------------+ +----------------------+ +---------------------+
│ │ │
└───────────────────────┼───────────────────────┘
│
+------------------------------------------+
| ID Generation Service (PostgreSQL) |
| |
| INSERT INTO id_seq (stub) VALUES ('1') |
| RETURNING id; |
+------------------------------------------+
Each server grabs a batch of IDs (Instagram used 1000) from a central PostgreSQL table, then uses them locally without further coordination. When it runs low, it grabs another batch.
CREATE TABLE id_seq (
    id BIGINT GENERATED BY DEFAULT AS IDENTITY (INCREMENT BY 1000) PRIMARY KEY,
    stub CHAR(1)
);
-- Every time you need more IDs:
INSERT INTO id_seq (stub) VALUES ('1') RETURNING id;
Because the identity column’s backing sequence increments by 1000, each insert reserves an entire block. If RETURNING id gives you 50001, you now own IDs 50001–51000. Generate them at will locally. No database roundtrips until you need more.
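Client-side, the batching logic is small. Here is a minimal sketch, assuming an in-memory stand-in for the central PostgreSQL ticket table (`FakeTicketTable` and `RangeAllocator` are illustrative names, not Instagram's code):

```python
import itertools

class FakeTicketTable:
    """Stand-in for the central PostgreSQL table: each 'insert' hands
    back the start of the next 1000-ID block, like a sequence
    configured with INCREMENT BY 1000."""
    def __init__(self, batch=1000):
        self._next = itertools.count(start=1, step=batch)

    def insert_returning_id(self):
        return next(self._next)

class RangeAllocator:
    """Hands out IDs from a locally held block; goes back to the
    central table only when the block is exhausted."""
    def __init__(self, table, batch=1000):
        self.table, self.batch = table, batch
        self.next_id = self.limit = 0  # empty range forces first fetch

    def allocate(self):
        if self.next_id >= self.limit:
            start = self.table.insert_returning_id()  # the coordination step
            self.next_id, self.limit = start, start + self.batch
        nid = self.next_id
        self.next_id += 1
        return nid

table = FakeTicketTable()
server1, server2 = RangeAllocator(table), RangeAllocator(table)
assert server1.allocate() == 1      # server 1 owns 1..1000
assert server2.allocate() == 1001   # server 2 owns 1001..2000
assert server1.allocate() == 2      # served locally, no roundtrip
```

One roundtrip buys a thousand locally generated IDs, which is why this amortizes so well; the cost is the gap you leave behind if a server dies mid-block.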
The trade-offs:
- Pros: Simple to implement, works with standard databases, no clock dependency, easy to understand
- Cons: IDs aren’t roughly ordered by time, you get gaps if servers crash with unused ranges in their allocation
This is the right answer when you want simplicity and don’t care about time-ordered IDs.
Other Approaches
There are many other ways companies have solved this problem:
- Sonyflake — A Snowflake variant that uses 39 bits for timestamp (174 years) and 8 bits for sequence, giving you 256 IDs per 10 milliseconds per machine
- Boundary’s Flake — 128-bit k-ordered IDs built from a 64-bit timestamp, a 48-bit worker ID (the host’s MAC address), and a 16-bit sequence, commonly printed in a compact base-62 encoding
- CockroachDB — Uses unique_rowid(), which combines the insertion timestamp with the replica’s node ID
- MongoDB ObjectId — 12-byte identifier with 4-byte timestamp, 5-byte random value, and 3-byte counter
- ULID — Universally Unique Lexicographically Sortable Identifier, 128 bits combining timestamp (48 bits) and randomness (80 bits)
Each approach makes different trade-offs between coordination cost, ordering guarantees, and implementation complexity. They’re all valid for different use cases.
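As a taste of the last of these, ULID construction under the layout described above can be sketched in a few lines; this is an illustrative sketch, not the reference implementation (the `ulid` function and its encoding loop are my own):

```python
import os
import time

# Crockford's base32 alphabet used by ULID (no I, L, O, U).
ALPHABET = "0123456789ABCDEFGHJKMNPQRSTVWXYZ"

def ulid():
    """Sketch of ULID construction: 48-bit millisecond timestamp
    followed by 80 bits of randomness, encoded as 26 base32 chars."""
    ts = int(time.time() * 1000) & ((1 << 48) - 1)
    rand = int.from_bytes(os.urandom(10), "big")  # 80 random bits
    value = (ts << 80) | rand                     # 128 bits total
    chars = []
    for _ in range(26):                           # 26 * 5 = 130 bits
        chars.append(ALPHABET[value & 0x1F])
        value >>= 5
    return "".join(reversed(chars))

a, b = ulid(), ulid()
assert len(a) == 26
assert a[:10] <= b[:10]  # timestamp prefix sorts lexicographically
```

Because the timestamp occupies the high bits and the encoding is fixed-width, plain string comparison sorts ULIDs by creation time, which is exactly the property the name promises.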
Conclusion
UUIDs work for most applications. The collision probability is tiny, and for many systems, “good enough” is actually good enough.
But understand what you’re betting on. UUIDs give you probabilistic uniqueness, not absolute guarantees. The math assumes perfect randomness, perfect isolation, perfect timing—none of which exist in real systems.
At scale, if you need deterministic guarantees, you need deterministic systems. The approaches used by Twitter, Instagram, and others all have one thing in common: they make coordination explicit. No pretending that randomness solves everything. No hiding the complexity.
They coordinate up front, allocate resources, and generate IDs with certainty.
Design for the collapse, not the ideal case. Know where your assumptions break. Have a plan for when the math stops working in your favor.
Because it will.