Snowflake IDs - Unique IDs - sort of

Introduction

Over the last few weeks, I have been looking at using Azure Table Storage as the backing store for several small APIs I have worked on. Azure Table Storage does not have a way to generate a unique key value the way identity columns do in a SQL database, so I have been looking for a suitable replacement. Of course, I could use GUIDs, but I don’t want to use such long strings as they are a real pain to type and view. However, as the APIs will be scaled out over time, I still need a way to ensure each ID is unique in a distributed system.

Ensuring the uniqueness of identifiers across distributed systems is a significant challenge. Twitter, a platform that handles millions of tweets daily, developed an innovative solution known as “Snowflake IDs” to tackle this issue. This blog post delves into the concept of Twitter Snowflake IDs, how they work, and their importance in modern web architectures.

What are Snowflake IDs?

Snowflake IDs are 64-bit numeric identifiers that Twitter designed to give every tweet a unique ID.

The primary goal of Snowflake IDs is to generate unique, time-ordered identifiers at a scale and speed that a single auto-incrementing database sequence cannot match.

How Do Snowflake IDs Work?

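A Snowflake ID is a 64-bit integer. The top bit is left unused so the value stays positive when stored as a signed integer, and the remaining 63 bits are composed of: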

Timestamp: A 41-bit timestamp representing the milliseconds since a custom epoch (Twitter’s is 1288834974657 milliseconds past the Unix epoch). This provides 69 years’ worth of milliseconds.
Datacenter ID: A 5-bit identifier for the data centre where the tweet originated.
Worker ID: A 5-bit identifier for the worker machine, ensuring that each machine in a data centre has a unique ID.
Sequence Number: A 12-bit sequence that is incremented for every ID generated within the same millisecond, allowing 4096 unique IDs to be generated per millisecond per worker.

This structure ensures that every ID is unique across different machines, data centres, and times, provided that no two workers share the same combination of datacenter ID and worker ID.
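To make the layout above concrete, here is a minimal Python sketch of how the four fields could be packed into one integer and unpacked again. The helper names `compose` and `decompose` are my own for illustration and are not taken from Twitter's code; the bit widths and epoch are the ones quoted above.

```python
# Bit widths from the layout described above.
TIMESTAMP_BITS = 41
DATACENTER_BITS = 5
WORKER_BITS = 5
SEQUENCE_BITS = 12

TWITTER_EPOCH_MS = 1288834974657  # custom epoch quoted above

# How far each field is shifted left inside the 64-bit value.
WORKER_SHIFT = SEQUENCE_BITS                          # 12
DATACENTER_SHIFT = WORKER_SHIFT + WORKER_BITS         # 17
TIMESTAMP_SHIFT = DATACENTER_SHIFT + DATACENTER_BITS  # 22


def compose(unix_ms: int, datacenter_id: int, worker_id: int, sequence: int) -> int:
    """Pack the four fields into a single integer (hypothetical helper)."""
    elapsed = unix_ms - TWITTER_EPOCH_MS
    fields = [
        ("timestamp", elapsed, TIMESTAMP_BITS),
        ("datacenter_id", datacenter_id, DATACENTER_BITS),
        ("worker_id", worker_id, WORKER_BITS),
        ("sequence", sequence, SEQUENCE_BITS),
    ]
    for name, value, bits in fields:
        if not 0 <= value < (1 << bits):
            raise ValueError(f"{name} does not fit in {bits} bits")
    return (
        (elapsed << TIMESTAMP_SHIFT)
        | (datacenter_id << DATACENTER_SHIFT)
        | (worker_id << WORKER_SHIFT)
        | sequence
    )


def decompose(snowflake: int) -> dict:
    """Split an ID back into its fields (hypothetical helper)."""
    return {
        "unix_ms": (snowflake >> TIMESTAMP_SHIFT) + TWITTER_EPOCH_MS,
        "datacenter_id": (snowflake >> DATACENTER_SHIFT) & ((1 << DATACENTER_BITS) - 1),
        "worker_id": (snowflake >> WORKER_SHIFT) & ((1 << WORKER_BITS) - 1),
        "sequence": snowflake & ((1 << SEQUENCE_BITS) - 1),
    }
```

Because the timestamp occupies the highest bits, comparing two IDs numerically compares their timestamps first, which is where the rough time ordering comes from.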

Advantages of Snowflake IDs

Uniqueness: By incorporating the machine and data centre ID, Snowflake ensures uniqueness across a distributed system.
Time-ordered: The IDs are roughly time-ordered, which is beneficial for storing and retrieving data in time-sequential order.
Highly Scalable: Each worker can generate up to 4096 IDs per millisecond (roughly 4 million per second) without the need for a centralized authority, so capacity grows simply by adding workers.
Low Overhead: The generation of IDs is lightweight and does not involve heavy computation or network overhead.

Implementing Snowflake IDs

To implement Snowflake ID generation in your application:

Define Epoch: Choose a custom epoch that suits your application.
Allocate Bits: Decide on the bit allocation for each component (timestamp, datacenter ID, worker ID, sequence number).
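Generate IDs: For each request, read the current time in milliseconds, increment the sequence if the millisecond has not changed since the last ID (otherwise reset it to zero), and pack the fields together. Keep the clocks on your servers synchronized, for example with NTP, so the time-based part of the ID stays meaningful across machines.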
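The steps above can be sketched as a small generator class. This is a rough illustration rather than Twitter's implementation: the class name `SnowflakeGenerator`, the injectable `clock` parameter, and the choice to raise an error when the clock moves backwards are all my own assumptions, and it uses the Twitter bit allocation (41/5/5/12) with its epoch as the default.

```python
import threading
import time


class SnowflakeGenerator:
    """Hypothetical generator using the 41/5/5/12 bit layout."""

    def __init__(self, datacenter_id, worker_id, epoch_ms=1288834974657, clock=None):
        if not 0 <= datacenter_id < 32:
            raise ValueError("datacenter_id must fit in 5 bits")
        if not 0 <= worker_id < 32:
            raise ValueError("worker_id must fit in 5 bits")
        # Datacenter ID sits above the worker ID, which sits above the sequence.
        self._node_bits = (datacenter_id << 17) | (worker_id << 12)
        self._epoch_ms = epoch_ms
        # The clock is injectable (an assumption of this sketch) to ease testing.
        self._clock = clock or (lambda: time.time_ns() // 1_000_000)
        self._lock = threading.Lock()
        self._last_ms = -1
        self._sequence = 0

    def next_id(self):
        with self._lock:
            now = self._clock()
            if now < self._last_ms:
                # One possible policy: refuse rather than risk a duplicate.
                raise RuntimeError("clock moved backwards; refusing to generate IDs")
            if now == self._last_ms:
                self._sequence = (self._sequence + 1) & 0xFFF
                if self._sequence == 0:
                    # All 4096 values used this millisecond: wait for the next one.
                    while now <= self._last_ms:
                        now = self._clock()
            else:
                self._sequence = 0
            self._last_ms = now
            return ((now - self._epoch_ms) << 22) | self._node_bits | self._sequence
```

Usage would look something like `gen = SnowflakeGenerator(datacenter_id=1, worker_id=2)` followed by `gen.next_id()` for each new record. The sketch does not check that the timestamp still fits in 41 bits, and it assumes each process is given a distinct datacenter ID and worker ID pair by some means outside the code.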

Applications Beyond Twitter

Snowflake IDs have applications far beyond Twitter. They are useful in any distributed system where unique, time-sequential identifiers are necessary. Examples include:

Database Primary Keys: Especially in NoSQL databases where auto-incremented IDs are not available.
Order IDs in E-commerce: Where a unique and time-ordered identifier is required for each transaction.
Distributed Logging Systems: For uniquely identifying log entries from multiple sources.

Challenges and Considerations

Clock Synchronization: Snowflake relies on synchronized system clocks. If a machine's clock is adjusted backwards, that machine can reissue IDs it has already produced, and drift between machines leads to out-of-order IDs.
Bit Limitations: The bit allocation for each component of the ID needs careful planning to avoid running out of space.
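To get a feel for the bit-allocation trade-off, the following sketch computes what a given split buys you. The function name `capacity` is hypothetical, and it assumes millisecond timestamps and a 63-bit budget (64 bits minus the sign bit).

```python
MS_PER_YEAR = 365.25 * 24 * 60 * 60 * 1000


def capacity(timestamp_bits, datacenter_bits, worker_bits, sequence_bits):
    """Summarise the limits implied by a bit allocation (hypothetical helper)."""
    total = timestamp_bits + datacenter_bits + worker_bits + sequence_bits
    if total > 63:
        raise ValueError(f"allocation uses {total} bits; only 63 are available")
    return {
        # How long until the timestamp field overflows, counted from the epoch.
        "years_before_rollover": (1 << timestamp_bits) / MS_PER_YEAR,
        # Number of distinct datacenter/worker combinations.
        "max_workers": 1 << (datacenter_bits + worker_bits),
        # IDs one worker can issue within a single millisecond.
        "ids_per_ms_per_worker": 1 << sequence_bits,
    }


# Twitter's allocation: about 69.7 years, 1024 workers, 4096 IDs per ms each.
print(capacity(41, 5, 5, 12))
```

Every bit moved from one field to another doubles one limit and halves another, so a small API with a handful of instances might reasonably trade worker bits for a longer timestamp range.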

Conclusion

Twitter’s Snowflake IDs offer an elegant solution to a complex problem faced in distributed systems – generating unique, time-ordered identifiers at scale.

Their design principles make them an excellent choice for modern web applications that require a robust system for unique ID generation.

Snowflake’s methodology is a testament to the innovative approaches needed to handle the challenges of large-scale, distributed internet architectures.

For now, I will be moving forward using something akin to the Snowflake ID for my new APIs.
