@sebastiankade
Last active October 12, 2024 15:42
Stop using UUIDs: The Modern ID Spec

Modern ID Spec

An adaptable, human-friendly, web-safe, unique ID spec for modern applications.

Guiding Principles

  • Short
  • Human friendly
  • URL-safe
  • Developer experience
  • Realistically unique

Why Not UUIDs

UUIDs force you to use the "worst-possible" case ID for all your tables.

  • You only need high-entropy IDs for sensitive data or large-volume tables. (Most of your tables are low volume.)
  • Hard to copy paste (not human friendly)
  • Impossible to read out (not human friendly)
  • Bad for debugging (currently still done by humans)
  • Make URLs ugly af
  • Complete overkill for most use cases
  • Saving 16 bytes per row is not worth all of the grossness.
  • Really designed for distributed systems, not web apps.

Why Not Sequential Integers

  • always guessable
  • bad for scaling databases horizontally
  • can't be created client-side
  • can't be used for high-volume tables
  • it's not the 2000s anymore

The Modern ID

Traditionally, we've chosen between sequential integers and (G)UUIDs for identifiers.

All credit goes to Stripe for pioneering this ID style, but it hasn't gained enough momentum, so here I am.

Modern IDs:

  1. Use the {prefix}_{suffix} format (see below)
  2. Contain a short prefix that maps to a table/entity (yes, this is stored in the database, see Pros/Cons)
  3. Forgo marginal disk-space savings for developer experience, user friendliness, and enhanced product design patterns
  4. Can still guarantee uniqueness for large tables (see below)

Format

{prefix}_{suffix}

  • Prefix: 1-2 lowercase alphabet characters, maps to the table/entity
  • Suffix: 4-32 alphanumeric characters

Regex: [a-z]{1,2}_[a-zA-Z0-9]{4,32}

Example: m_CWZpkWfq or t_2rw2FzZB or u_9d2F

A max suffix length of 32 is chosen because at that point we can switch to using hyphen-less UUIDs for the suffix.
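
As a rough illustration of the format above, here is a minimal TypeScript sketch of a validator. The helper name `parseModernID` is hypothetical and not part of the spec; it only checks the shape of the ID, not whether the prefix maps to a real table.

```typescript
// Hypothetical helper (not part of the spec): splits a Modern ID into its
// prefix and suffix, or returns null when the string does not match
// [a-z]{1,2}_[a-zA-Z0-9]{4,32}.
function parseModernID(id: string): { prefix: string; suffix: string } | null {
  const separator = id.indexOf("_");
  if (separator === -1) return null;

  const prefix = id.slice(0, separator);
  const suffix = id.slice(separator + 1);

  // Prefix: 1-2 lowercase letters. Suffix: 4-32 alphanumeric characters.
  if (!/^[a-z]{1,2}$/.test(prefix)) return null;
  if (!/^[a-zA-Z0-9]{4,32}$/.test(suffix)) return null;

  return { prefix, suffix };
}

// Usage with the examples from the spec
console.log(parseModernID("m_CWZpkWfq")); // parsed into prefix "m" and suffix "CWZpkWfq"
console.log(parseModernID("workspace_1")); // null: prefix too long, suffix too short
```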

Configuration & Rules

  • We will have a mapping of table -> prefix and suffix length
  • Each table should have a suffix length chosen for its volume and sensitivity.
  • Low-volume tables can go as low as 4-character suffixes.
  • High-volume/sensitive tables should use 32-character suffixes (essentially prefixed UUIDs).
  • Table prefixes should never change.
  • Table suffix size can grow independently as the table grows. Increasing the size gives you a whole new set of IDs, since IDs of different lengths never conflict (e.g. [A-Z]{4} !== [A-Z]{5}).
  • Since uniqueness is not guaranteed, retries should be handled in the database, server, or client, depending on your use case.
  • When uniqueness is required (one-time event firing), use 32-character suffixes.
  • Easy-to-confuse characters like "O" should be omitted from suffix generators (see NanoID).
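
The rules above could be sketched as a small suffix generator. Everything here is an assumption for illustration: the exact set of dropped characters (0, O, 1, l, I), the names `randomSuffix` and `newModernID`, and the injectable `randomIndex` parameter. The default uses `Math.random`, which is not cryptographically secure; for sensitive tables you would want to pass in a secure source (for example one built on `crypto.getRandomValues`).

```typescript
// Assumed alphabet: alphanumerics minus the easily confused 0, O, 1, l and I.
const UNAMBIGUOUS = "23456789abcdefghijkmnopqrstuvwxyzABCDEFGHJKLMNPQRSTUVWXYZ";

// Returns an integer in [0, bound). Injectable so a secure source can be swapped in.
type RandomIndex = (bound: number) => number;

// Default source: fine for a demo, NOT suitable for sensitive IDs.
const insecureRandomIndex: RandomIndex = (bound) => Math.floor(Math.random() * bound);

function randomSuffix(length: number, randomIndex: RandomIndex = insecureRandomIndex): string {
  if (!Number.isInteger(length) || length < 4 || length > 32) {
    throw new RangeError("suffix length must be an integer between 4 and 32");
  }
  let suffix = "";
  for (let i = 0; i < length; i++) {
    suffix += UNAMBIGUOUS.charAt(randomIndex(UNAMBIGUOUS.length));
  }
  return suffix;
}

function newModernID(prefix: string, length: number, randomIndex?: RandomIndex): string {
  if (!/^[a-z]{1,2}$/.test(prefix)) {
    throw new RangeError("prefix must be 1-2 lowercase letters");
  }
  return `${prefix}_${randomSuffix(length, randomIndex)}`;
}

// Usage: a 4-character workspace ID and a 32-character event ID
console.log(newModernID("w", 4));
console.log(newModernID("e", 32));
```

Dropping characters shrinks the alphabet (57 symbols here instead of 62), so the same suffix length gives a slightly smaller ID space.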

Pros

  • Beautiful URLs out of the box (e.g. https://app.com/m_CWZpkWfq)
  • ID sizes are chosen for each table and hence can be kept as short as possible (human friendly)
  • Looking at an ID tells you what you are looking at and where to find it (useful for apps and humans alike).
  • Web app routing paths no longer need extra path segments (e.g. https://app.com/workspace/w_tuy5 -> https://app.com/w_tuy5 )
  • Web apps can support mobile-style routing by pushing another ID onto the url while maintaining the stack (e.g. https://app.com/w_tuy5 -> https://app.com/w_tuy5/m_rDyI0yXjt)
  • Growing IDs over time gives you a bigger set of possible IDs while keeping them short: all possible values of 4-, 5-, and 6-character suffixes remain valid (e.g. t_abcd, t_abcde, t_abcdef).
  • Can still guarantee uniqueness with 32-character suffixes (fall back to UUID generators for the suffix internally).
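
To make the routing points above concrete, here is a hedged sketch of how a router might turn a stacked URL path into typed entries using only the prefixes. The `PREFIX_TO_TYPE` map and the `resolveStack` name are hypothetical; a real app would derive the map from its own table configuration.

```typescript
// Hypothetical prefix -> entity mapping, for illustration only.
const PREFIX_TO_TYPE = new Map<string, string>([
  ["w", "workspace"],
  ["m", "message"],
]);

const MODERN_ID_PATTERN = /^[a-z]{1,2}_[a-zA-Z0-9]{4,32}$/;

type StackEntry = { type: string; id: string };

// Resolves a path like "/w_tuy5/m_rDyI0yXjt" into an ordered stack of
// entities. Returns null if any segment is malformed or has an unknown prefix.
function resolveStack(pathname: string): StackEntry[] | null {
  const stack: StackEntry[] = [];
  for (const segment of pathname.split("/")) {
    if (segment === "") continue; // skip leading, trailing, and doubled slashes
    if (!MODERN_ID_PATTERN.test(segment)) return null;

    const prefix = segment.slice(0, segment.indexOf("_"));
    const type = PREFIX_TO_TYPE.get(prefix);
    if (type === undefined) return null;

    stack.push({ type, id: segment });
  }
  return stack;
}

// Usage: a workspace with a message pushed on top of it
console.log(resolveStack("/w_tuy5/m_rDyI0yXjt"));
```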

Proposed TypeScript Utility

// A single Modern ID definition
type Mapping<T> = {
  type: T;
  prefix: string;
  size?: number; // suffix length; defaults to 8, a good middle ground
};

// A Modern ID is a plain string in the {prefix}_{suffix} format
type ID = string;

// Type-safe configuration of your tables -> prefixes with optional suffix sizes
// (signature only; `declare` marks that the implementation lives elsewhere)
export declare function configure<T extends string>(
  mappings: Mapping<T>[]
): {
  newID: (type: T) => ID; // generates a new ID for the given entity type
  isID: (id: ID) => boolean; // checks if the given string is a valid ModernID format
  toType: (id: ID) => T; // extracts the entity type from the ID (useful in app logic)
  toPrefix: (type: T) => string; // returns the prefix for the entity type
};

// Example usage
type AppEntityType = "workspace" | "message" | "event";

// Type-safe validation that all entities are mapped uniquely
const { newID, isID, toType, toPrefix } = configure<AppEntityType>([
  { prefix: "w", type: "workspace", size: 4 }, // low volume
  { prefix: "m", type: "message", size: 8 }, // mid volume
  { prefix: "e", type: "event", size: 32 }, // high volume, must always be unique
]);

const workspaceID = newID("workspace"); // w_tuy5
const messageID = newID("message"); // m_EtjrjVz6
const eventID = newID("event"); // e_QQBjNml7tuR7U8vaJTucC6LkPTsg8bzx
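
The utility above is only a signature, so here is one possible body for `configure`, offered as a sketch rather than the intended implementation. The assumptions: suffixes are drawn from the full [a-zA-Z0-9] alphabet using `Math.random` (not secure, swap in a proper random source for sensitive tables), duplicate prefixes or types are rejected at runtime, and `toType` throws on IDs whose prefix is not mapped.

```typescript
type Mapping<T extends string> = { type: T; prefix: string; size?: number };

// Assumed defaults for this sketch
const ALPHABET = "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789";
const DEFAULT_SIZE = 8;
const ID_PATTERN = /^[a-z]{1,2}_[a-zA-Z0-9]{4,32}$/;

function configure<T extends string>(mappings: Mapping<T>[]) {
  const byType = new Map<T, { prefix: string; size: number }>();
  const byPrefix = new Map<string, T>();

  for (const { type, prefix, size = DEFAULT_SIZE } of mappings) {
    if (!/^[a-z]{1,2}$/.test(prefix)) throw new Error(`invalid prefix: ${prefix}`);
    if (!Number.isInteger(size) || size < 4 || size > 32) throw new Error(`invalid size: ${size}`);
    if (byType.has(type) || byPrefix.has(prefix)) {
      throw new Error(`duplicate mapping: ${type} / ${prefix}`);
    }
    byType.set(type, { prefix, size });
    byPrefix.set(prefix, type);
  }

  const lookup = (type: T) => {
    const entry = byType.get(type);
    if (entry === undefined) throw new Error(`unmapped type: ${type}`);
    return entry;
  };

  // Format check only; does not verify that the prefix is mapped.
  const isID = (id: string): boolean => ID_PATTERN.test(id);

  const toType = (id: string): T => {
    const type = isID(id) ? byPrefix.get(id.slice(0, id.indexOf("_"))) : undefined;
    if (type === undefined) throw new Error(`not a known Modern ID: ${id}`);
    return type;
  };

  const toPrefix = (type: T): string => lookup(type).prefix;

  const newID = (type: T): string => {
    const { prefix, size } = lookup(type);
    let suffix = "";
    for (let i = 0; i < size; i++) {
      suffix += ALPHABET.charAt(Math.floor(Math.random() * ALPHABET.length));
    }
    return `${prefix}_${suffix}`;
  };

  return { newID, isID, toType, toPrefix };
}

// Usage
const ids = configure<"workspace" | "message">([
  { prefix: "w", type: "workspace", size: 4 },
  { prefix: "m", type: "message" }, // falls back to the default size of 8
]);
console.log(ids.newID("workspace"), ids.toType("m_EtjrjVz6"));
```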

Proposed Postgres Function

Could optionally add this into the database layer for easy ID generation. (not required)

CREATE FUNCTION modern_id(p_prefix TEXT, p_length INT)
RETURNS TEXT AS $$
...
$$ LANGUAGE plpgsql;

-- Example usage
CREATE TABLE workspace ( id TEXT PRIMARY KEY DEFAULT modern_id('w', 4), name TEXT);
CREATE TABLE messages ( id TEXT PRIMARY KEY DEFAULT modern_id('m', 16), name TEXT);
CREATE TABLE events ( id TEXT PRIMARY KEY DEFAULT modern_id('e', 32), name TEXT);

Handling Conflicts

To choose the correct suffix size you can use the great collision calculator by alex7kom: https://alex7kom.github.io/nano-nanoid-cc

Using the above, most of your tables can get away with 8 digit suffixes.
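
If you want a quick estimate without the calculator, the standard birthday approximation is easy to compute yourself. The sketch below assumes uniformly random suffixes over a 62-character alphabet; the function name `collisionProbability` is made up here, and the linked calculator may use a different formula.

```typescript
// Birthday approximation: probability that at least two of `idCount`
// uniformly random IDs collide, given alphabetSize^suffixLength possible IDs.
//   p ≈ 1 - exp(-n(n-1) / (2N))
function collisionProbability(
  idCount: number,
  suffixLength: number,
  alphabetSize: number = 62
): number {
  const space = Math.pow(alphabetSize, suffixLength);
  const exponent = (idCount * (idCount - 1)) / (2 * space);
  // -expm1(-x) equals 1 - exp(-x) but stays accurate for very small x.
  return -Math.expm1(-exponent);
}

// Usage: a 4-character suffix with 1,000 rows is already around 3%,
// while an 8-character suffix with 1,000,000 rows is roughly 0.2%.
console.log(collisionProbability(1_000, 4));
console.log(collisionProbability(1_000_000, 8));
```

These are probabilities of any collision ever happening across the whole table, which is why short suffixes only make sense alongside a retry strategy.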

The dream case would be starting them all at 4 digits and then automating the bumping of suffix sizes.

There are multiple ways to handle the eventual conflicts for low-volume tables, depending on how short you want to keep your IDs.

  1. Database layer: use an UPSERT-style insert that does not overwrite (e.g. INSERT ... ON CONFLICT DO NOTHING in Postgres) for all creates, and retry with a new ID on conflict.
  2. Server layer: catch ID conflicts and retry up to 3 times with a new ID before failing to the client.
  3. Client layer: catch ID conflicts and retry up to 3 times with a new ID before failing.

Regardless, this is a solved problem and not that hard.
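
For the server or client layer, the retry logic could look something like the sketch below. All names (`createWithRetry`, `generateID`, `insert`, `isConflict`) are hypothetical. How you detect a conflict depends on your database driver, so it is passed in as a predicate; with Postgres, for instance, a unique violation is reported with SQLSTATE 23505.

```typescript
// Tries to insert a record under a freshly generated ID. On an ID conflict it
// generates a new ID and retries, up to `maxRetries` times, then rethrows.
// Errors that are not ID conflicts are rethrown immediately.
async function createWithRetry<R>(
  generateID: () => string,
  insert: (id: string) => Promise<R>,
  isConflict: (error: unknown) => boolean,
  maxRetries: number = 3
): Promise<R> {
  for (let retry = 0; ; retry++) {
    const id = generateID();
    try {
      return await insert(id);
    } catch (error) {
      if (!isConflict(error) || retry >= maxRetries) throw error;
      // Otherwise loop around and try again with a new ID.
    }
  }
}

// Usage with an in-memory stand-in for a table with a unique ID column
const taken = new Set<string>(["w_aaaa"]);
const candidates = ["w_aaaa", "w_bbbb"];
let next = 0;

createWithRetry(
  () => candidates[next++ % candidates.length],
  async (id) => {
    if (taken.has(id)) throw new Error("conflict");
    taken.add(id);
    return id;
  },
  (error) => error instanceof Error && error.message === "conflict"
).then((id) => console.log("created", id));
```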

Closing Comments

Was debating calling them HumanIDs but that already seems to be a thing. Thoughts?

If this picks up steam, I will publish TypeScript packages for this.

Would love to hear your thoughts and feedback on this spec.


If interested, follow my journey as I build something big: https://sebastiankade.substack.com/

@szalapski

szalapski commented Aug 22, 2024

Why not omit the underscore?

Also, maybe an alternative version/setting that uses only [0-9], or only [0-9a-z], or only [a-z], in the suffix?

@shadowcat-mst

I like the idea but I tend to believe that you often want to still use the UUID type under the hood for pg (which has efficient storage for it, therefore way better indexing and joining behaviour).

Not saying that should be compulsory but I think having pg's UUID type supported would make this more compelling.

The prefix thing is absolutely fantastic at the boundaries but machine repr and human repr don't have to match (duh).

@sebastiankade
Author

@shadowcat-mst my understanding with the PG performance gains is mostly a) size since it stores uuid as 16-byte datums b) indexing is improved, mainly because of size and how it's stored.

That being said, in a modern application it's rarely the "lookup by ID" case that is costly, generally it's the filtered queries etc. Same with joins, not the costly part.

Once I let go of the size fear, my postgres tables have never looked better :)
