Ask HN: We just had an actual UUID v4 collision... Hackernews Viewer

Ask HN: We just had an actual UUID v4 collision...

371 points by mittermayr 8 May 2026 | 293 comments

Comments

This is surprisingly common.

The security of UUIDv4 is based on the assumption of a high-quality entropy source. This assumption is invalidated by hardware defects, normal software bugs, and developers not understanding what "high-quality entropy" actually means and that it is required for UUIDv4 to work as advertised.

It is relatively expensive to detect when an entropy source is broken, so almost no one ever does. They find out when a collision happens, like you just did.

UUIDv4 is explicitly forbidden for a lot of high-assurance and high-reliability software systems for this reason.

throwaway_19sz 8 May 2026

Funny story no one will believe, but it’s true. A good friend of mine joined a startup as CTO 10 years ago, high growth phase, maybe 200 devs… In his first week he discovered the company had a microservice for generating new UUIDs. One endpoint with its own dedicated team of 3 engineers …including a database guy (the plot thickens). Other teams were instructed to call this service every time they needed a new ‘safe’ UUID. My pal asked wtf. It turned out this service had its own DB to store every previously issued UUID. Requests were handled as follows: it would generate a UUID, then ‘validate’ it by checking its own database to ensure the newly generated UUID didn’t match any previously generated UUIDs, then insert it, then return it to the client. Peace of mind I guess. The team had its own kanban board and sprints.

CodesInChaos 18 hours ago

This is usually caused by an insufficently seeded PRNG.

Are you generating the UUID in the backend, or the frontend? Frontend is fundamentally unreliable for many reasons, including deliberate collisions. So if that case you'll need to handle collisions somehow. Though you can still engineer around common sources of collisions, the specifics depend on the environment.

On the other hand making a backend reliable is feasible. What kind of environment is your code running in? Historically VMs sometimes suffered from this problem, though this should be solved nowadays. Heavily sandboxed processes might still run into this, if the RNG library uses an unsafe fallback. Forking processes or VMs can cause state duplication and thus collisions.

_kst_ 17 hours ago

This reminds me of a passage from the book "Pro Git".

<https://git-scm.com/book/en/v2>

"Here’s an example to give you an idea of what it would take to get a SHA-1 collision. If all 6.5 billion humans on Earth were programming, and every second, each one was producing code that was the equivalent of the entire Linux kernel history (6.5 million Git objects) and pushing it into one enormous Git repository, it would take roughly 2 years until that repository contained enough objects to have a 50% probability of a single SHA-1 object collision. Thus, an organic SHA-1 collision is less likely than every member of your programming team being attacked and killed by wolves in unrelated incidents on the same night."

Deliberate collisions are addressed in the following paragraph.

SHA-1 hashes are not random, so the issue of poor pseudo-random number generation doesn't apply as it does to uuidv4. And SHA-1 hashes are 160 bits, vs. 128 for uuidv4.

But I love the idea of unrelated wolf attacks.

e12e 19 hours ago

Some discussion here:

https://github.com/uuidjs/uuid/issues/546

Eg:

> FWIW, I just tested crypto.getRandomValues() behavior on googlebot and it is also deterministic(!)

adyavanapalli 8 May 2026

What you're talking about is so extremely rare that it's much more likely that the entire Earth is destroyed by an asteroid right this inst...

juancn 23 hours ago

Something off on how the RNG is initialized? Lack of entropy?

If the rng is not customized it will use:

    const rnds8 = new Uint8Array(16);
    export default function rng() {
        return crypto.getRandomValues(rnds8);
    }

getRandomValues doesn't specify a minimum amount of entropy.

Geee 8 May 2026

According to the many-worlds interpretation of quantum mechanics, there's bound to be one branch of universe where every UUID is the same. Can you imagine what those guys are thinking?

athrowaway3z 1 hour ago

The rule of thumb is simple:

Consider if your ID can contain a timestamp besides a random value. The answer is usually yes. UUIDv7 is fine.

If you've spend the time to really work through the whole problem and have written down a proof how that leads to unacceptable info leak: Congratulations your system is complex and slow enough that you might as well take a strong cryptographic hash or UUIDv5 if you're lazy.

mittermayr 8 May 2026

I fully agree. It makes no sense. Yet...

The only guesses I'm having is that we originally generated UUIDv4s on a user's phone before sending it to the database, and the UUID generated this morning that collided was created on an Ubuntu server.

I don't fully know how UUIDv4s are generated and what (if anything) about the machine it's being generated on is part of the algorithm, but that's really the only change I can think of, that it used to generated on-device by users, and for many months now, has moved to being generated on server.

dweez 20 hours ago

Good moment to revisit this fun article: https://jasonfantl.com/posts/Universal-Unique-IDs/

If the entire universe were turned into a giant computer and did nothing but generate uuids until its heat death, how many bits would you need for the ID space?

beejiu 19 hours ago

Are your UUIDs generated client side or server side? If it's client side, it could be due to a crawling bot. Googlebot for example executes Javascript using deterministic "randomness".

evnix 2 hours ago

All the probability mathematics aside, the real world we live in is probably a lot less random even with the best hardware random number generators.

I've moved on to something like TSID(where security isn't a factor) or uuidv7 to make sure this never really occurs in practice rather than over engineering the code with retries.

merlindru 22 hours ago

Gotta be a seeding issue. If it's not, and you can prove it, you're about to be a little famous probably :P

leni536 8 May 2026

It's not happening by chance, there is a bug somewhere.

From what I skimmed the package should just call to the js runtime's crypto.randomUUID(). I think it should always be properly seeded.

I think it is extremely unlikely that the runtime has a bug here, but who knows? What js runtime do you use?

jbverschoor 20 hours ago

Most plausible cause: uuid package depends on some random number generator package, which has recently been compromised in order to make “random” numbers predictable. As a result, many crypto (ssl + currency) projects are compromised due to a supplychain attack.

tumdum_ 8 May 2026

Poorly seeded prng.

serf 8 May 2026

1 in 4.72 × 10²⁸

1 in 47.3 octillion.

i'd be suspecting a race condition or some other naive mistake, otherwise id be stocking up on lottery tickets.

(lol at the other user posting at the same time about the lottery ticket.. great minds and all that.)

pif 4 hours ago

All the comments I've been able to read are missing the elephant in the room: no high-quality entropy source can turn a "should" into a "must".

If you want something that is difficult to guess, ask the cryptography guys. But if you need something that is -_guaranteed_ unique, you must build it yourself.

jordiburgos 8 May 2026

Please, do not use b6133fd6-70fe-4fe3-bed6-8ca8fc9386cd, I checked my database and I was using it already.

smokel 17 hours ago

Multiple times have I blamed compilers, cosmic rays, quantum effects, or at the very least an obscure kernel bug, before realizing that I was the source of a bug.

A collision at 15,000 records is so unlikely that I would first suspect something else. Duplicate processing, replayed requests, reused objects, misleading logs, or another code path reusing the identifier.

Could you share a bit more of the surrounding code so we can check?

xyzzy123 8 hours ago

I had dup uuids causing soak test failures in a Linux based distributed system. After long investigation it turned out there was a kernel bug (race condition) that meant two processes on MP system reading from /dev/random at the same could (very rarely, like 1 in a million) get the same bytes when reading the device.

I'd look at rng initialisation first.

sedatk 14 hours ago

> Duplicate UUIDs (Googlebot)

> This module may generate duplicate UUIDs when run in clients with deterministic random number generators, such as Googlebot crawlers. This can cause problems for apps that expect client-generated UUIDs to always be unique. Developers should be prepared for this and have a strategy for dealing with possible collisions, such as:

> - Check for duplicate UUIDs, fail gracefully

> - Disable write operations for Googlebot clients

https://github.com/uuidjs/uuid/commit/91805f665c38b691ac2cbd...

latentframe 6 hours ago

One of the most dangerous words in engineering is “statisticaly impossible” At enough scale edge cases stop to be theoretical and start become production events.

0xfffafaCrash 8 hours ago

Is the uuid generated in the frontend or backend? If frontend, I’d wager the likeliest explanation is that the client code or request was messed with to inject a previously known uuid rather than an entropy issue.

baq 20 hours ago

the vm you're running on virtualized all the entropy away.

8organicbits 16 hours ago

I wrote about real world collisions, including that particular library last year (https://alexsci.com/blog/uuid-oops/).

There are a bunch of constraints that must be strictly held for UUIDs to be collision resistant, I'd guess there is a problem with your random number generator.

nu11ptr 19 hours ago

Ultimately it comes down to your entropy source. I always generate and insert in a loop for this reason, if there is a collision, I therefore handle that gracefully.

sbuttgereit 20 hours ago

> I thought this is technically impossible

No, very technically possible... though, with good randomness, very, very unlikely.

But nothing technically prevents a UUIDv4 from generating a duplicate value.

glaslong 8 May 2026

Buy some lava lamps

radial_symmetry 9 hours ago

Glad to be reading the comments here because I also had this happen to me once and thought I must have been going insane.

mdavid626 19 hours ago

Or there is some other explanation, eg. somebody messed with the request manually, or with the db.

beardyw 8 May 2026

Just a stupid question, but why not append the date, even in seconds as hex. It's just a few bytes and would guarantee that everything OK now will be OK in the future?

sudb 20 hours ago

This is first time I have experienced some vindication that choosing CUID2[1] for one of my projects was actually a good idea.

1. https://github.com/paralleldrive/cuid2

NKosmatos 8 May 2026

> I thought this is technically impossible

Actually it's not impossible, but very very improbable.

P.S. You should play a lottery/powerball ticket

P.P.S. Whenever I use the word improbable, the https://hitchhikers.fandom.com/wiki/Infinite_Improbability_D... comes in mind

rglover 18 hours ago

A check inside the generator function is the best way I've found to avoid this. Wrap uuid or whatever random generator with a check against an ID cache. If it already exists, just run the generator recursively.

zie 15 hours ago

You forgot to use https://www.random.org/ as your source of randomness :)

coldtea 16 hours ago

Were the chances than an npm package is crap factored in?

wg0 8 May 2026

Would the UUID v7 be more collision proof? Hard to say because it takes time into account but then the number of entropy bits are reduced hence the UUID generated exactly at the same time have more chance of a collusion because number of entropy bits are a much smaller space hence could result in collusions more easily.

Thoughts?

nozzlegear 19 hours ago

> I thought this is technically impossible, and it will never happen,

In an eternal universe, even the most unlikely of events will happen an infinite number of times.

sqquima 19 hours ago

Meta, but if I had a question like this, I'd likely have asked on Twitter or Reddit first. I'll keep in mind using HN as an alternative Q&A site.

danfritz 19 hours ago

Always let your db generate uuids. On postgres this is easy since v18 it supports uuid v7!

There is no need to set uuids through javascript or node imo

not_math 8 May 2026

Reminds me of some code I saw running in production. Every time we added a new entry, we were pulling all the UUIDs from this table, generating a new UUID, and checking for collisions up to 10 times.

shortercode 19 hours ago

Fun thing about random is that these things happen. UUIDv7 is less prone to this as it includes both a time component and random. I’ve been using ULID in a few project which has similar attributes to uuidv7 but more space efficient.

BugsJustFindMe 14 hours ago

This is like one of the hardest things for people to understand. Even the best randomness guarantees fuck all. Entropy-based IDs are collision-resistant not collision-proof.

lyfeninja 8 May 2026

Although incredibly rare, it's not impossible so probably best to just plan for collisions. A simply retry should suffice. But I agree I feel like something is going on somewhere else ...

dist-epoch 18 hours ago

It's much more likely that you hit an "impossible bug" due to a bit flip somewhere.

Imagine the database having the old UUID in a memory buffer due to a recent index scan, and a bit flip happened somewhere in the logic which basically copied the old UUID into the memory location of the new UUID, or some buffer addresses got swapped, or the operation which allocated the new UUID received a memory buffer containing the old one, and due to a bit flip the memcpy operation was skipped, or something along that line.

Facebook wrote extensively about this, stuff like "if (false) {do_x(); )" and do_x being called. For example their critical RocksDB kv store has extensive redundant protections to defend against such "impossible bugs".

AndreyK1984 8 May 2026

Why not to have timestamp-uuid instead ?

nhumrich 17 hours ago

> technically impossible

Not at all! Just very unlikely. It's about odds and statistics. Not physics.

OutOfHere 8 May 2026

This is why I prefer to use a random base32 string over UUID. At least you get a proper 128 bit entropy instead of just a 122 bit entropy as with UUIDv4. That's a 64x difference in collision probability. I always thought UUIDs were a toy, not for serious use. If you control the strings, you can even make a longer ID.

Also, numerous applications that use a unique ID per record frequently need to check for ID collisions. I know I do for a short URL generator.

naikrovek 8 May 2026

The chance of a UUIDv4 collision is very low, but it is never zero.

If everything is done properly, then this is very likely the one and only time anyone involved in the telling or reading of this account will ever experience this.

zuzululu 17 hours ago

just uuidv5

QuercusMax 14 hours ago

I lost all confidence in the infallability of software RNG when I was working on an assignment for Data Structures a million years ago (2000?). The assignment was simple: simulate a 2D random walk where you randomly go NSEW, and run 100 cases, collecting stats as to how long it takes to return to the origin.

Super easy assignment, wrote it up probably in C++ (maybe just C?), and ran it on my linux box (probably Debian potato). It finished super quick and gave me an average of like 5.6 steps to return to the origin or something. Cool!

I copied it over to my account on the department's HP-UX machines where I was supposed to run and submit it to my instructor. Compiled fine. And then it... just ran forever. I was doing rand() % 4 or something, and the HP-SUX RNG had crazy bias in its last 2 bits, and it just walked away forever, never returning to the origin. Well crap!

Got an A for my writeup, though!

ares623 8 May 2026

Buy a lottery ticket

kittikitti 17 hours ago

Almost all pseudo-random number generators are absolute garbage. They need you believe they work because the NSA needs backdoors and to foolproof ransomware attacks. This isn't surprising at all to me.

sublinear 8 May 2026

> We're using this: https://www.npmjs.com/package/uuid

Why? There's a built-in for this.

https://nodejs.org/api/crypto.html#cryptorandomuuidoptions

MagicMoonlight 19 hours ago

This is why it’s stupid to assume a randomly generated ID is unique just because it is random.

Lammy 16 hours ago

> I thought this is technically impossible, and it will never happen

I always hated this meme/mindset, because if you dig in to the history of them you'll see that their original purpose was to collide. They were labels to identify messages in Apollo's distributed computing architecture. UID and later UUIDs were a reversible way to mark an intersection point between two dimensions.

Any two nodes in a distributed system would generate the same UID/UUID for the same two inputs, and a recipient of an identified message could reverse the identifier back into the original components. They were designed as labels for ephemeral messages so the two dimensions were time and hardware ID (originally Apollo serial number, later 802.3 hwaddress etc).

I think a lot of the confusion can be traced to the very earliest AEGIS implementation where the Apollo engineers started using “canned” (their term, i.e. static or well-known) UIDs to identify filesystems. Over time the popular usage of UUID fully shifted from ephemeral identifiers where duplicates were intentional toward canned identifiers where duplicates were unwanted and the two dimensions were random-and-also-random.