
I left a more detailed comment on the parent, but it's definitely not impossible!

The scenario in this post is that the first uuid was created one year before the duplicate uuid. That isn’t possible with v7

You're leaning heavily on "collision like this" referring to the exact timestamps for your statement to be true.

It's equally possible to interpret the "like this" as referring to the collision itself, without focusing on the one-year gap between the creation dates.

So I guess both views are valid.


The inclusion of a timestamp in v7 makes collisions impossible unless the generating systems think that the time is the same down to the millisecond, which makes the temporal distance quite relevant.

Plenty of systems end up generating multiple UUIDs in a single millisecond.

The issue with UUIDv7 is that you also have significantly less entropy, since you only have 62 bits (sometimes fewer, depending on implementation) of "random" data. So while the time component of the format lowers the overall chance of collisions, two UUIDv7s generated in the same millisecond (depending on implementation) have a significantly higher chance of colliding than two UUIDv4s.

It's still incredibly unlikely, but then it's also incredibly unlikely that you'd generate two matching UUIDv4s, and yet it does happen.

TLDR: It's possible to generate matching UUIDv7s; don't assume otherwise.
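To put rough numbers on the entropy difference (a back-of-the-envelope sketch using the standard birthday approximation, not any particular implementation; the one-million-IDs-per-millisecond burst is a hypothetical):

```python
import math

def collision_probability(n_ids: int, random_bits: int) -> float:
    """Birthday approximation: P(collision) ~= 1 - exp(-n^2 / 2^(bits+1))."""
    # -expm1(-x) == 1 - exp(-x), but stays accurate for very small x.
    return -math.expm1(-(n_ids ** 2) / (2 ** (random_bits + 1)))

# UUIDv4: 122 random bits, shared across every ID ever generated.
# UUIDv7: only the 74 non-timestamp bits (rand_a + rand_b) matter,
#         and only among IDs minted in the *same* millisecond.
ids_per_ms = 1_000_000  # hypothetical burst of a million IDs in one ms

p_v4 = collision_probability(ids_per_ms, 122)
p_v7 = collision_probability(ids_per_ms, 74)

print(f"v4: {p_v4:.3e}  v7 (same ms): {p_v7:.3e}")
```

Both probabilities are tiny, but the v7 same-millisecond figure is many orders of magnitude larger than the v4 one, which is the point being made above.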


Surely the scenario where he generates the same number of items as he did between 2025 and now, but within a single tick of the v7 timestamp, also runs into it?

The scenario being the collision itself, the time period isn’t particularly relevant aside from it occurring much quicker than expected.

It's still possible in most implementations of UUIDv7.

UUIDv7 assigns the first 48 bits to the timestamp in milliseconds. You can generate a lot of UUIDs in a millisecond, though!

Then you have another 12 bits that you can use as you wish: "rand_a". The spec suggests a few methods for using these bits, including 12 bits of random data, a sub-millisecond timestamp, or a monotonic counter, but each has its downsides:

- Purely random data means you can still run into collisions, and anything within the same millisecond is unordered

- With sub-millisecond timestamps you can still run into collisions; nothing stops you from generating two UUIDs with the same 62 bits of rand_b data within the same sub-millisecond tick.

- Monotonic counters can overflow before the next tick; then what? Roll over? Once you roll over it's no longer monotonic, and you can generate the same random data within the same monotonic cycle. Also, the counter is only monotonic on the system generating the UUID. In a distributed system where each node keeps its own counter, you'll generate UUIDs with the same timestamp + counter value and are again relying on not generating the same random data.

You can also steal some of the 62 bits in rand_b if you want: use rand_a for sub-millisecond accuracy, then use a few bits of rand_b for a monotonic counter. There's still a chance of collision, but it's exceedingly low, at the expense of less truly random data at the end.

If you want it truly collision free, you'd also need to assign a couple of bits to identify the subsystem generating the UUID so that the monotonic counter is unique to that subsystem. You lose the ordering property of the monotonic counter across subsystems that way, though I guess you could argue that in nearly 100% of cases sub-millisecond ordering in a distributed system is a lie anyway.
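A minimal sketch of one of the layouts described above: rand_a used as a 12-bit per-process monotonic counter, blocking until the next millisecond on overflow rather than rolling over. The field positions follow RFC 9562, but the counter and overflow policy here are just one illustrative choice, and this is per-process only (it does nothing about the distributed-system problem above):

```python
import os
import threading
import time

_lock = threading.Lock()
_last_ms = 0
_counter = 0  # 12-bit monotonic counter carried in rand_a

def uuid7_bytes() -> bytes:
    """UUIDv7: 48-bit ms timestamp | ver | 12-bit counter | var | 62 random bits."""
    global _last_ms, _counter
    with _lock:
        ms = time.time_ns() // 1_000_000
        if ms == _last_ms:
            _counter += 1
            if _counter >= 4096:                      # counter exhausted this ms:
                while ms == _last_ms:                 # block until the next tick
                    ms = time.time_ns() // 1_000_000  # instead of rolling over
                _counter = 0
        else:
            _counter = 0
        _last_ms = ms
        counter = _counter

    rand_b = int.from_bytes(os.urandom(8), "big") & ((1 << 62) - 1)
    value = (ms & ((1 << 48) - 1)) << 80   # bits 127..80: timestamp
    value |= 0x7 << 76                     # bits  79..76: version 7
    value |= counter << 64                 # bits  75..64: rand_a as counter
    value |= 0b10 << 62                    # bits  63..62: variant
    value |= rand_b                        # bits  61..0:  random
    return value.to_bytes(16, "big")
```

Because the counter sits above rand_b, IDs from one process compare strictly increasing as raw bytes, even within the same millisecond.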


I think by the time you're building a system that needs to generate (and persist!) billions of identifiers per millisecond, you're solidly past the point where all your design decisions need to be vetted for whether they make sense on your extremely exotic setup.

But 12 bits is not "billions of identifiers" -- it's 4096. Once you exhaust that counter in the same millisecond, you are still gambling that your random source will not generate the exact same bit sequence for the same counter value. And this thread started out with the OP explaining that random collisions are much more common than we'd like them to be, for various reasons.

We have a dedicated snowflake id generator service that returns batch ids. It's also distributed, each service adds its own instance number to the id. When it overflows it just blocks for the next ms. For our traffic, it's never a bottleneck.

Something I use on my own distributed system (where I wanted 64-bit IDs) is to use 32 bits for the time in seconds (with an epoch of 2020, so good until 2088), 8 bits for the device ID, and 24 bits for a serial number (reset to 0 every time the seconds field increments).

That's generally enough IDs per second for most of my edge nodes, but the central worker nodes need more, so I give them a different split and use 4 bits for the device ID and 28 bits for serial number instead.

If a node overflows its serial number that second, I kind of cheat and increment the seconds field early. Every time this happens, I persist the seconds field to the database, and when the app restarts, it starts its seconds count at the last persisted seconds plus one. If the current time in seconds is greater than the last used seconds, I also update it and reset the serial number. Works remarkably well for smoothing out very occasional spikes in ID generation while still approximately remaining globally sortable.

I also "waste" a bit of the 32-bit time field by treating it as signed, even though it isn't really, because I don't expect this system to last long enough to reach times where the MSB gets set. But if I ever change my system, I'll set that bit and everything will stay ordered. I'll probably reset the epoch at that point too.
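Packed out, the edge-node split described above looks something like this (a sketch: the 2020 epoch and field widths are the ones given, but the helper names are mine, and the overflow/persistence logic is omitted):

```python
import time

EPOCH_2020 = 1_577_836_800  # 2020-01-01T00:00:00Z, in Unix seconds

def pack_id(seconds_since_epoch: int, device_id: int, serial: int) -> int:
    """64-bit ID: 32-bit seconds | 8-bit device ID | 24-bit serial."""
    assert 0 <= seconds_since_epoch < (1 << 32)
    assert 0 <= device_id < (1 << 8)
    assert 0 <= serial < (1 << 24)
    return (seconds_since_epoch << 32) | (device_id << 24) | serial

def unpack_id(id_: int) -> tuple[int, int, int]:
    """Recover (seconds, device_id, serial) from a packed 64-bit ID."""
    return (id_ >> 32, (id_ >> 24) & 0xFF, id_ & 0xFFFFFF)

now = int(time.time()) - EPOCH_2020
a = pack_id(now, device_id=7, serial=0)
b = pack_id(now, device_id=7, serial=1)
assert a < b and unpack_id(a) == (now, 7, 0)
```

Since the seconds field occupies the top bits, IDs sort chronologically as plain integers, which is what makes them approximately globally sortable across nodes.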


Unfortunately people are inherently lazy. Curious and driven individuals will excel with the availability of LLMs, but the majority will atrophy.



I'm going to give Apple the benefit of the doubt here until proven otherwise. I can't see them releasing something with a terrible user experience as it would cause a lot of reputational harm.


> I can't see them releasing something with a terrible user experience

I see you haven't upgraded to Tahoe yet!


It's cheap for what you get.

If you just need "a small box to make API calls and do minimal local processing" you can also just buy an RPi for a fraction of the price of the GMKtec G10.

All 3 serve a different purpose; just because you can buy a slower machine for less doesn't mean the price:performance of the M1 Mac Mini changes.


> you can also just buy an RPi for a fraction of the price of the GMKtec G10.

Sadly not really. The Pi 5 8GB CanaKit starter kit, which feels like a truer price since it includes a power supply, MicroSD card, and case, is now $210. The Pi 5 8GB by itself is $135.

A 16GB Pi 5 kit, to match just the RAM capacity, to say nothing of the difference in storage {size, speed, quality} and networking, is an eye-watering $300.


>Sadly not really. The Pi 5 8GB CanaKit starter kit, which feels like a truer price since it includes a power supply, MicroSD card, and case, is now $210. The Pi 5 8GB by itself is $135.

At that point, buy a used MacBook Air M1.


>you can also just buy an RPi for a fraction of the price

lol. You need to look at RPi 5 prices again. They are insane.


> Because the taxpayers bail them out. I could define anything as not being risky if I knew taxpayers would bail it out.

I feel like I must be misunderstanding something here because it sounds like you're saying depositing funds in a bank is considered risky behaviour?


It is supposed to be if the amounts are above $250,000. I have no problem with the first $250k being risk free; that is a policy that is well published and that we all "agree" on. Arbitrarily deciding that in some cases depositors should be made whole when risky behavior (such as depositing above the insurance limit) bites them is problematic. Stick to the policy or change the policy; don't make one-off exceptions, because that sets weird expectations.

89% of deposits at SVB were uninsured.


I’m sure you would feel differently if it was your employers money there and they weren’t able to pay your salary.

$250k is not much at all for a business


Businesses can use deposit management services to spread cash among many banks. Bonus points: they're also less affected by the poor business practices of any one bank.

Individuals can do this too with investment brokers or wealth management providers.

Alternatively we could just make FDIC coverage unlimited, but that creates poor risk-taking incentives, which is exactly what refusing to set the expectation of a bailout through one-off exceptions is meant to avoid.


> depositing funds in a bank is considered risky behaviour?

Of course it is; that's why the bank pays you interest on your deposit. They loan out what you deposit at a higher rate and collect the difference as profit. If that loan defaults, your money is gone, because the bank was never able to collect it back. The FDIC was created to insure your deposit up to $250k, so you're protected (up to $250k) in case that happens.


No, the bank pays you interest on your deposit to entice you to deposit money there so they can lend it out. There is literally zero risk involved (other than something on the scale of the collapse of the US government, which no one is really considering here) because of the FDIC, and yet interest rates on FDIC-protected assets are not 0%.


The vast majority of products with paying customers need better availability than "the database went down on Friday and I was AFK until Monday, sorry for the 3-day downtime everyone".


If you're offering a hosted service, I've got bad news for you.

Serverless, managed databases and even multicloud won't save you. You'll still have to be on call.

Don't want to be on call? Design your stuff so it works local first.


Local first stuff can also break, so that's not a foolproof plan.


Did the sisters know where you lived? Curious if the police provided them with an area and the sisters were able to give a proper address?


Why is it so hard to believe that the police can use our devices to backtrack us, as both carriers and police officers have said numerous times?


Occam's razor


says that police can locate phones


To be honest I thought OP said they lived in an apartment; a house is a different story.


Yes, that's how jobs work: you go to work, do your job, try not to get fired, and get paid for the work you do.

Is that supposed to be a bad thing?

