The timestamp in v7 makes collisions impossible unless the generating systems believe the time is the same down to the millisecond, which makes the temporal distance quite relevant.
Plenty of systems end up generating multiple UUIDs in a single millisecond.
The issue with UUIDv7 is that you also have significantly less entropy, since you only have 62 bits (sometimes less, depending on implementation) of "random" data. So while the time component of the format lowers the overall chance of collisions, two UUIDv7s generated in the same millisecond have (depending on implementation) a significantly higher chance of colliding than two UUIDv4s.
It's still incredibly unlikely, but then it's also incredibly unlikely that you'd generate two matching UUIDv4s, and that does happen.
TL;DR: it's possible to generate matching UUIDv7s; don't assume otherwise.
It's still possible in most implementations of UUIDv7.
UUIDv7 assigns the first 48 bits to the timestamp in milliseconds. You can generate a lot of UUIDs in a millisecond though!
Then you have another 12 bits that you can use as you wish: "rand_a". The spec suggests a few ways to use these bits, including 12 bits of random data, a sub-millisecond timestamp fraction, or a monotonic counter, but each has its downsides:
- Purely random data means you can still run into collisions, and anything within the same millisecond is unordered.
- With sub-millisecond timestamps you can still run into collisions; there's nothing stopping you from generating two UUIDs with the same 62 bits of rand_b data within the same sub-millisecond timestamp.
- Monotonic counters can overflow before the next tick, and then what? Roll over? Once you roll over it's no longer monotonic, and you can generate the same random data within the same monotonic cycle. Also, the counter is only monotonic on the system that's generating the UUID. In a distributed system where each node runs its own counter, you'll generate UUIDs with the same timestamp + counter value, and again you're relying on not generating the same random data.
You can also steal some of the 62 bits of rand_b if you want: use rand_a for sub-millisecond accuracy, then use a few bits of rand_b for a monotonic counter. There's still a chance of collision here, but it's exceedingly low, at the expense of less truly random data at the end.
If you want to be truly collision-free, you'd also need to assign a couple of bits to identify the subsystem generating the UUID, so that the monotonic counter is unique to that subsystem. You lose the ordering property of the monotonic counter this way, though I'd argue that in nearly 100% of cases the accuracy of sub-millisecond ordering in a distributed system is a lie anyway.
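To make the counter option concrete, here's a minimal Python sketch of a UUIDv7 generator that puts a 12-bit monotonic counter in rand_a. The field layout follows RFC 9562; the rollover handling (borrowing the next millisecond) is just one of the naive choices discussed above, not something the spec mandates, and this is single-process only.

```python
import os
import time

_last_ms = 0
_counter = 0

def uuid7_counter() -> int:
    """Sketch: UUIDv7 with a 12-bit monotonic counter in rand_a.

    Layout (RFC 9562): 48-bit unix_ts_ms | 4-bit version | 12-bit rand_a
    | 2-bit variant | 62-bit rand_b. Returned as a 128-bit int.
    """
    global _last_ms, _counter
    ms = time.time_ns() // 1_000_000
    if ms == _last_ms:
        _counter = (_counter + 1) & 0xFFF  # 12-bit counter, wraps at 4096
        if _counter == 0:                  # counter exhausted in this ms:
            ms += 1                        # naively borrow the next tick
    else:
        _counter = 0
    _last_ms = ms

    rand_b = int.from_bytes(os.urandom(8), "big") & ((1 << 62) - 1)
    value = (ms & ((1 << 48) - 1)) << 80   # 48-bit millisecond timestamp
    value |= 0x7 << 76                     # version 7
    value |= _counter << 64                # 12-bit counter in rand_a
    value |= 0b10 << 62                    # RFC 4122/9562 variant
    value |= rand_b                        # 62 random bits
    return value
```

Note the distributed-system caveat from above still applies: two nodes each running this code can produce the same timestamp + counter and are back to relying on rand_b not colliding.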
I think by the time you're building a system that needs to generate (and persist!) billions of identifiers per millisecond, you're solidly past the point where all your design decisions need to be vetted for whether they make sense on your extremely exotic setup.
But 12 bits is not "billions of identifiers" -- it's 4096. Once you exhaust that counter within the same millisecond, you're back to gambling that your random source won't produce the exact same bit sequence for a counter value that was already used. And this thread started with the OP explaining that random collisions are much more common than we'd like them to be, for various reasons.
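For scale, the standard birthday-bound approximation puts numbers on this. A quick Python sketch (the 4096 figure and the 62/122 bit counts come from the thread; p ≈ n(n-1)/2^(bits+1) is the usual first-order estimate):

```python
def collision_prob(n: int, bits: int) -> float:
    """Birthday-bound estimate: probability that n values drawn uniformly
    from 2**bits possibilities contain at least one collision."""
    return n * (n - 1) / (2 * 2**bits)

# 4096 IDs (one exhausted 12-bit counter's worth) in a single millisecond,
# each with UUIDv7's 62 random bits:
p_v7 = collision_prob(4096, 62)    # ≈ 1.8e-12 per such millisecond
# The same burst with UUIDv4's 122 random bits:
p_v4 = collision_prob(4096, 122)   # astronomically smaller
```

Per millisecond the v7 number still looks tiny, but it compounds over every busy millisecond, which is the gamble being described.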
We have a dedicated snowflake ID generator service that returns batches of IDs. It's also distributed; each service adds its own instance number to the ID. When the counter overflows, it just blocks until the next ms. For our traffic, it's never a bottleneck.
Something I use in my own distributed system (where I wanted 64-bit IDs) is 32 bits for the time in seconds (with an epoch of 2020, so good until 2088), 8 bits for the device ID, and 24 bits for a serial number (reset to 0 every time the seconds field increments).
That's generally enough IDs per second for most of my edge nodes, but the central worker nodes need more, so I give them a different split: 4 bits for the device ID and 28 bits for the serial number instead.
If a node overflows its serial number within a second, I kind of cheat and increment the seconds field early. Every time this happens, I persist the seconds field to the database, and when the app restarts, it starts its seconds count at the last persisted value plus one. If the current time in seconds is greater than the last used value, I also update it and reset the serial number. This works remarkably well for smoothing out very occasional spikes in ID generation while remaining approximately globally sortable.
I also "waste" a bit of the 32-bit time field by treating it as signed, even though it isn't really, because I don't expect this system to last long enough to reach times where the MSB gets set. But if I ever change the scheme, I'll set that bit and everything will stay ordered. I'll probably reset the epoch at that point too.
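The core of the scheme above can be sketched in a few lines of Python. This is my reconstruction from the description: the epoch constant is the Unix timestamp for 2020-01-01 UTC, and the persistence-on-overflow and restart logic are left out.

```python
import time

EPOCH_2020 = 1_577_836_800  # 2020-01-01T00:00:00Z, the custom epoch

class EdgeIdGen:
    """Sketch of the 64-bit edge-node scheme described above:
    32-bit seconds since 2020 | 8-bit device ID | 24-bit serial.

    Persisting the seconds field on overflow (and restoring it on
    restart) is omitted here.
    """
    def __init__(self, device_id: int):
        assert 0 <= device_id < 256
        self.device_id = device_id
        self.seconds = 0
        self.serial = 0

    def next_id(self) -> int:
        now = int(time.time()) - EPOCH_2020
        if now > self.seconds:              # clock moved on: reset serial
            self.seconds = now
            self.serial = 0
        elif self.serial == (1 << 24) - 1:  # serial exhausted this second:
            self.seconds += 1               # "cheat" into the next second
            self.serial = 0
        else:
            self.serial += 1
        return (self.seconds << 32) | (self.device_id << 24) | self.serial
```

The worker-node variant would just change the split to 4 device bits and 28 serial bits; the borrowing trick works the same way.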
I'm going to give Apple the benefit of the doubt here until proven otherwise. I can't see them releasing something with a terrible user experience as it would cause a lot of reputational harm.
If you just need "a small box to make API calls and do minimal local processing" you can also just buy an RPi for a fraction of the price of the GMKtec G10.
All three serve different purposes; just because you can buy a slower machine for less doesn't mean the price:performance of the M1 Mac Mini changes.
> you can also just buy an RPi for a fraction of the price of the GMKtec G10.
Sadly, not really. The Pi 5 8GB CanaKit starter kit, which feels like a truer price since it includes the power supply, microSD card, and case, is now $210. The Pi 5 8GB by itself is $135.
A 16GB Pi 5 kit, to match just the RAM capacity, to say nothing of the difference in storage (size, speed, quality) and networking, is then an eye-watering $300.
It is supposed to be, if the amounts are above $250,000. I have no problem with the first $250k being risk-free; that is a policy that is well published and that we all "agree" on. Making arbitrary policy decisions that depositors should in some cases be made whole when risky behavior (such as depositing above the insurance limit) bites them is problematic. Stick to the policy or change the policy; don't make one-off exceptions, because that sets weird expectations.
Businesses can use deposit management services to spread cash among many banks. Bonus points: they're also less impacted by the poor business practices of any one bank.
Individuals can do this too with investment brokers or wealth management providers.
Alternatively, we could just make FDIC coverage unlimited, but that creates poor risk-taking incentives, which is the whole point of not setting the expectation of a bailout by making exceptions.
> depositing funds in a bank is considered risky behaviour?
Of course it is; that's why the bank pays you interest on your deposit. They loan out what you deposit at a higher rate and collect the difference as profit. If that loan defaults, your money is gone, because the bank was never able to collect it back. FDIC insurance was invented to cover your deposit up to $250k, so you're protected (up to $250k) in case that happens.
No, the bank pays you interest on your deposit to entice you to deposit money there so they can lend it out. There is literally zero risk involved (other than something on the scale of the collapse of the US government, which no one is really considering here) because of the FDIC, and yet interest rates on FDIC-protected assets are not 0%.
The vast majority of products with paying customers need better availability than “database went down on Friday and I was AFK until Monday, sorry for the 3 day downtime everyone”