I get what you’re saying but this feels like a premature optimization that only becomes necessary at scale.
It reminds me a bit of the microservices trend. People tried to mimic big tech companies but the community slowly realized that it’s not necessary for most companies and adds a lot of complexity.
I’ve worked at a variety of companies from small to medium-large and I can’t remember a single instance where we wished we had used integer IDs. It’s always been the opposite, where we had to work around conflicts and auto-incrementing.
In the same vein, distributed DBs are not required for most companies (from a technical standpoint; data locality for things like GDPR is another story). You can vertically scale _a lot_ before you even get close to the limits of a modern RDBMS. Like hundreds of thousands of QPS.
I've personally run MySQL in RDS on a mid-level instance, nowhere near maxing out RAM or IOPS, and it handled 120K QPS just fine. Notably, this was with a lot of UUIDv4 PKs.
I'd wager with intelligent schema design, good queries, and careful tuning, you could surpass 1 million QPS on a single instance.
Auto-incrementing integers mean you're always dependent on a central server to hand out the next value. UUIDs break that dependency, so you can scale writes out across multiple databases in parallel.
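To make that concrete, here's a minimal sketch in Python using the standard `uuid` module. The `new_row_id` helper and the two "writer" lists are made up for illustration; the only point is that each writer produces IDs locally without asking a database or each other.

```python
import uuid


def new_row_id() -> uuid.UUID:
    """Generate a primary key locally, with no call to any database."""
    # uuid4() is random-based, so independent writers need no coordination.
    return uuid.uuid4()


# Two hypothetical writers that never talk to each other.
writer_a_ids = [new_row_id() for _ in range(1000)]
writer_b_ids = [new_row_id() for _ in range(1000)]
```

Collisions are possible in principle but so improbable with 122 random bits that people generally treat them as a non-issue.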
If you're using MySQL maybe integer ids make sense, because it scales differently than PostgreSQL.
If the DB fails to assign an ID, it's probably broken, so having an external ID won't help you.
If you're referring to avoiding conflicts between distributed nodes, that's a solved problem as well – distribute chunked ranges of size N to each node.
The distributed database needs a coordination system anyway, so it's not an additional point.
> In general you shouldn't need to make a roundtrip to produce an ID.
Did you forget the context over the last week? We're already talking about reserving big chunks to remove the need to make a roundtrip to produce an ID. There would instead be something like one roundtrip per million IDs.
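Roughly what I mean, as a sketch in Python. `BlockCoordinator` and `NodeIdGenerator` are hypothetical names, the coordinator is just an in-memory object standing in for whatever service hands out ranges, and there's no persistence, locking, or failure handling here.

```python
class BlockCoordinator:
    """Hypothetical central service that hands out non-overlapping ID blocks."""

    def __init__(self, block_size: int) -> None:
        self.block_size = block_size
        self.next_start = 1
        self.roundtrips = 0  # counts how often any node had to ask for a block

    def reserve_block(self) -> tuple[int, int]:
        """Return a half-open range [start, end) that no other caller receives."""
        self.roundtrips += 1
        start = self.next_start
        self.next_start += self.block_size
        return start, start + self.block_size


class NodeIdGenerator:
    """Per-node generator that only contacts the coordinator when its block runs out."""

    def __init__(self, coordinator: BlockCoordinator) -> None:
        self.coordinator = coordinator
        self.current = 0
        self.end = 0  # empty range, so the first call reserves a block

    def next_id(self) -> int:
        if self.current >= self.end:
            # The only "roundtrip": once per block_size IDs.
            self.current, self.end = self.coordinator.reserve_block()
        value = self.current
        self.current += 1
        return value


coordinator = BlockCoordinator(block_size=1_000_000)
node = NodeIdGenerator(coordinator)
first_ids = [node.next_id() for _ in range(5)]
```

With a block size of one million, a node goes back to the coordinator once per million IDs, and a node that dies mid-block just leaves a gap in the sequence.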
> The distributed database needs a coordination system anyway, so it's not an additional point.
Nope! Distributed databases do not necessarily need a "coordination system" in this sense. Most wide-scale distributed databases actually cannot rely on this kind of coordination.
> Did you forget the context over the last week? We're already talking about reserving big chunks to remove the need to make a roundtrip to produce an ID. There would instead be something like one roundtrip per million IDs.
OK, it's very clear that you're speaking from a context which is a very narrow subset of distributed systems as a whole. That's fine, just please understand your experience isn't broadly representative.
> Nope! Distributed databases do not necessarily need a "coordination system" in this sense. Most wide-scale distributed databases actually cannot rely on this kind of coordination.
I'm assuming a system that tracks nodes and checks for quorum(s), because if you let isolated servers be authoritative then your data integrity goes to hell. If you have that system, you can use it for low-bandwidth coordinated decisions like reserving blocks of ids.
Am I wrong to think that most distributed databases have systems like that?
> OK, it's very clear that you're speaking from a context which is a very narrow subset of distributed systems as a whole. That's fine, just please understand your experience isn't broadly representative.
Sure, but the first thing you said in this conversation was "Whatever is distributing the chunks is still a point of central coordination," which is equally narrow, so I wasn't expecting you to suddenly broaden the scope when I asked why that mattered.
Though if you're running AP then I sure hope you have a reconciliation system, and a good reconciliation system can handle that kind of ID conflict. (Maybe you still want to avoid it to speed that process up but that really gets into the weeds.)
They most assuredly do scale. [0]
Also, Slack is built on MySQL + Vitess [1], the same system behind PlanetScale, which internally uses integer IDs [2].
[0]: https://www.enterprisedb.com/docs/pgd/latest/sequences/#glob...
[1]: https://slack.engineering/scaling-datastores-at-slack-with-v...
[2]: https://github.com/planetscale/discussion/discussions/366