I use this often, but there is one typical scenario where I break this rule - creation commands. Often someone calling a command that creates an entity needs the ID of whatever was created. Yes, you can use UUIDs and pass this to the creation function (or use more complex methods), but I haven't really come across scenarios where this adds anything meaningful, not to mention that it's not always possible. So I generally break CQS for creation commands, i.e. they change state and return a value.
EDIT: I just noticed the Wikipedia article cites other examples, e.g. the `pop()` method on a Stack.
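For concreteness, a minimal Python sketch of the two creation styles described above (repo and names are made up): the CQS-breaking create that returns the new ID, and the strict-CQS alternative where the caller supplies a UUID up front.

```python
import uuid

class UserRepo:
    def __init__(self):
        self._users: dict[str, str] = {}

    # CQS-breaking create: changes state AND returns a value (the new ID).
    def create_user(self, name: str) -> str:
        user_id = str(uuid.uuid4())
        self._users[user_id] = name
        return user_id

    # Strict-CQS alternative: the caller generates and passes the ID,
    # so the command can return nothing.
    def create_user_with_id(self, user_id: str, name: str) -> None:
        self._users[user_id] = name
```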
CQS is one of those things that makes sense at first glance, but the more you look at it, the less sense it makes. I mean, if you send a (synchronous) command, shouldn't you at least get a success/error indication as a return value? I've met someone who argued against even that, insisting that all commands must be void and nothrow, and that I should use separate queries to figure out the results. And yes, I should generate UUIDs at every interaction and hope that collisions don't arise; and if somebody tries a replay attack, well, that's such an unlikely scenario that it can safely be disregarded.
There is a reason the procedure/function split was largely abandoned in the '80s; but at least a procedure could have var/out parameters or take pointers as arguments.
So a command could either return all different kinds of errors, even with informative payloads, or a generic 'ok', but not the arguably more reasonable design (since we already return variants with payloads) of an 'ok' with some useful payload? Or will it literally be a Boolean "success/failure" value?
Admittedly, both are possible designs, but I just... don't agree with either of them. Allowing error states to be returned directly by the command while requiring success states to be queried separately already breaks CQS; and returning a bool is no more useful than returning void: you'll have to do the same follow-up query in both cases.
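To make the designs under debate concrete, a hypothetical Python sketch of the variant-returning style (all names invented for illustration):

```python
from dataclasses import dataclass
from typing import Union

@dataclass
class Created:
    entity_id: str       # the useful payload; strict CQS would leave this empty

@dataclass
class Duplicate:
    existing_id: str     # the error variants already carry payloads...

@dataclass
class Invalid:
    reason: str

# ...so an 'ok' variant with a payload is no stranger than the errors.
CreateResult = Union[Created, Duplicate, Invalid]
```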
At my last job, I occasionally ran into people worrying about SHA-256 collisions for non-maliciously created build artifacts, so I wrote an internal wiki entry with the back-of-the-envelope estimates.
If you're really paranoid, use getentropy() to seed AES in counter mode, and generate 256-bit cryptographically pseudorandom IDs. Assume your system consumes 1 trillion (2^40) IDs per second for 1 trillion seconds (34,000+ years), i.e. 2^80 IDs in total. By the birthday approximation, the probability of a collision over that time frame is roughly n^2 / 2^257 = 2^(2*80 - 257) ≈ 2^-97.
(Actually, in counter mode without throwing out any bits, the probability of collision is even slightly lower than this random oracle model suggests. This is true even if there are multiple independently seeded streams, as long as they're all seeded with high-entropy sources.)
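A minimal sketch of such a generator in Python, assuming the third-party `cryptography` package is available; `os.urandom` draws from the same OS entropy source that getentropy() exposes:

```python
import os
from cryptography.hazmat.primitives.ciphers import Cipher, algorithms, modes

key = os.urandom(32)    # 256-bit seed from the OS entropy pool
nonce = os.urandom(16)  # initial 128-bit counter block

# AES-256 in counter mode; the raw keystream is the ID stream.
_keystream = Cipher(algorithms.AES(key), modes.CTR(nonce)).encryptor()

def next_id() -> bytes:
    # Encrypting zeros yields the keystream itself: each 32-byte slice
    # is a fresh 256-bit pseudorandom ID.
    return _keystream.update(b"\x00" * 32)
```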
Assume the probability of a life-threatening dinosaur being cloned in your lifetime is one in a billion (2^-30); if cloned, the probability of it escaping is one in a million (2^-20); and if it escapes, the chance of it entering your house, where looking under your bed saves your life, is one in a billion (2^-30). That multiplies out to 2^-80.
In this case, it's roughly 2^17 (about 130,000) times more rational (assuming your death and the consequences of an ID collision are equally bad) to check under your bed for dinosaurs than to check for ID collisions.
Also, the probability of radioactive decay flipping the comparison result bit at the exact moment you compare your random 256-bit IDs is much, much higher than the probability of a collision. So, if you're paranoid enough to check for collisions, you should be checking multiple times.
Of course, the above analysis all hinges upon correct implementation and high-entropy seeds. These are the real weak points of using large random IDs, so audit and test your code early and often.
Carry out the above analysis with 122 bits of entropy for UUIDv4, substituting your actual system lifetime and expected consumption rate, and you'll likely find similar results.
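These estimates are easy to reproduce; a Python sketch using the birthday approximation P(collision) ≈ n^2 / 2^(bits+1):

```python
from math import log2

def collision_log2(n_ids: int, bits: int) -> float:
    # Birthday approximation: P(collision) ≈ n^2 / 2^(bits + 1),
    # returned as log2(P) so the tiny numbers stay readable.
    return 2 * log2(n_ids) - (bits + 1)

print(collision_log2(2**80, 256))  # -97.0: the 256-bit scenario above
print(collision_log2(2**40, 122))  # -43.0: UUIDv4 after ~1 trillion IDs
```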
My first rule is: if you don't need it (SHA, UUID, etc.), don't use it.
My second rule is: don't be a priest. If someone did it and it works, then it works.
The assumption that close-to-impossible collisions don't happen is a belief, not a mathematically proven fact. ;-) Such problems are also more complex than just one-dimensional collision math.
> The assumption that close-to-impossible collisions don't happen is a belief, not a mathematically proven fact.
I'm not assuming they're impossible. I'm estimating the probability, and rationally prioritizing risks based on probability and severity of impact, balanced against the real-world costs of using gigabytes of source code as a primary key vs a SHA-256 checksum.
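For context, the trade-off being weighed, as a tiny Python sketch (the helper name is hypothetical):

```python
import hashlib

def artifact_key(blob: bytes) -> str:
    # A fixed 64-hex-char digest stands in for arbitrarily large content;
    # the collision estimates above are what make this substitution safe.
    return hashlib.sha256(blob).hexdigest()
```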
I find myself, more and more over the years, falling back to creating lots of "getOrCreateX(args)" functions, and I must say I still haven't found a single scenario where this is worse than separate "get" and "create" (a sketch follows the list):
1) It helps with encapsulation of race conditions (you don't need to acquire locks at the call site to do `if (get() == null) { create() }` when the function can do that internally).
2) I normally don't care in my code whether this is a pre-existing instance or one that I just created. I just want it now so I can use it.
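A minimal sketch of point 1 in Python (names hypothetical), with the check-then-create race handled inside the function:

```python
import threading

class SessionRegistry:
    def __init__(self):
        self._lock = threading.Lock()
        self._sessions: dict[str, "Session"] = {}

    def get_or_create_session(self, key: str) -> "Session":
        # The if-missing-then-create race lives behind one lock here,
        # so callers never need their own locking dance.
        with self._lock:
            session = self._sessions.get(key)
            if session is None:
                session = Session(key)
                self._sessions[key] = session
            return session

class Session:
    def __init__(self, key: str):
        self.key = key
```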
The separation often makes sense with long-lived data.
A lot of data is fleeting. For example, when you construct a struct out of sub-pieces. Many of those probably just live on the stack or are otherwise temporary. This type of data is passed around, read, written to, etc., until it has the right shape. It goes through a little pipeline and then often becomes part of a bigger, longer-lived structure, or at least has some impact on one.
Long-lived data is different. It's often global or is at least seen by entire modules and so on. There it makes sense to think of commands and queries. A database is a typical example.
The core idea is to make mutations obvious.
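The textbook illustration, as a Python sketch: queries return a value and never mutate; commands mutate and return nothing.

```python
class Account:
    def __init__(self, balance: int = 0):
        self._balance = balance

    # Query: returns a value, never changes state.
    def balance(self) -> int:
        return self._balance

    # Command: changes state, returns nothing. Any call that could
    # mutate is obvious at the call site.
    def deposit(self, amount: int) -> None:
        self._balance += amount
```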
This pattern emerges, under different names, in many programming domains and influences a lot of programming decisions: UIs, DBs, video games, language paradigms, cloud architecture, managed caching, HTTP (safe vs. unsafe methods), etc., so there's a universality to it.
But... I think if we learned anything from the paradigm craze(s), it's that any pattern is useful until it isn't.
Long lived data tends to have additional value to it, and anything valuable gets loaded up with business rules.
On data that sticks around and hardly ever changes, we start layering more interpretations. Eventually, for performance reasons, those functions turn into projections of the data that get pre-cooked instead of interpreted on every request: materialized views, data transformations into other tables, etc.
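A hand-rolled version of that idea in Python (names hypothetical): the projection is maintained on the command side so the query side reads it for free, much like a materialized view.

```python
class OrderStats:
    def __init__(self):
        self._total = 0.0
        self._count = 0

    # Command side: every write also updates the pre-cooked projection.
    def on_order_created(self, amount: float) -> None:
        self._total += amount
        self._count += 1

    # Query side: reads the projection instead of re-interpreting history.
    def average_order_value(self) -> float:
        return self._total / self._count if self._count else 0.0
```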
You should generate UUIDs on create requests not for some ideological purity, but for the API to be idempotent. When the network gets unreliable, idempotence + retries gives you effective exactly-once delivery.
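A sketch of that in Python (store and field names hypothetical): the client generates the UUID, so retries of the same create are no-ops on the server.

```python
import uuid

class OrderStore:
    def __init__(self):
        self._orders: dict[str, dict] = {}

    def create_order(self, order_id: str, payload: dict) -> None:
        # A second delivery of the same ID changes nothing, so the
        # operation is idempotent and retries are safe.
        self._orders.setdefault(order_id, payload)

store = OrderStore()
order_id = str(uuid.uuid4())   # generated client-side, before the request
for _ in range(3):             # naive retry loop: at-least-once delivery...
    store.create_order(order_id, {"sku": "abc", "qty": 1})
# ...plus idempotence gives effective exactly-once semantics.
```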
All I recalled was Meyer talking about this; I couldn't remember what it was called.
Another technique that is in the same category is functional core, imperative shell, which does largely the same steps but has a few more opinions about how you organize said command and query separation.
Every time I've switched code to this structure it's shrunk my tests by at least half, sometimes much more. It feels so good when you stop.
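A minimal sketch of that shape in Python (functions invented for illustration): the pure core carries all the logic and needs no mocks; the shell is a thin I/O wrapper.

```python
# Functional core: pure and deterministic, so tests are plain asserts.
def apply_discount(subtotal: float, loyalty_years: int) -> float:
    rate = min(0.05 * loyalty_years, 0.25)
    return round(subtotal * (1 - rate), 2)

# Imperative shell: all I/O lives out here, kept too thin to need tests.
def main() -> None:
    subtotal = float(input("subtotal: "))
    years = int(input("loyalty years: "))
    print(apply_discount(subtotal, years))

if __name__ == "__main__":
    main()
```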
AKA Command-query Separation (CQS)
https://en.wikipedia.org/wiki/Command%E2%80%93query_separation