I use this often, but there is one typical scenario where I break this rule - creation commands. Often someone calling a command that creates an entity needs the ID of whatever was created. Yes, you can use UUIDs and pass this to the creation function (or use more complex methods), but I haven't really come across scenarios where this adds anything meaningful, not to mention that it's not always possible. So I generally break CQS for creation commands, i.e. they change state and return a value.
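Concretely, the two shapes look like this (hypothetical signatures, just for illustration); I pick the first:

```c
#include <stdint.h>

/* Breaking CQS: the creation command mutates state AND returns the new ID. */
uint64_t create_user(const char *name);

/* Strict CQS: the caller generates the ID (e.g. a UUID) and passes it in,
   then queries separately if it needs anything else. */
void create_user_with_id(uint64_t id, const char *name);
```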
EDIT: I just noticed the Wikipedia article cites other examples, e.g. the `pop()` method on a stack.
CQS is one of those things that makes sense at first glance, but the more you look at it, the less sense it makes. I mean, if you send a (synchronous) command, shouldn't you at least get a success/error indication as a return value? I've met someone who argued against even that, insisting that all commands must be void nothrow and that I should use separate queries to figure out the results. And yes, I should generate UUIDs at every interaction and hope that collisions don't arise; and if somebody tries a replay attack, well, that's such an unlikely scenario it can safely be disregarded.
There is a reason the procedure/function split was largely abandoned in the '80s; but at least a procedure could have var/out parameters, or take pointers as arguments.
So a command may return all different kinds of errors, even with informative payloads, or a generic 'ok', but not the arguably more reasonable design (since we already return variants with payloads) of an 'ok' with some useful payload? Or will it literally be a Boolean success/failure value?
Admittedly, both are possible designs, but I just... don't agree with either of them. Allowing error states to be returned directly by the command while requiring success states to be queried separately already breaks CQS; and returning a bool is no more useful than returning void: you'll have to do the same follow-up query in both cases.
At my last job, I occasionally ran into people worrying about SHA-256 collisions for non-maliciously created build artifacts, so I wrote an internal wiki entry with the back-of-the-envelope estimates.
If you're really paranoid, use getentropy() to seed AES in counter mode, and generate 256-bit cryptographically pseudorandom IDs. Assume your system consumes 2^40 (about a trillion) IDs per second for 2^40 seconds (34,000+ years), i.e. 2^80 IDs in total. By the birthday bound, the probability of a collision over that time frame is roughly (2^80)^2 / 2^257 = 2^-97.
(Actually, in counter mode without throwing out any bits, the probability of collision is even slightly lower than this random oracle model suggests. This is true even if there are multiple independently seeded streams, as long as they're all seeded with high-entropy sources.)
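For illustration, a minimal sketch of the simple end of this (skipping the AES-CTR expansion; calling getentropy() per ID is already fine unless your consumption rate is extreme):

```c
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>   /* getentropy(): glibc >= 2.25, macOS, the BSDs */

/* Fill a 256-bit ID straight from the kernel CSPRNG. */
static void random_id256(uint8_t id[32])
{
    if (getentropy(id, 32) != 0) {
        /* A weak ID is worse than no ID: treat entropy failure as fatal. */
        perror("getentropy");
        abort();
    }
}

int main(void)
{
    uint8_t id[32];
    random_id256(id);
    for (int i = 0; i < 32; i++)
        printf("%02x", id[i]);
    putchar('\n');
    return 0;
}
```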
Assume the probability of a life-threatening dinosaur being cloned in your lifetime is one in a billion (2^-30); if so, the probability of it escaping is one in a million (2^-20); and if it escapes, the chance of it entering your house, and of you being able to save your life by looking for it, is one in a billion (2^-30). Combined: 2^-80.
In this case, it's roughly 2^17 (about 130,000) times more rational (assuming your death and the consequences of an ID collision are equally bad) to check under your bed for dinosaurs than to check for ID collisions.
Also, the probability of radioactive decay flipping the comparison-result bit at the exact moment you compare your random 256-bit IDs is much, much higher than the probability of collision. So, if you're paranoid enough to check for collisions, you should be checking multiple times.
Of course, the above analysis all hinges upon correct implementation and high-entropy seeds. These are the real weak points of using large random IDs, so audit and test your code early and often.
Carry out the above analysis with 122 bits of entropy for UUIDv4, substituting your actual system lifetime and expected consumption rate, and you'll likely find similar results.
My first rule: if you don't need it (SHA, UUID, etc.), don't use it.
My second rule: don't be a priest. If someone did it and it works, then it works.
The assumption that close-to-impossible collisions don't happen is a belief, not a mathematically proven fact. ;-) Such problems are also more complex than just one-dimensional collision math.
> The assumption that close-to-impossible collisions don't happen is a belief, not a mathematically proven fact.
I'm not assuming they're impossible. I'm estimating the probability, and rationally prioritizing risks based on probability and severity of impact, balanced against the real-world costs of using gigabytes of source code as a primary key vs a SHA-256 checksum.
I find myself, more and more over the years, falling back to creating lots of "getOrCreateX(args)" functions, and I must say I still haven't found a single scenario where this is worse than separate "get" and "create":
1) It helps with encapsulation of race conditions: you don't need to acquire locks outside just to do if (get == null) { create } when the function itself can do it (see the sketch after this list).
2) I normally don't care in my code whether this is a pre-existing instance or one that I just created. I just want to use it now.
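A minimal sketch of point 1, assuming a C codebase with pthreads (the names and the list-based storage are invented for the example):

```c
#include <pthread.h>
#include <stdlib.h>
#include <string.h>

struct node { char *key; struct node *next; };

static struct node *head;
static pthread_mutex_t lock = PTHREAD_MUTEX_INITIALIZER;

/* The lock lives inside the function, so callers can never race
   between a "get" that misses and a separate "create". */
struct node *get_or_create(const char *key)
{
    pthread_mutex_lock(&lock);
    struct node *n;
    for (n = head; n; n = n->next)
        if (strcmp(n->key, key) == 0)
            break;                      /* pre-existing instance */
    if (!n) {
        n = malloc(sizeof *n);          /* one we just created */
        n->key = strdup(key);
        n->next = head;
        head = n;
    }
    pthread_mutex_unlock(&lock);
    return n;                           /* caller neither knows nor cares which */
}
```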
The separation often makes sense with long lived data.
A lot of data is fleeting. For example when you construct a struct out of sub-pieces. Many of those probably just live on the stack or are otherwise temporary. This type of data is passed around, read, written to etc. until it has the right shape. It goes through a little pipeline and then often becomes part of a bigger, longer lived structure or at least has some impact on one.
Long lived data is different. It's often global or is at least seen by entire modules and so on. There it makes sense to think of commands and queries. A database is a typical example.
The core idea is to make mutations obvious.
This pattern emerges in different programming domains and influences a lot of programming decisions: UIs, DBs, video games, paradigms, cloud architecture, managed caching, HTTP (safe vs. unsafe methods), etc., under different names, so there's a universality to it.
But... I think if we learned anything from the paradigm craze(s), it's that any pattern is useful until it isn't.
Long lived data tends to have additional value to it, and anything valuable gets loaded up with business rules.
When data sticks around and hardly ever changes, we start layering more interpretations on top of it. Eventually, for performance reasons, those functions turn into projections of the data that get pre-cooked instead of interpreted on every request: materialized views, data transformations into other tables, etc.
You should generate UUIDs on create requests not for some ideological purity, but to make the API idempotent. When the network gets unreliable, idempotence plus retries gives you effective exactly-once delivery.
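A sketch of what the server side looks like under that scheme (the fixed-size table and the names are invented for the example): the client-chosen UUID is the key, so redelivering the same request is a harmless no-op:

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

struct order { char id[37]; int amount; bool used; };   /* 36-char UUID + NUL */

static struct order orders[1024];

/* Returns the canonical ID for this request, whether it's the first
   delivery or a retry after a lost response. */
const char *create_order(const char *client_uuid, int amount)
{
    for (size_t i = 0; i < sizeof orders / sizeof orders[0]; i++) {
        if (orders[i].used && strcmp(orders[i].id, client_uuid) == 0)
            return orders[i].id;        /* retry: already created, same answer */
        if (!orders[i].used) {
            strncpy(orders[i].id, client_uuid, 36);
            orders[i].amount = amount;
            orders[i].used = true;
            return orders[i].id;        /* first delivery */
        }
    }
    return NULL;                        /* table full */
}
```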
All I recalled was Meyer talking about this; I couldn't remember what it was called.
Another technique in the same category is functional core, imperative shell, which takes largely the same steps but has a few more opinions about how you organize said command and query separation.
Every time I've switched code to this structure it's shrunk my tests by at least half, sometimes much more. It feels so good when you stop.
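A toy illustration of the shape (the problem domain is invented): the core is a pure function you can test with plain asserts, and the shell is the only place that touches I/O:

```c
#include <stdio.h>

/* Functional core: pure and deterministic, so tests need no mocks. */
static int reminders_due(const int days_overdue[], int n, int out[])
{
    int count = 0;
    for (int i = 0; i < n; i++)
        if (days_overdue[i] > 30)
            out[count++] = days_overdue[i];
    return count;
}

/* Imperative shell: all I/O lives out here, where there's little to test. */
int main(void)
{
    int overdue[] = {5, 45, 90}, due[3];
    int n = reminders_due(overdue, 3, due);
    for (int i = 0; i < n; i++)
        printf("Reminder: %d days overdue\n", due[i]);
    return 0;
}
```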
I have to disagree with the bit about not using source code formatters, though.
More simply, you can improve the source code formatter to accept the example provided as legitimate. In fact, it can be set up to automatically format other similar patterns that way itself.
Yes, humans are good at patterns, but there are only so many patterns you can form while writing code, and the heuristics involved in deciding which pattern to use can almost certainly be encoded fairly easily.
But even if we assume that this isn't possible, a formatter is still very valuable for the simple reason that others may not follow the same patterns as you, or may not follow patterns at all. Someone else formatting the code differently may rub you the wrong way, and the only way to resolve that is the worst type of code-formatting bike-shedding.
And if you don't agree and both of you do your own thing in different parts of the codebase, well, you're not following a pattern, and you've lost the benefits claimed for following patterns.
One can distinguish between patterns that should be forced (e.g. brace placement) and patterns that should remain up to the programmer (e.g. whether a particular “if” statement may be single-line). Meaning, using a formatter can still allow the same code to be formatted in different ways. (Where to insert blank lines for visual structuring is maybe the most obvious example. You wouldn’t want to automate that.) It’s thus a spectrum rather than an all-out black and white.
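clang-format is one formatter I know of that exposes knobs along exactly this spectrum; a hypothetical project's .clang-format might pin down some patterns while leaving others to the programmer:

```yaml
# Forced pattern: a shared line-width convention.
ColumnLimit: 100
# Short "if" bodies may sit on the same line (when there is no else).
AllowShortIfStatementsOnASingleLine: WithoutElse
# Blank lines the programmer inserts for visual structure survive, up to a limit.
MaxEmptyLinesToKeep: 2
```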
> Where to insert blank lines for visual structuring is maybe the most obvious example. You wouldn’t want to automate that
In general, I agree, but I actually had some success doing that in a couple cases with custom linter rules.
Some tests were becoming unreadable because of the lack of separation between sections, so I started enforcing it. We also enforced at least two blank lines between the setup/execute/verify sections, where expectations could only show up in the third section.
This can be annoying if you use a different style, but all our tests followed the same patterns, so it made sense to enforce.
Formatters can also readily spot and preserve patterns if you want them. You can shape the cost function to value patterns.
I suspect Walter formed this opinion when they were much worse.
Also formatters have enormous benefit when you have lots of people in the same project. It almost doesn't even matter what the format is as long as it's efficient IMO.
Exactly. Walter Bright also used to be a lone hero - he isn't anymore. But some habits stick of course. Also the languages he writes in might not have the best formatters.
I am old enough to have experienced the not-so-good formatters (in mainstream languages) and wasn't a fan. But after working with really great formatters, it's such a benefit that I don't like to go without one anymore.
The quality of the formatter is extremely important. Using a bad formatter is such a bad experience that I would agree it may be better not to use one... but modern languages have this problem mostly solved (not so sure about D). The example in the post would still be broken up into multiple lines, because almost all formatters will avoid having multiple "statements" on a single line, for good reason.
But as with any tool, you can probably improve it, for example to keep this exact "pattern" if you really like it... I wonder if there are formatters powerful enough to allow this sort of customization.
I can see his point. I'm also perfectly fine with programming without a linter/formatter for personal projects or bigger projects where quality is more important than speed. The code doesn't look worse. In fact it might end up better.
But for corporate teams, doing web dev? It is definitely necessary, unless you want to spend days reviewing trivialities, or you're ok with the code looking like a patchwork quilt of styles.
The point of formatters is exactly to maximize the occurrence of patterns. And the added value compared to humans is that you know for sure there will be no mistakes (or at least far fewer), so this is something you can rely on.
The main complaint is that formatters do not (always) follow your favorite pattern. But I much prefer to learn the many patterns that the formatter gives me for free than to fight to have mine adopted.
I often find myself fighting an overly naive code formatter that is "opinionated" and offers hardly any way of configuring it, except maybe line length.
It is so naive that I cannot make it keep long log lines while breaking other long lines; it has no way of distinguishing the two. So I might have 5 lines of log over 2 lines of code, just because of that. The logs get more screen real estate than the actual code, and that is silly.
Then it will undo my own line-breaking of procedure calls if the call could theoretically fit within the max line length. Only when I add a trailing comma after the last argument can I prevent it from making these silly changes, which only serve to reduce readability.
It will also complain about long comment lines. When I have a long line starting with "# TODO:", I would like to keep it on one line, because when I search for TODO, I will see all of it in the search results.
Most formatters are annoyingly naive and format code so that it becomes less readable than what a capable developer who cares about readability will produce.
Code formatters collapse a variety of different patterns, which can be intentional emphases of various aspects of the code (as in the example shown in the article), into a single pattern, and that eliminates information. You might call this maximizing the pattern that the formatter happens to produce, but as a result the pattern becomes useless, because it communicates nothing.
The same applies equally to any human programmer that formats their code as consistently as a formatter would.
> Someone else formatting the code differently may rub you the wrong way, and the only way to resolve that is the worst type of code formatting bike-shedding.
True. Your team would just have to select a manager who decides.
I have to disagree with the enum advice. "enum { No, Yes };" is bad (regardless of order), but the solution isn't to use bools. The solution is a better name: "enum { DoNotFoobar, WithFoobar };" is what you should be using. (C++ has scoped enums, so you might see Foobar::with or Foobar::without, possibly better if you have that kind of option in your language.) If you do it this way, your code is even more readable, and you can someday realize there is another option as well.
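In C, say (frobnicate() and the names are made up, just to show the call site):

```c
/* The enumerator names carry the meaning, so the call reads on its own. */
enum foobar_mode { DoNotFoobar, WithFoobar };

void frobnicate(int widget, enum foobar_mode mode);

/* frobnicate(42, WithFoobar);  -- self-documenting, and a third mode can
 *                                 be added later without changing the type.
 * frobnicate(42, true);        -- tells the reader nothing. */
```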
I agree, this is absolutely true. The first time I was introduced to this was actually in an excellent old Qt Quarterly, wherein they discussed good API design. Thankfully, it seems most of that information is still online.
It's undoubtedly easy to forget nowadays how great Qt was, after it has changed hands many times and honestly lost its lustre. If anyone remembers what GTK+2 used to look like, it was really funny comparing Qt APIs to GTK2 and Win32. (The GTK developers definitely improved GTK+2 throughout its long life, though, so nearer the end GTK+2 was a fair bit less awful to use, IMO.) Either way, in my opinion, they were quite ahead of the curve in terms of API design, and I found their API design guides to be immensely helpful. To this day, I also agree with their principles on e.g. naming getters.
Good point: named arguments/keyword parameters actually do negate some of the need for this. Qt is obviously designed for the limitations of C++, so this definitely makes sense. (And yes, CaseInsensitive is not necessarily ideal either. That's a fair point. But, given the limitations, I definitely see why they picked that route.)
However, I will say one thing: there is another good reason to consider using enumerations in place of booleans, and that's when you have a situation where you are not 100% absolutely sure that 1-bit of information is sufficient. For example...
A bit contrived, but let's say you have a method that can optionally perform some kind of validation. In C syntax, so I don't embarrass myself, you have something like this:
void process_request(request req, bool validate);
This works well, although it obviously falls into the boolean parameter trap. But in D you can avoid the boolean parameter trap, so that's not too big of a deal. However... I think that in many cases, boolean values are used where you just happen to have something that only has two options, and not necessarily something that is true by nature. For example, maybe you want a validation mode that only logs a message but does not halt processing on a validation failure. Something like this could work:
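```c
void process_request(request req, bool validate, bool allow_invalid);
```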
But there are a lot of ways you could go about trying to name allow_invalid, and it's arguably a pretty bad name. Worse, it only really makes sense when validate = true. In my opinion, something like this would be better:
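```c
enum validation_mode { VALIDATION_OFF, VALIDATION_LOG_ONLY, VALIDATION_STRICT };

void process_request(request req, enum validation_mode validation);
```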
For case sensitivity, maybe it's true that this is not necessarily important. However, for regex options in particular, I think what you MIGHT want is more like a set of flags rather than either a pure enumeration OR a pure boolean. Of course, at that point, it raises the question of whether regex options should go in their own options structure of some kind, at which point booleans would make more sense again; but at least for shorthand, having a simple flags option seems reasonable to me. (I haven't checked how Qt does it currently. I would not be surprised if they have a longer-form "options" overload, too.)
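Something like this, say (the flag names are invented):

```c
/* Options combine with | instead of a parade of booleans
   or one giant enumeration. */
enum regex_flags {
    REGEX_CASE_INSENSITIVE = 1 << 0,
    REGEX_MULTILINE        = 1 << 1,
    REGEX_DOTALL           = 1 << 2,
};

void regex_compile(const char *pattern, unsigned flags);

/* Usage: regex_compile("^a.*b$", REGEX_CASE_INSENSITIVE | REGEX_MULTILINE); */
```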
But the enum is equally clear and less typing. If you want named parameters just so you can reorder arguments, the enums can all have different types and thus the language can reorder them (whether that is a good idea is an open question).
I mostly agree with this; nobody likes `fobricate(true, false, true)`. I do have to wonder if this is a cure for the lack of named function arguments in C++.
Even in languages with named function arguments you still end up with the same problem very often. The person who wrote fobricate(True, False, True) in Python knows what order things are in and probably didn't bother to name each one. You could force naming all arguments, but that gets ridiculous as well, in a different way.
Because negation is a cognitive problem for people. The other problem happens when a third state is added: it becomes very hard to find all the dependencies that need updating.
This is as silly as saying "down with the integers." (Boolean is a full-blown type that is richer than an enum in terms of the set of operations it supports.)
Down with integers and floats as well! Seriously, replace them all with strong types. Just because you have two integers doesn't mean you can mix them, and even if you can, that doesn't mean all operations make sense. If you have a matrix, you don't want to mix up rows and columns accidentally (there are times to do it intentionally, but it should be clear that it is intentional). A meter times a meter should result in a square meter, while a meter times a page count should fail to compile.
If you really want the full set of operations that a bool supports on your enum, you can add them (every language I'm aware of makes this harder than it should be, but that is a different issue). But most of the time they don't make logical sense as operations for what I'm doing, so I want my code to not compile if I use them; it means I just made a mistake.
[insert rant about different strings not being the same thing here]
We do need integers at the bottom of the stack, but we need to move up to strong non-fundamental types sooner. It makes things harder at first, but it saves a lot of bugs long term.
See, with strong types, when someone catches an error like that, it is fixed everywhere, or at least fails to compile wherever the bugs are; with weak types, you won't find every other place I made the same mistake without an expensive audit.
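Even plain C gets you a cheap version of this by wrapping the value in a one-field struct (a sketch; the names are invented):

```c
struct row { int v; };
struct col { int v; };

/* Mixing up rows and columns is now a compile error, not a silent bug. */
static int cell_index(struct row r, struct col c, int ncols)
{
    return r.v * ncols + c.v;
}

/* cell_index((struct row){2}, (struct col){3}, 80) compiles;
   swapping the first two arguments does not. */
```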
> Believe it or not, this was common C practice back in the 1980s.
Was it, though? I used C back then (granted, while learning it, and I remember reading posts from Walter on what I think was FidoNet? BBS-based, regardless), and never once saw anything like this OTHER than the stories about the Bourne shell source code.
Huge strawman and not at all common outside of the "we invert the norms up to 11" crowd.
The article opens with excerpts from the author's entry into The International Obfuscated C Code Contest, which indicates the kind of crowd that treated such hijinks as commonplace.
You didn't see such things in dull boring let's just make simple code that works projects.
You'd see all that and more at the end of town that wanted to sketch out new language ideas... the first "working" C++ implementations were made by swapping out the default C preprocessor for a more powerful text-manipulation engine and really going to town on the macro magic.
It looked like early C++, but it went through a text mangling and came out as C, which went into the C compilers of the day.
ADDENDUM: I'm pretty sure the first C++ mockup I saw | worked with was the FrankenChild of source.txt | M4 preprocessor | C compiler | Asm | link
<nod> Thanks, yeah, I was actually a student at UCF where some IOCCC winners were instructors (David Van Brackle and Mark Schnitzius, if memory serves).
And, I actually used `cfront` in my first job, and used the parameter to see the C it spit out as a way to learn how C++ did some things. Pretty interesting, if, as you say, nearly unreadable to humans!
Back in the day it was controversial. I saw it in several magazine "type this code in to get this great program" articles, but since I was on an 8-bit computer without a C compiler, I never was able to type them in. There were several letters to the editor saying don't do that, but the magazine always did it anyway. (In about a year, the magazine realized the 8-bit and 32-bit computers had nothing in common, so it split into two different magazines, each focused on its own niche. I don't know if this controversy was ever resolved before the magazines went defunct.)
The one other example I know that morphs the language to that extent and to the detriment of readability by C programmers is the J interpreter[1,2]. But, once again, nobody (that I’ve read) claims it’s good or clear C. (Good C for those who speak J, maybe; I wouldn’t know.)
For a way to morph C syntax that does make things better, see libmill[3].
People coming to C from Pascal would do this initially, and I still see attempts at using the C preprocessor to look like the author's other favorite language.
Consider that this was a time before the internet. People had much less communication with other programmers in those days. Lots of people learned C by picking up a book, not reading about it online. There really wasn't much of a way to learn best practices from others.
> What’s the clue? It’s got a little tire for a knob! Pull the lever up, and the gear gets sucked up. Push it down, the gear goes down. Is that self-evident or what?
Well, I didn't figure it out. Definitely easy to identify if you've heard this before, but that's not something that's self-evident.
> enum { Yes, No } is just an automatic no hire decision
This comes from process management where 0 is success and anything else is an error code. Shell conditionals are this way around because of that. Depending on the context it could be an indicator of someone who really understands the system but isn't yet familiar with more general coding idioms.
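For anyone unfamiliar, the convention in question (standard C/POSIX, nothing project-specific):

```c
#include <stdlib.h>

/* Exit status: 0 (EXIT_SUCCESS) is the single success value; any nonzero
   value is one of many possible error codes. This is why a shell
   `if some_command; then ...` treats exit status 0 as "true". */
int main(void)
{
    return EXIT_SUCCESS;   /* one way to succeed, many ways to fail */
}
```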
> The More Control Paths, the Less Understandable
Woah no, I feel like I agree with the title but that is a terrible example. It looks like Z happens even in Not Z situations, which requires digging into everything to find out what's actually going on. The code is lying to you, which is a really bad setup.
This could be mitigated by just renaming doZ() into something that hints it won't always occur, maybe tryDoZ() or checkDoZ() or something.
Consider another example. There was a case where the pilots had set the automatic air pressure regulator to "manual". As they climbed past 10,000 feet, a warning horn sounded because the cabin air pressure was too low. The horn sounded similar to another horn, and the pilots worked the other horn problem.
Then, they lost consciousness from lack of oxygen and crashed, killing all aboard.
This is not a self-evident user interface. A better one would be to change the beep of the warning horn to a voice: "alert: air pressure too low". Now the pilots save precious seconds figuring out what's wrong and how to fix it. The warning could be even better: "alert: air pressure too low. Don oxygen masks".
With the landing gear lever, maybe a pilot in training needs to be told what it does once. But he'd never have to be told again.
However, its embrace of garbage collection in the 1990s has proved to be its Achilles' heel. Before anyone says you can write D without the garbage collector: yes, you can, but then you lose access to most of the standard library, and likely most third-party libraries. Rust and C++ let you avoid garbage collection while still using the standard library and third-party libraries.
With garbage collection, D is competing not with C++ but with Java, Swift, Go, and C#, which have far more support across the industry.
It's worth noting that many (a majority, as I understand it) AAA game studios end up writing their own STL. I'm willing to bet many industries where performance is the primary concern also write much of their core libraries from scratch.
D lets users opt out of the collector as needed, which is quite nice. D has been used in AAA game development too, for what it's worth.
Refcounting is a form of garbage collection. While C++ and Rust also have refcounting, it is the default memory-management mode in Swift in a way that it is not in C++ and Rust (where the defaults are stack-based values, or unique_ptr/Box, before reaching for refcounting).
No, it's not. Refcounting CAN be a garbage-collection algorithm, but in Swift it's deterministic, with the retain/release operations inserted at compile time. Not to mention the recently added support for non-copyable types that enforces unique ownership: https://github.com/apple/swift-evolution/blob/main/proposals...
I'm a D expat, using and managing Rust now. Do I miss D? Yes. The D syntax is so beautiful and easy to follow, and D code was always plastic, except that there are just too many attributes (@safe, @system, @oh-god-what-next) that don't play well with each other.
The runtime, however, not so much, though I didn't complain much. We rewrote all of our core utils written in D (and many in C++) in Rust. No more null. Yes, it took a while to onboard everyone to Rust, but hey, there's been zero production downtime since we migrated. Compile times with Rust are crazy (compared to D), but we are happy with the production runtimes!
Rust is what D should've been, but it's crazy verbose! I can live with that and I do love Rust.
AKA Command-query Separation (CQS)
https://en.wikipedia.org/wiki/Command%E2%80%93query_separati...