Another pattern I've found helpful is to have branches "exit early"; if you have code like:
if !cache.contains(key) {
    v = db.query("..")
    expiry = calcexpiry(v)
    cache.put(key, v, expiry)
    return v
} else {
    return cache.get(key)
}
by the time you get to the "else", your mental stack is filled up with the cache loading logic - you have to work to recall what this else branch is triggered by.
instead, make the branch the least complex option, exit early, and skip the else. That way, it can be read as "if this condition, do this tiny bit of logic and exit", and that branch can then be mentally pruned.
if cache.contains(key) {
    return cache.get(key)
}
v = db.query("..")
expiry = calcexpiry(v)
cache.put(key, v, expiry)
return v
I have heard this called early returns. I like to write code this way too. Proponents of one-return-per-function probably write code in languages like C, which has neither GC nor RAII, so having only one exit point makes it easier to ensure that all resources allocated by the function have been released.
There are ways to do early returns in C without duplicating resource releasing code though, using gotos (it's one of the few cases where gotos make sense, IMHO). Code looks like this:
int aFunctionWhichMightFail() {
    int result = FAIL;
    Foo* foo = allocateSomething();
    if (!someFunction(foo)) {
        goto fail;
    }
    moreWork(foo);
    result = SUCCESS;
fail:
    releaseFoo(foo);
    return result;
}
> There are ways to do early returns in C without duplicating resource releasing code though, using gotos (it's one of the few cases where I think gotos make sense).
Gotos like that are fine. Despite the title of the paper, I don't think Dijkstra's meaning was that gotos should be removed entirely.
Yeah. Dijkstra was talking about not building your whole program out of if and goto, and using functions and for/while/etc. instead. In particular, he was talking about Dartmouth BASIC, which didn't have anything else.
Like, imagine the spaghetti lovecraftian horror of a non-trivial program that's just if and goto. At the time, there was a particularly vocal camp that essentially argued it shouldn't matter, since if/goto is equivalent to what exists in the machine code anyway, and you can technically do all the same things.
Keep in mind that adopting any strategy beyond the if/goto that exists in machine code comes with reduced efficiency. So we trade efficiency for readability. Some talented genius might make a super-efficient piece of art by restricting themselves to if/goto only, that no other approach could beat performance-wise. I'd guess anyone who does low-level machine-code execution optimization is in that camp most of the time.
Not really. You want structured constructs, even in asm, because they play nicer with the speculation and prefetching hardware. Going nuts and throwing away all of the conventions of higher level languages typically kills your perf.
Well, it depends; if you can use locality of code, cache lines, etc. to your advantage, it might pay off to do crazy things to optimize the innermost loop and the like. Though as you said, you'd still structure whole blocks of functionality in a more high-level way. On some embedded systems it still pays off to modify code during execution, e.g. replacing a constant in the innermost instruction with a value pre-computed at a higher level to avoid a memory access penalty.
It doesn't make sense to me. The absence of "advanced" control-flow constructs is one of the things that define assembly language - only branches and conditional branches, plus sometimes conditional execution of single instructions. So I don't see how you can claim that a non-existent thing plays nicer with speculation and prefetching.
There's a lot of optimization at the microarch level towards the kind of code a compiler emits. Modern chips are designed to run C very well, not arbitrary, technically allowed assembly. You'll hit a lot of perf bottlenecks if you throw weird code/data flow graphs at them.
Well, I'd like to see that. Instruction sets designed with high level languages in mind, sure; but speculation and caches disturbed by code that is not shaped like what a compiler does (which compiler, anyway?), that's doubtful to me.
Dijkstra's meaning was that gotos should be removed entirely - by replacing them with the high-level control-flow constructs they were being used to implement. If your language doesn't have a high-level equivalent of `goto fail`... well, Dijkstra would probably say you shouldn't use that language in the first place, but given that you are, it's no worse than using `jlt .forloop` in assembly.
You are going to wind up with some kind of 'goto' regardless, whether it's having to navigate through independent files or various other constructs one takes for granted as immutable. The point being: when you notice them, automate them out of the reasoning space so you don't drown in the complexity of one mundane detail being used to represent the overall state of the program (all the stuff you have to shove in your head to understand what your program is doing).
These things are hard to notice, and it's even harder to find elegant and effective solutions for them!
The point is to reduce the problem space so all you are left to look at, when you do come back to your code, is what's important to reason about without removing important details or adding trivialities. These problems haven't gone away just because the literal word goto isn't used anymore.
> If your language doesn't have a high-level equivalent of `goto fail`
Such as?
I am aware of writing this in different ways, of architecting the logic quite differently. But the only control-flow constructs I've seen as alternatives are abusing exceptions or labelled loops with named breaks.
What were you thinking of when you wrote this? I'm genuinely interested.
Deferred statements (in Go) are really useful for delaying the execution of blocks of code to the end of a context, while leaving them in the logical context where you're setting up a resource.
I'm actually not sure offhand how multiple deferred statements are handled. I could envision that the specification might make no explicit guarantee about the execution order, or it might push onto and pop off of a stack (at least in behavior).
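For what it's worth, Go's spec takes the stack option: deferred calls run in last-in, first-out order. Python's `contextlib.ExitStack` behaves the same way, so a quick sketch of the ordering (recording when each callback fires):

```python
# LIFO cleanup ordering, analogous to Go's defer: the callback
# registered last runs first when the context exits.
from contextlib import ExitStack

order = []
with ExitStack() as stack:
    stack.callback(order.append, "registered first")
    stack.callback(order.append, "registered second")
    order.append("body")

print(order)  # ['body', 'registered second', 'registered first']
```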
Common Lisp has `with-open-file`, which guarantees files get closed regardless of what happens in the body, even if an exception is thrown (the macro closes the file and then propagates the exception in that case):
> If a new output file is being written, and control leaves abnormally, the file is aborted and the file system is left, so far as possible, as if the file had never been opened.
> Python doesn't give you the macros needed to write your own for other kinds of resource.
Python doesn't use macros, true. But it absolutely gives you the ability to do 'with' for other kinds of resource (and anything else). Look for the with-statement and documentation on defining your own context manager.
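A minimal sketch of such a user-defined context manager (the `ManagedResource` name and the fake resource are made up for illustration):

```python
# A user-defined context manager: __exit__ runs whether the body
# completes normally or raises, like try/finally.
class ManagedResource:
    def __enter__(self):
        self.closed = False
        return self

    def __exit__(self, exc_type, exc_value, traceback):
        self.closed = True   # release happens on every exit path
        return False         # don't swallow exceptions

try:
    with ManagedResource() as r:
        raise RuntimeError("failure mid-body")
except RuntimeError:
    pass

print(r.closed)  # True: the resource was released despite the exception
```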
with-open-file is just a convenience macro around the underlying functionality which is provided by unwind-protect which is pretty much identical to try/finally in Java.
The ease of writing macros such as with-open-file means that most libraries with resources that need releasing also provide similar functionality.
When the code exits the try block, you are guaranteed that it will run the finally block, possibly entering the catch block first if an exception was thrown. Since the most common use of the `goto fail` pattern is freeing memory, the `finally` block isn't actually used a lot in Java/C# code in practice.
Thank you. Since the example was nothing to do with exceptions, and we don't want to abuse exceptions by using them when they are not needed, I will rewrite your example to cover the GP case:
I could be misunderstanding here, but I'm fairly sure that in the spirit of the paper `goto fail` would be preferable to throwing an exception and unwinding the stack at any arbitrary function call.
Dijkstra's original title was "A Case Against the GO TO Statement". The clickbait title "Go To Statement Considered Harmful" was chosen by the editor of the CACM.
I've extensively written code like this, though I don't seem to see others do it very often.
I suppose if you really hate gotos, then a "do { } while(0);" with a "break;" in the middle of it is equivalent.
There's just too much cleanup, such that avoiding gotos means you're either nested 200 columns into the screen or have half your function filled with partially-duplicated cleanup code.
Somewhere, in the long dark ago, a bunch of people got upset about long functions with return statements peppered throughout.
So we ended up with a Best Practice of one return clause per function. Which is triply stupid because 1) long functions are part of the problem, 2) breaking to the end with multiple assignments to the same variable is almost as bad, and 3) it's just trading one stupid for another, which is a waste of time and energy.
And then for many years a loud minority of us, some of whom actually have a vague understanding of how the human brain processes data, insisted on the pattern you describe here. Some of us call it “book ending”. All the returns are fast or proceed to the end. Any method that can’t accomplish either has too high a complexity anyway and gets split until you get either one return, or book ends.
[another responder calls them guard clauses, but this doesn’t quite apply in all situations. A clause that handles a default or a bad input is why you can rearrange, but it misses why you should: ergonomics]
The single return best practice comes from the old days of C. Back then, compilers didn't inline function calls as aggressively as they do now, and the overhead of a call was a relatively significant fraction of execution time. That led to longer functions being a good idea.
At the same time, C does not have any features for reliably managing resources. No GC. No C++'s RAII. Instead, you have to be careful to release every resource you acquire in the body of a function, along every execution path.
In that world, avoiding multiple returns makes a lot of sense because it ensures every execution path reliably reaches the end of the function so it's easier to ensure resources are released.
We aren't in that world anymore, so the guideline no longer applies. But it wasn't a stupid rule. It's just unhelpful to take it out of its original context and expect it to apply to a new one.
"Check your boots for scorpions before putting them on" isn't a dumb rule in the desert, but it is in Antarctica.
Ah yes, that's true about C. I think my aggressive distaste comes from seeing people cargo cult this into languages like Java, Python, and Ruby.
Makes no sense to turn your code inside out for resource management when you picked a language where you don't have to turn your code inside out for resource management.
Even in C this kind of early return can be good. In the example above you would be returning before allocating any memory, which makes the memory you do have to manage simpler.
> Somewhere, in the long dark ago, a bunch of people got upset about long functions with return statements peppered throughout.
I got taught this matter of style in the context of 90s C/C++. The general idea was that many return points likely mean many points where you have to clean up any state you've been trying to manage/encapsulate within the function... in particular the state of manually managed memory. Which probably means mistakes.
It's a reasonable argument for manually managed memory, but it loses much of its power once you're not managing memory by hand. And these days, most of us are not.
I believe this is commonly referred to as a guard clause? I tend to prefer writing things this way too... but I could see an argument for putting something in an else block instead of just leaving it there.
Personally, I find the first version easier to read since it makes the structure explicit. I.e. there are two blocks and you will execute one or the other.
To me the second version makes it more difficult to differentiate the structure:
if (...) {
    foo()
    return ...
}
return bar()
from:
if (...) {
    foo()
}
return bar()
I find that when I'm reading code I'm generally more interested in the overall structure than in the specific implementation details. Especially since if the first block was complicated enough to tax my cognitive overhead, then I would probably extract it into a function anyways.
There's a reason the pattern is referred to as "exiting early". Another description might be using guards: you can imagine each of the if statements as a "guard" against a condition that might cause you to exit early. You shouldn't do this if the guarded branch is doing something complex. This becomes much clearer when you have multiple such clauses, with the mental note that each conditional should contain only a return statement with a single expression.
function foo(bar) {
    if is_null(bar) {
        return TypeError
    }
    if !validate_input(bar) {
        return ValueError
    }
    if cached(bar) {
        return cache(bar)
    }
    // Added as a workaround for client XYZCorp since...
    if bar == "A special value" {
        return "A special result"
    }
    return do_a_complex_thing_to(bar)
}
You could equivalently structure it as a bunch of else ifs, but there's no real reason to do it that way.
I like and use early exits when they fit well, which is often. And in this case I'd probably do as you suggest. But it's worth mentioning that even if you stick with if/else it's often worth it to put the simple case first to reduce mental load:
if cache.contains(key) {
    return cache.get(key)
} else {
    v = db.query("..")
    expiry = calcexpiry(v)
    cache.put(key, v, expiry)
    return v
}
I think 'else' shouldn't be shunned where appropriate. I usually omit the else when checking function pre-conditions, whereas I'll include an else (even with both branches returning) if the two branches represent comparable operations. English can be horribly abused to fit anything, but usually saying it out loud can help: "If the user id isn't numeric - we're done. If the user doesn't have access - we're done. If the user wants to view the resource in purple shift, let's format it like so and give it back to them; otherwise we'll use a yellow shift and hand it off."
If it's a bug, sure. However, it might simply be a "not found" - for example, you're trying to decide whether to insert or update a record. You look for an existing one and don't find it - you shouldn't throw here, because "record not found" is a valid outcome.
Unless it’s a one-off thing, it’s also worth considering adding a cache class. This can have the advantages of further decoupling the cache and computation logic, and being able to switch the cache implementation (e.g. to use a no-op cache for debugging). It also hides the “if” by introducing a “cache pattern”:
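A sketch of what such a class might look like (the `CachedStore` name, the callback, and its signature are all made up for illustration; a real one would also honor expiry):

```python
# A hypothetical cache wrapper: callers ask for a value and never see
# the contains/get/put plumbing. Names are invented for illustration.
class CachedStore:
    def __init__(self, compute):
        self._compute = compute   # e.g. wraps db.query + calcexpiry
        self._entries = {}

    def get(self, key):
        if key in self._entries:
            return self._entries[key]
        v = self._compute(key)
        self._entries[key] = v
        return v

calls = []
def slow_upper(key):
    calls.append(key)             # track how often we hit the "db"
    return key.upper()

store = CachedStore(slow_upper)
print(store.get("abc"))  # ABC  (computed)
print(store.get("abc"))  # ABC  (served from the cache)
print(calls)             # ['abc']: the computation ran only once
```

Swapping in a no-op cache for debugging is then just a different class with the same `get` signature.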
Thanks for posting this! I recently had quite a few discussions about this topic with people who insist on _always_ using if/elseif/elsif/... constructs in such situations. I tried to argue that in most situations an "if (a) {return}; if (b) {return}; do_c()" construction is much easier to understand, but I had little success convincing them.
I always say that if you know the answer, why convolute the code? Just return the damn answer and we're done here.
I see these early returns as a list of assumptions. It helps to have them nicely lined up, with perhaps a comment before each, and an empty line between them.
And then when you get below to the, now non-nested, code that actually does some work you have a smaller mental burden needed to understand what is actually done.
Personally, I think it is better to have functions small enough that they fit in a handful of lines; the if-else structure makes it obvious that you are doing one thing or the other, rather than having separate if statements at the top of the function that may or may not exit before you reach the main body.
I think it’s a judgement call: I like clearly indicating when there’s always a single output path but wouldn’t use the single return style in that cache example if it involved an extra lookup operation for a value which is already known.
Yes - the cache example is bad; my point is the general idea of avoiding branches that merge back into the main function.
In your example above, the branch does merge back, and clearly that's fine for a small function - and common pattern - like this. But the general idea would be to avoid it if possible, because that removes unnecessary complexity.
One small problem is that depending on how your threading and cache works it might be possible for the cache to be cleared after the contains call but before the get.
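One way to sidestep that race (sketched in Python, with a plain dict standing in for the cache) is a single lookup with a sentinel instead of separate contains/get calls:

```python
# One lookup instead of contains() followed by get(): the value can't
# be evicted between the check and the read, because there is no gap.
_MISSING = object()  # sentinel; distinguishes "absent" from a cached None

def lookup(cache, key, compute):
    v = cache.get(key, _MISSING)
    if v is not _MISSING:
        return v
    v = compute(key)
    cache[key] = v
    return v

cache = {}
print(lookup(cache, "k", lambda k: 42))  # 42, computed
print(lookup(cache, "k", lambda k: 99))  # 42, served from the cache
```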
imho, a get() from cache when you already have the value seems like an unnecessary waste of cycles and network traffic.
plus this arrangement would seem vulnerable to a pathological case where you found the V from the db, but have a full/failed/partitioned cache and end up faulting or otherwise not returning the V.