I can tell an amateur programmer from a professional by looking at their order of priorities when they grow a code base.
Amateur programmers tend to put code de-duplication at the top of their priority list and will burn the whole house down to that often-trivial end.
This writer is pointing out that there are other concerns that far, far trump duplicated code -- and she's right. However, she doesn't elaborate enough on what a "wrong abstraction" actually is. We can be more precise.
The real offense when we factor duplicated code is the new dependency that is added to the system. And this is what amateurs don't understand. Every dependency you bring into your code architecture costs you and should be judiciously introduced. De-duplication of code ALONE is rarely a strong enough reason to add a dependency.
If you want to be a professional programmer, one of the most important things to acquire is a distaste for dependencies. Every dependency you add should be carefully considered (and lamented as a reluctant necessity if you decide to introduce it). As a 20-year veteran of this industry who has worked on myriad code bases, I will always prefer a code base with duplicated code and fewer dependencies to the other way around.
So back to the "wrong abstraction". When we compose systems, we are looking for fewest dependencies and stable dependencies. What I think the writer means by "the wrong abstraction" is a "volatile dependency".
I'm trying to be precise here because a common reaction to terms like "the wrong abstraction" is that wrong/right is all subjective. The truth of the matter is that it's not subjective at all -- the higher-quality system is the one with optimally few, stable dependencies, and these are measurable qualities.
Dependencies (coupling) are an important concern to address, but coupling is only one of four criteria that I consider, and it's not the most important one. I try to optimize my code around reducing state, coupling, complexity, and code, in that order. I'm willing to add increased coupling if it makes my code more stateless. I'm willing to make it more complex if it reduces coupling. And I'm willing to duplicate code if it makes the code less complex. Only if it doesn't increase state, coupling, or complexity do I dedup code.
The reason I put stateless code as the highest priority is it's the easiest to reason about. Stateless logic functions the same whether run normally, in parallel or distributed. It's the easiest to test, since it requires very little setup code. And it's the easiest to scale up, since you just run another copy of it. Once you introduce state, your life gets significantly harder.
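To make the statelessness point concrete, here's a minimal TypeScript sketch (the names are mine, purely illustrative):

    // Stateful: the result depends on hidden mutable state, so two calls
    // with the same argument can disagree. Harder to test, parallelize,
    // or distribute.
    let total = 0;
    function addToTotal(amount: number): number {
      total += amount;
      return total;
    }

    // Stateless: the output depends only on the inputs. It behaves the
    // same whether run normally, in parallel, or on another machine, and
    // needs no setup code to test.
    function add(current: number, amount: number): number {
      return current + amount;
    }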
I think the reason that novice programmers optimize around code reduction is that it's the easiest of the 4 to spot. The other 3 are much more subtle and subjective and so will require greater experience to spot. But learning those priorities, in that order, has made me a significantly better developer.
> I'm willing to add increased coupling if it makes my code more stateless.
I like statelessness as a top priority. However I'm not sure how statelessness ever comes into tension w/ coupling. Aren't they mostly orthogonal concerns?
> I'm willing to make it more complex if it reduces coupling.
Complexity = f(Coupling), in my definition. So an increase in coupling results in an increase of complexity. Sounds like you have a different definition of complexity -- I'd love to hear it.
There are a few ways in which state vs. coupling can play out. Often they're part of the architecture of a system rather than the low-level functions and types that a developer creates. As an example: should you keep an in-memory queue (state) of jobs coming into your system, or maintain a separate queue component (coupling)? By extracting the state from your component and isolating it in Rabbit or some other dedicated state-management piece, you've made the job of managing that state easier and more explicit.
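A rough sketch of those two shapes, assuming a hypothetical JobQueue interface standing in for a Rabbit client:

    // Option A: the state lives inside the component (in-memory queue).
    class InMemoryWorker {
      private jobs: string[] = [];  // state this component must now manage
      enqueue(job: string): void { this.jobs.push(job); }
      next(): string | undefined { return this.jobs.shift(); }
    }

    // Option B: the state is extracted behind a dependency. The worker
    // gains a coupling (to the queue) but becomes stateless itself.
    interface JobQueue {
      enqueue(job: string): Promise<void>;
      next(): Promise<string | undefined>;
    }
    class QueueBackedWorker {
      constructor(private queue: JobQueue) {}
      async processOne(): Promise<void> {
        const job = await this.queue.next();
        if (job !== undefined) { /* handle the job */ }
      }
    }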
As for complexity, there are many different types. Coupling is a form of complexity, but it's not the only one. Cyclomatic complexity is another. Using regular expressions often increases the complexity of code. And one need only look at the spec for any reasonably popular hash function to see a completely different sort of complexity that is the result of neither coupling nor unique paths through code. The composite of all the different forms of complexity is how I'd define it, since they all add to a developer's cognitive load.
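For instance, a regular expression adds no state and no coupling, yet packs a lot of cognitive load into a few characters. A small sketch (the date pattern is deliberately simplistic):

    // Dense, opaque, and subtly wrong -- complexity without any coupling.
    const isoDate = /^(\d{4})-(0[1-9]|1[0-2])-(0[1-9]|[12]\d|3[01])$/;
    console.log(isoDate.test("2016-02-30")); // true, even though February
                                             // has no 30th day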
I think those four criteria beautifully capture many goals of software design. For example, we could say that we reduce system state with a functional approach, make component coupling looser through the indirection of object orientation, and decrease code complexity with structured programming.
Amusing arguments for statelessness from the ØMQ Guide by Pieter Hintjens [1]:
> "If there's one lesson we've learned from 30+ years of concurrent programming, it is: just don't share state. It's like two drunkards trying to share a beer. It doesn't matter if they're good buddies. Sooner or later, they're going to get into a fight. And the more drunkards you add to the table, the more they fight each other over the beer."
...
> "Code that wants to scale without limit does it like the Internet does, by sending messages and sharing nothing except a common contempt for broken programming metaphors."
Off the top of my head, there are two reasons people seem to de-duplicate code. One is because two or more things happen to share similar code. Another reason is because two or more things must share similar code.
It seems like you are speaking of the first reason. There is no dependency, and the programmer is creating one. IMHO you should have at least 3 instances before creating an abstraction to reduce your code.
The second reason is different though. By creating the abstraction you are not adding a dependency, you are making an implicit dependency explicit. There is a huge difference. In this instance any duplicate code is bad.
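To illustrate the second case with made-up names: a sender and a receiver must agree on a wire format whether or not they share code, so extracting the format merely surfaces a dependency that already existed.

    // Implicit dependency: both sides duplicate knowledge of the format
    // and must change in lockstep, but nothing in the code says so.
    //   sender:   JSON.stringify({ v: 1, body })
    //   receiver: JSON.parse(raw).body

    // Explicit dependency: one definition that both sides import.
    export function encode(body: string): string {
      return JSON.stringify({ v: 1, body });
    }
    export function decode(raw: string): string {
      return JSON.parse(raw).body;
    }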
I have a third reason. Abstraction, or code de-duplication, if done right, ends up making code easier to reason about. The paper that made me rethink everything I knew is
I think I've heard that principle called "1, 2, 3, abstract"; as in, wait until you see it at least three times before considering extraction.
I'd also add: wait until the code is 'stable', i.e. no longer under active architectural development, connected only to other stable parts (or with stable/authoritative interfaces), and having then existed in such a state for a continued period of varied usage. Then refactor.
I'd say this is sound advice for core components of the system, but we may also want to consider the types of the dependencies. For example, I would not wait for three instances if the dependency is some kind of external system (DB, UI, Network, MQ, OS, Library, Framework, or the like), which is volatile in one of the worst ways: not directly under your control, unlike your own source code.
The distaste is for complexity. Adding new functions to reduce duplication can add complexity. It can also reduce complexity. It can also decrease complexity for a while and then in the long run increase complexity. Recall the famous quote, "It is better to have 100 functions operate on one data structure than 10 functions on 10 data structures." Sometimes dependencies are really cheap - when they are the right abstraction. I think the article put it fine.
Rails itself is an example of an abstraction that reduces complexity for a while and then adds complexity when you reach a certain size. So it was the right abstraction at first, and then the requirements change and it slips over to the wrong abstraction. Here is an insightful comment about why that is both inevitable and doesn't matter: https://news.ycombinator.com/item?id=11028885
Here's a very objective and powerful way to measure complexity: dependencies and volatility.
Otherwise we're all saying "complex" but not being clear and likely meaning different things.
For example, a lot of people believe that "not easy" = "complex" but as Rich Hickey articulates that's a counterproductive way to think of complexity. (See http://www.infoq.com/presentations/Simple-Made-Easy)
If your system's design results in your stable components depending on non-stable (volatile) components, your system is complex. This is because volatile components change often, and these changes ripple out to your stable components, effectively rendering them volatile too. Now the whole system becomes volatile, and changes to it become very hard to reason about - hence complex.
Avoiding this problem has been captured, among others, by the Stable Dependencies Principle (http://c2.com/cgi/wiki?StableDependenciesPrinciple), which states that the dependencies should be in the direction of the stability. A related one is the Stable Abstractions Principle (http://c2.com/cgi/wiki?StableAbstractionsPrinciple), which states that components should be as abstract as they are stable.
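In code, pointing dependencies toward stability usually looks like dependency inversion. A sketch with illustrative names:

    // Volatile: a concrete backend we expect to change or replace.
    class MongoStore {
      save(id: string, data: string): void { /* driver calls here */ }
    }

    // Stable: the core depends on an abstraction it owns, so churn in
    // MongoStore cannot ripple into OrderService.
    interface Store {
      save(id: string, data: string): void;
    }
    class OrderService {
      constructor(private store: Store) {} // dependency points at the stable thing
      place(id: string): void { this.store.save(id, "order"); }
    }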
I can tell an amateur programmer because they aren't getting paid. Flippancy aside though:
While I don't think it's what you're arguing for, I'd fear some would use an argument like this to defend a workflow and culture where new features are started by copy and pasting a mass of code, and then going in and tweaking little things here and there. Then when something has to change across the system, there are endless traps because things you need to change and maintain, that look identical, aren't quite. These workflows and cultures exist.
There's a balance somewhere and finding that is the hard part, right?
I understand and sympathize with the idea you suggest here, but I also wonder about the fine tuning. We accept that both duplication and dependency are bad (and certainly that the minimal, maximally stable dependency set is best), but when making that tradeoff, what parameters make a duplication better or worse than a dependency? Are there languages or toolchains which cause this tradeoff to fall in the opposite direction?
In some sense this is academic, but in a very real sense disdain for dependency is something I worry can prevent a project from going through an important high-energy transitory period where semi-formed dependencies exist to solve concrete tasks but have not yet annealed into a final, low-energy form.
In small scopes a professional knows how to skip by this risk and get straight to the better abstraction. In large scopes whole projects must (often) pass through high-energy areas. Professionalism thus demands that these high-energy zones be identified and reduced.
But being too eager to avoid dependency might inhibit growth.
Whether all of this is academic or not, I don't know. But what I do know is that these ideas and their practical implications have an ENORMOUS impact on the practitioner's and business's productivity.
> in a very real sense disdain for dependency is something I worry can prevent a project from going through an important high-energy transitory period where semi-formed dependencies exist to solve concrete tasks but have not yet annealed into a final, low-energy form.
We must always riff and hack and creatively explore our domains in code -- this is another practice of the software professional, and notions of "architectural soundness" and "dependency reduction" should never paralyze us and keep us from creative play, sculpting, and exploration. In these modes of development it's best we "turn off" all the rules and let ourselves fly.
But for a code base that has to survive longer than 6 months and that will have more than one collaborator -- this is where it becomes essential to maintain architectural soundness in the shared branch. (My development branches, on the other hand, are in wild violation of all sorts of "rules" -- so there is a difference between what gets promoted to production code and all the exploratory stuff we should also be doing.)
I think the idea of exploratory work is very similar to what I mention going on in small scopes, but I think this evolution occurs at large scales, too. All the development branch isolation in the world can't and shouldn't stop this sort of broad scale evolution.
I think the right approach is not to avoid dependencies, but to manage them.
Let's say you are unsure of the correct UI framework: React, Knockout, or Angular? React Native? Maybe you don't yet know which database best suits your usage and scaling needs. Should you avoid committing to those dependencies? For how long? Doesn't this slow you down?
A good way to approach this is to isolate the dependencies so that you don't have to commit to the actual implementations (React, Mongo, PostgreSQL, Angular, ZeroMQ, whatever it is you need) early. Of course you start the work with some set of frameworks and libraries, but in such a way that no other part of the system (the parts that do all the important stuff unique to your application) knows of the implementations. This way, if the need arises, changing the implementation details will not be expensive.
Isolating the implementation behind an abstraction sometimes introduces boilerplate and duplication, but as the article mentions, the dependency is usually more costly.
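A sketch of what that isolation can look like (all names are mine, not from any particular framework):

    interface User { id: string; name: string; }

    // The application core knows only this interface.
    interface UserStore {
      find(id: string): Promise<User | null>;
    }

    // Concrete implementations live at the edge of the system.
    class PostgresUserStore implements UserStore {
      async find(id: string): Promise<User | null> {
        // real driver calls would go here
        return null;
      }
    }

    // Wiring happens in one place; switching databases later changes
    // this line, not the application core.
    const store: UserStore = new PostgresUserStore();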
I think writing an abstraction that allows you to decouple your app in a way that you could use either Angular or React is going to take far longer than just rewriting your app if you ever get to the point of needing to switch.
In the case of a UI, the trick is not so much to write an abstraction for this, but to write the application logic in such a way that it does not depend on the UI framework.
Your own UI code should be limited to displaying the data with the help of the framework. This minimal UI code should depend on the application code and that should be the only dependency between the two.
There are simple techniques to keep view logic and application logic decoupled, for example by introducing separate data structures prepared and optimized for view components to consume. An added bonus of this is testability of both application and UI logic without opening a browser.
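A tiny sketch of that technique, with invented names:

    // Application logic produces a plain data structure -- no framework types.
    interface InvoiceView {
      title: string;
      totalLabel: string;
    }
    function toInvoiceView(customer: string, amountCents: number): InvoiceView {
      return {
        title: `Invoice for ${customer}`,
        totalLabel: `$${(amountCents / 100).toFixed(2)}`,
      };
    }

    // The UI layer (React, Angular, whatever) merely renders InvoiceView,
    // and toInvoiceView is testable with a plain assertion -- no browser.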
>when making that tradeoff what parameters make a duplications better or worse than a dependency?
This depends entirely upon the context. Overduplication and tight coupling are universally bad, but the trade-off between them is often a matter of opinion.
In practice I've found that once you've hit the point where you're trying to trade duplication + tight coupling off against one another the code is already in a decent enough state that there's other more important work to be done.
> I can tell an amateur programmer from a professional by looking at their order of priorities when they grow a code base.
> Amateur programmers tend to put code de-duplication at the top of their priority list and will burn the whole house down to that often-trivial end.
To be fair, lots of how-to-program books spend a lot of time teaching you how to abstract, and sing abstraction's praises to the heavens, as it were. It takes some experience to learn that real-world stuff is not like the toy programs in books.
Novices are inexperienced; journeymen and masters are more skilled and experienced.
A novice can have a ton of knowledge (from books), but be too inexperienced to apply it.
A novice can be a professional, this is what internships and entry-level jobs are supposed to be for. Paired with mentorship and structured work assignments (structured in the sense of increasing complexity, scope, and responsibility) they're brought up to journeyman and, later, master level.
They can also be amateurs. Given forums, books, manuals, mentors (real-life or online), they can be brought up to journeyman and master level as well.
Besides, I think everybody who e.g. makes some side projects or is active in open source is in fact an amateur in the sense of "a person who does something (such as a sport or hobby) for pleasure and not as a job" :)
In general, if a professional also donates their time to something, it doesn't make them an amateur. See lawyers doing pro bono work, or carpenters building a habitat house, as examples.
Professional means you teach (notice the word root in "profess", as in "professor"). It really means you know enough that you can teach others how to do it right; it's not about getting paid for it per se.
You have the etymology of "professor" and "professional" completely wrong. You can't just notice the same root in two words and then completely reinvent the meaning of one to make it have something to do with the meaning of the other. The evolution of language is complex. Here: http://lmgtfy.com/?q=etymology+profession
Do professional football players teach playing football? Some, probably, but not all. We don't call those who don't teach amateurs. They're getting paid. The amateurs are the high school and (arguably) college players, along with rec club and pick-up game players.
You're noticing a common root, but not the meaning of the word in the modern day.
Decimate means to destroy 1 in 10 of something (like an opposing army). But today we use the word to mean destruction of a large percentage.
I suppose an argument can be made that modern use of "amateur" is more akin to what used to be "novice". However, I'd have a hard time accepting that except when it's used as a slur. We talk about amateurs in many fields, but don't intend to dismiss them as unskilled or inexperienced; we're classifying them as non-professionals. In a forum like this, filled with amateur programmers, it seems to me that it's wrong to misuse the term in this manner when a large portion of the readers here are amateur programmers but of moderate to high skill level.
Of course. You can't learn that sort of judgement (when to use vs when not to use) from a book.
But our profession's training material, at least in my reading experience, drills it into your head to use all these abstracting devices.
It's certainly true that you can get some "book knowledge" that tells you that you can over-abstract. I mean, this blog post is one example. But it seems to me I only hear this sort of stuff from things like blog posts by experienced devs. (Or maybe I just read the wrong kind of books?)
Sounds like the books actually teach poor practice, divorced from context. The harms of abstraction are nothing that can't be printed - this isn't qualia.
I broadly agree with this, but draw a slightly different conclusion: that the imperative when growing a large code base is proper tooling/planning for dependency management. If good dependency management is cheap, programmers will use it, and even if some dependencies are volatile, side-effects can be bounded by the dependency graph.
We have ~good (and certainly cheap) dependency management in Ruby (gems) and JavaScript (npm), and it leads projects straight into a dependency fractal. No, it can't be bounded by making it easier.
> The real offense when we factor duplicated code is the new dependency that is added to the system.
A slight tangent on that note, but I think many of the problems with current web development result from the same root cause: adding yet another dependency to solve an almost trivial problem, sometimes going to the extreme for the sake of saving a few keystrokes. Need one function to find an item in a collection? Reference Lodash. And then drop references all over the place.
Yes, some dependencies are useful, but they all need to be handled with care. Wrap that search operation in an internal utility function and inject Lodash as its implementation if you don't want to reinvent the wheel. This is what DI is for (and nothing more).
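Concretely, that might look like this (findItem is a made-up name; lodash/find is lodash's real per-method import):

    // utils/find.ts -- the single place that knows about lodash.
    import find from "lodash/find";

    export function findItem<T>(
      items: T[],
      predicate: (item: T) => boolean
    ): T | undefined {
      return find(items, predicate);
    }

    // Every other module imports findItem, not lodash, so dropping or
    // replacing lodash later touches exactly one file.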
Maybe the feedback of exploding compile times as a result of a complex dependency graph would make people more sensitive to the issue. But then again, that didn't prevent it from happening either. Oh well, it's not my codebase (yet).
I think JavaScript is intrinsically going to be one of the worst examples of that, stemming back to the days when JavaScript programmers learned that using a library like lodash (then jQuery) was the _only_ way to correctly get your code to work in all cases. That culture then became ingrained and, in my opinion, helped lead to the state most JS packages are in these days.
In my experience, this is one of the greatest things about the new prevalence of tools like babel; it pushes that abstraction layer down below in-code dependencies. There's still a dependency to manage, but it's not a library import or similar.
Babel doesn't solve the problem of having a minimal standard library. Unlike, say, Python or Ruby, where batteries are included, JS is roll it yourself, add a dependency, or go without in almost every case. Even for seemingly trivial things like "Array#contains".
A lot of things in there would need a library for cross-browser compatibility still. That's my point, but I take yours as well. It's still a very BYOL (bring your own library) kind of a language!
Additionally, something else I see that tends to separate experienced from inexperienced programmers is attempting to deduplicate things because they appear similar. By this I mean that the two things just happen to do similar things or have similar fields, but the operations or structures they are attempting to represent are fundamentally different.
When they eventually diverge (and they will, because they were different operations or structures and won't evolve the same way), this is where you end up with functions that take 10 different option parameters, all of which activate a different code path, or structures with completely different fields set depending on where they came from. And guess what? Now you're back to basically one code path or structure per use case, plus an extra dependency that didn't exist before and is a nightmare to maintain without breaking one of the existing use cases.
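The end state tends to look something like this contrived sketch:

    // Two operations that merely looked alike, forced into one function.
    // Every caller activates a different branch, and changing one path
    // risks breaking the others.
    function processOrder(
      order: { id: string },
      opts: { isRefund?: boolean; skipEmail?: boolean; legacyTax?: boolean }
    ): void {
      if (opts.isRefund) { /* refund path */ }
      else if (opts.legacyTax) { /* legacy tax path */ }
      // ...one branch per use case
    }

    // Split back apart, each use case stays simple and can evolve alone:
    function placeOrder(order: { id: string }): void { /* ... */ }
    function refundOrder(order: { id: string }): void { /* ... */ }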
There are also different kinds of dependencies. Some dependencies are so common, they might as well be a feature of the language itself (Google's OAuth Java client libraries or Apache HTTPClient, for example). It would be a waste of time to write a custom OAuth or HTTP library - much better to just add the dependency and go on your way.
IMO the danger is greater with other languages that don't have the library support that Java does. That Ruby Gem you use today may get abandoned in 6 months and become a piece of technical debt you have to deal with later down the line. It's always a balancing act.
There's also nothing inherently wrong with duplicated code. It can make a code base harder to maintain, but so can over-abstraction. But I agree with you and the author; most projects are better off when developers err on the side of verbosity.
Those dependencies you mentioned are all highly stable. That's the objective & precise way to say that they are "the right abstractions" [1].
[1] This should further articulate why "right/wrong abstraction" is not the useful nomenclature. I bet you I could find several programmers who could write a "better" OAuth library, but that's not what is of priority here. What is most important is this objective quality: is the dependency (its interface) stable? If it is, then my code architecture remains sturdy.
Of course there are different kinds of dependencies. I believe wellpast's point was not that dependencies are evil and bad; I think it was that dependencies have a cost, and most of the time programmers consider them (almost) free.
Indeed, dependencies have nearly zero immediate cost, but in the long run they are expensive and need to be weighed against the gain from their use.
If this is mostly something amateurs do, then who are those people that bring us all these wonderful abstractions like AbstractClickableLittleButtonWithTinyImageOnTheLeftSideFactoryControllerStrategyImplementation?
I think the first priority should always be simplicity (KISS). Simplicity is understood by perfectionist developers ("Simplicity is the ultimate sophistication") and also by beginners (if code becomes complex, there will be bugs). De-duplication (DRY) is important but should always come after simplicity.
But all we have to do is teach your half-baked "fewest, most stable dependencies" theory to the amateurs and then they'll be professionals, performing like 20 year veterans. Eye roll.