Technical debt is best understood exactly as the term suggests, like a loan. Whe...

yourabstraction · on Nov 6, 2020

This is exactly how I look at technical debt, yet it seems so many people miss the appropriate financial metaphor, which seems obvious given debt is in the name. It seems popular to conflate technical debt with bad engineering, but appropriately pushing work off to the future is actually very thoughtful engineering. Technical debt is a tool that can be applied (just like financial debt) to work towards an end goal when you otherwise don't have the means to pay the price upfront.

Getting a startup off the ground requires an excellent understanding of how much technical debt you can incur and realistic estimates about your ability to pay it off as your team grows.

crdrost · on Nov 6, 2020

This blog post we're all commenting on is great precisely because it links Ward Cunningham's talk. His understanding is precisely this financial metaphor.

I have considered giving a talk, “When Technical Debt was a Good Thing”. Like you say, it gets mis-equated with bad engineering but originally the point was that technical debt was a good thing, like, let's have some technical debt! This is gonna be great!

If you’re looking at 1980s attitudes on software development, there was a strong focus on getting the engineering right. We're gonna be like generals and we're gonna have a vague direction of success, and then our lieutenants will reify that into a concrete plan of success, and then the privates will go off into the trenches to (here the metaphor finally breaks) build the application. Then we'll evaluate their performance on fidelity to the design that the lieutenants gave, and we'll evaluate the lieutenants on how well that design matches the goals the generals set out. Very hierarchical, design-first.

And the point of this metaphor was, “No, stop. Stop with all the BS. Get those privates building something right now. Doesn’t matter if it’s not the thing the lieutenant would have designed. Let’s intentionally build the wrong thing.”

And it’s like, shock and horror, why would you ever want to do that? Then comes the metaphor. “It’s like spending money you don’t have, people act like it’s impossible or perhaps like it’s immoral—it’s neither. Credit systems exist. Debt exists. We’re just lifting that notion to the technical sphere.” Why would you want to do it? Same reason you take on any other debt: you think that you can outrun the interest. I take out a car loan because I think that the car will help me earn enough to cover the loan repayment amounts and save me time and/or money.

In this case, because we don’t have the final design we’ll only conform 90% to what we’re supposed to be producing, say. That 10% loss is the interest. We pay it because we think we can cover it later.

Agile is very much an approach to creating technical debt. Both have the same opponent: a "waterfall" style where you know exactly what needs to be built before you build it.

BlueTemplar · on Nov 6, 2020

Hmm, that's not what I was taught about Agile vs Waterfall.

It was about the number of design cycles –

(discussion with client / design / programming / bugfixing / client feedback)

– before the final version was out: only 1 for the most rigid waterfall, and usually weekly cycles for the most agile.

So it was not so much about any technical debt, but rather about the lack of understanding of what the client wanted (often the ignorance of the client himself of what he actually wanted).

Incidentally, the most radical agile cycle would involve throwing out all the code on each cycle, which would of course throw away all the technical debt in the process (and hopefully prevent you from making the same technical debt mistakes twice).

Jtsummers · on Nov 6, 2020

Your understanding is correct. Waterfall (may it die a fiery death) is at an extreme that assumes it's possible to know everything upfront. What most people who like it fail to understand is that Waterfall fails to scale. Agile is (almost) at the other extreme.

Waterfall cannot scale beyond either well-understood domains (a web shop that uses RoR could probably apply Waterfall to a new project at this point, with over a decade of experience for each member of the team) or smaller systems (I'd say no bigger than around 100k SLOC of C code, beyond that your system usually becomes too complex) or short timelines (less than 3-6 months, which implies smaller systems). The assumption of complete and proper understanding will bite you on either a larger project or novel domain (novel to you or the world) or a longer timeline.

Agile (I'll take Scrum's 1-2 week cycle + extreme programming) aims to shorten the feedback loop so you know that not only have you built what you thought you built (verification) but also what your customer wanted you to build (validation) on a regular basis.

If you're wrong with Waterfall, you've wasted years. If you're wrong with Agile, you've wasted weeks.

And boy can you be wrong with Waterfall. I joined at the tail end of a 5-year Waterfall project, and they had fucked it up. Over 300 people had worked on it, and it was the wrong damn thing (had half the features needed by the time it shipped, and that half barely worked, and what did work was too slow to actually be useful). 300 people for 5 years is 1500 person-years; at, let's say, an average of 75 years per lifetime, that was 20 human lifetimes wasted. And let's not talk about the billions of dollars that went into it. (300 is probably a low estimate, once you get into the subs of the subs of the subcontractors it's hard to get a good tally.)

jasonwatkinspdx · on Nov 6, 2020

Ward talks about this occasionally, but generally there's a frustration among the original agile manifesto authors about how the principles they were after became distorted by the consultant industry that sprang up around the terms.

The financial metaphor above is definitely what Ward meant when coining the phrase technical debt, and it dovetails into agile in the sense that one should be making conscious decisions about balancing upfront investment vs uncertain goals. Technical debt can be an opportunity in the sense of doing the learning on the cheap, but you can't run up that debt indefinitely.

crdrost · on Nov 6, 2020

> So it was not so much about any technical debt, but rather about the lack of understanding of what the client wanted (often the ignorance of the client himself of what he actually wanted).

I cannot emphasize enough how much I want you to read, watch, listen to Ward Cunningham in his own words. :)

If you know what technical debt really is, then this sentence kind of sticks out like a sore thumb, as it is literally saying "So it was not so much about any technical debt, but rather about the technical debt."

Technical debt viewed this way is not just “hurdles in my code that I don’t like to jump over.” It is a subset of those hurdles and kludges and nicks and grumbles. What subset? Precisely the subset that corresponds to our lack of up-front understanding of what we were building when we were at first building it.

Put another way, every single problem we can solve well has at least one (maybe many) perspective from which that solution seems “relatively straightforward.” If you can specify that perspective in a way that a computer can understand it, you can think of that as “creating an algebra” for that problem and similar problems, or perhaps a “domain-specific language.” Technical debt is precisely the mismatch between the language you are programming in, and the domain you are programming for. Syntactically, this "language" might be some programming language like Java. But semantically, you get to define your own classes and methods and create your own little world inside that language, where things can be combined and chained together, and the richness and algebraic predictability of that little world informs the spaghettification and robustness of your code to implement the algorithm within that world.

Viewed this way, a lot of programming today puts absolutely no investment into that "little world" and just consists of writing FORTRAN. Just pages and pages and pages of FORTRAN. I mean, you're writing it in some other language, maybe Python, maybe Java: but you do not have any sort of little algebraic world that you phrase your problem in, you just say "fetch this input, fetch that input, begin subroutine with vaguely suggestive name to label the following steps, compare the two inputs, fetch other data as needed, perform these side-effects, modify these variables, return to the beginning of some loop." Very verbose individual commands being operated on individual data structures.

So I tell you "make sure that if I press this button a hundred times you don't start the job a hundred times, only one job per project name should be running at any given time, but if I press the button while a job is ongoing you should still probably trust that I made a mistake when I first pushed the button and now I want to abort if possible and retry with the latest version" and because you don't have a rich vocabulary for that you're stuck programming something in a "redis assembly language" or whatever,

    redis_job_flag = "job_id_for_" + project.name;
    current_job = redis.get(redis_job_flag);
    if (current_job) {
      redis_complete_flag = "job_is_complete_" + current_job;
      add_followup = redis.setnx(redis_complete_flag, "afterwards=run_again", 60*minutes);
      if (!add_followup) {
        followup = redis.get(redis_complete_flag);
        if (followup == "afterwards=run_again") {
          // someone else got there first
          return;
        } else {
          // race condition, it completed before we could
          // set the followup attempt, so we need to retry
          // this method to restart it
          return recurse();
        }
      } else {
        // ok we can trust that they will redo what they just did
        return;
      }
    }
    // if we're here then we did not see any prior jobs.
    my_id = generate_random_id();
    try_to_setnx = redis.setnx(redis_job_flag, my_id, 60*minutes);
    if (!try_to_setnx) {
      // race condition, someone else got started before me.
      // better still set a follow-up just in case.
      return recurse();
    }
    // hooray! I am the unique one running the job.
    redis_complete_flag = "job_is_complete_" + my_id;
    looping = true;
    while (looping) {
      doTheActualThing();
      really_done = redis.setnx(redis_complete_flag, "complete", 60*minutes);
      if (really_done) {
        redis.unset(redis_job_flag);
        looping = false;
      } else {
        // let someone else tell us to rerun again
        redis.unset(redis_complete_flag);
        // bug we ran into where someone made us rerun jobs
        // for an hour, we reset the timeout for this here
        redis.set(redis_job_flag, my_id, 60*minutes);
      }
    }

and then this requirement is restated for other buttons and that gets copy/pasted everywhere and other mutations get inserted in various places.

And I think what Ward would tell you is not "never write all those lines of code" but that this eventually needs a refactor so that it matches the business language,

    repeatableDebounce("project_jobs", project.name, doTheActualThing);
    // repeatableDebounce defined elsewhere,
    // it is part of our "little world"
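
As a rough illustration of what such a helper might look like (this is not the commenter's actual code), here is a minimal in-process Python sketch. It assumes a single process, so a `threading.Lock` and a dictionary stand in for the Redis keys, and the names `RepeatableDebouncer` and `repeatable_debounce` are hypothetical. Like the long version above, it does not abort a running job; it collapses every request made while the job is in flight into one follow-up run.

```python
import threading
from typing import Callable, Dict, Hashable, Tuple

Key = Tuple[str, Hashable]


class RepeatableDebouncer:
    """At most one job per key runs at a time. A request made while a job
    is running asks for exactly one more run after the current one."""

    def __init__(self) -> None:
        self._lock = threading.Lock()
        # key -> True when a rerun was requested during the current run
        self._running: Dict[Key, bool] = {}

    def run(self, namespace: str, name: Hashable, job: Callable[[], None]) -> bool:
        """Return True if this call ran the job, False if it only queued a rerun."""
        key = (namespace, name)
        with self._lock:
            if key in self._running:
                # A job is already in flight: fold this request (and any
                # later ones) into a single follow-up run.
                self._running[key] = True
                return False
            self._running[key] = False  # we are the unique runner

        try:
            while True:
                job()  # the lock is NOT held while the job runs
                with self._lock:
                    if self._running[key]:
                        self._running[key] = False  # consume the rerun request
                        continue
                    del self._running[key]
                    return True
        except BaseException:
            # Do not leave the key stuck as "running" if the job fails.
            with self._lock:
                self._running.pop(key, None)
            raise


# Hypothetical module-level helper mirroring the one-line call above.
_default_debouncer = RepeatableDebouncer()


def repeatable_debounce(namespace: str, name: Hashable, job: Callable[[], None]) -> bool:
    return _default_debouncer.run(namespace, name, job)
```

With something like this in place, each button handler reduces to a single call such as `repeatable_debounce("project_jobs", project_name, do_the_actual_thing)`. A multi-process deployment would need a shared store with atomic operations in place of the in-memory dictionary.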

Basically, Ward wants you to start doing "aspect-oriented programming" because it turns out that is a felicitous perspective from which to view these requirements about needing to cache whether jobs are in-transit and if they are then to retry them and whatever else. Technical debt is the noise in the above implementation being pasted and tweaked across the entire codebase because we didn't have this perspective when we were starting.

BlueTemplar · on Nov 7, 2020

Ok, I'll try to find some of Ward Cunningham's work.

----

> Technical debt is precisely the mismatch between the language you are programming in, and the domain you are programming for.

That's a concept on a very different level from

> the lack of understanding of what the client wanted (often the ignorance of the client himself of what he actually wanted)

And the term "technical debt" seems to be a very bad name for either of those. Just look at the examples the others provided: the word 'technical' makes one think that it's an issue with tool (= codebase) maintenance first!

----

> semantically, you get to define your own classes and methods and create your own little world inside that language, where things can be combined and chained together

That is if you're even using OOP in the first place. I'm now viewing it with suspicion, especially after the only course we had on it, in Java, where we weren't even warned about things like "Composition over Inheritance". (Ok, we also had a half-course on UML.)
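
For readers who have not met the principle either, a minimal Python sketch of "composition over inheritance" might look like the following. All class names here (`LaunchPolicy`, `RunAlways`, `RunOnce`, `Button`) are made up for illustration: instead of subclassing `Button` once per behavior, the button holds a small policy object and delegates to it.

```python
from typing import Callable, Optional, Protocol


class LaunchPolicy(Protocol):
    """Anything that decides whether and how to run a job."""

    def launch(self, job: Callable[[], str]) -> str: ...


class RunAlways:
    """Run the job on every request."""

    def launch(self, job: Callable[[], str]) -> str:
        return job()


class RunOnce:
    """Run the job on the first request and reuse its result afterwards."""

    def __init__(self) -> None:
        self._result: Optional[str] = None

    def launch(self, job: Callable[[], str]) -> str:
        if self._result is None:
            self._result = job()
        return self._result


class Button:
    """A button *has* a launch policy rather than *being* a policy subclass."""

    def __init__(self, label: str, policy: LaunchPolicy) -> None:
        self.label = label
        self._policy = policy

    def press(self, job: Callable[[], str]) -> str:
        # Delegate the "should this run?" decision to the composed policy.
        return self._policy.launch(job)
```

Changing a button's behavior then means passing a different policy object, for example `Button("Deploy", RunOnce())`, with no class hierarchy to extend.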

OOP seems to me more like a tool that should only be applied to specific problems: for instance making a GUI (I have some Python-Qt experience).

In the same way, I do see programming languages themselves as more or less fitting to certain problems. Speaking of which, this semester I had to pick between C+ (C++ with a minimum of libraries, and god forbid, no OOP) and Fortran, and I picked C+ because Linux is mostly written in C. And hopefully one can find a language where you can keep the "impedance mismatch" between the language and the problem to a minimum. But for that you have to figure out what the problem actually is in the first place, which might take several design cycles!

But I'll try to read up on this "aspect-oriented programming".

brlewis · on Nov 6, 2020

> I disagree that you should refactoring code "so that the entire team can understand the code". Teams turnover much more frequently than a code base should

I didn't read it as saying the code should be refactored every time the team changes. I read it as saying that as the organization's understanding of what the software does evolves, the structure of the code should evolve with that understanding.