> You’re about 90% sure you know how to start the app. You’re 80% sure you know how to handle the infra you’ll need to get it online. And you’re 70% sure you know how to get your first customer. What is your chance of successfully going from zero to first customer? 0.9 * 0.8 * 0.7 = a little over 0.5.
This would be true if the probabilities were independent [1], which they are probably not, as the infra is tied to the app.
The probability can be anywhere from 40% to 70%. Picture three rectangles inside a unit square, covering 90%, 80% and 70% of its area. If they are nested inside each other, the overlap of all three is the smallest rectangle, i.e. 70% of the whole. In the worst case it is 40%: place the 80% rectangle so that 10% of it falls outside the 90% one, leaving a 70% overlap; then place the 70% rectangle so that 30% of it falls outside that overlap, leaving 40%.
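To make the bounds concrete, here is a minimal sketch (using the made-up 90/80/70 figures from the quote) that computes the Fréchet bounds on the joint probability when nothing is known about the dependence:

    # Fréchet bounds on P(A and B and C), knowing only the marginals.
    # Upper bound: the smallest marginal (events nested inside each other).
    # Lower bound: max(0, sum of marginals - (n - 1)), i.e. the events spread
    # out as much as the unit of probability mass allows.
    p = [0.9, 0.8, 0.7]

    upper = min(p)
    lower = max(0.0, sum(p) - (len(p) - 1))

    print(f"joint probability is somewhere between {lower:.2f} and {upper:.2f}")
    # Assuming independence gives the point estimate 0.9 * 0.8 * 0.7 = 0.504,
    # which sits inside that 0.40-0.70 interval.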
You're not wrong, mathematically. But these numbers (90%, 80%, 70%) are made up, so it's kind of irrelevant whether we're doing the maths right or not.
Quite the contrary. The only thing that really matters when it comes to probabilistic forecasting is that we do the maths right.
The numbers will always be made up estimations, so the only thing that sort of makes sure we converge on similar solutions is getting the maths right.
A person is free to make any outlandish estimate of an individual probability, but it is when they try to make it cohere mathematically with their other estimates that they realise how outlandish the initial estimate was.
> Say you’re working on a Laravel web app. You’re about 90% sure you know how to start the app. You’re 80% sure you know how to handle the infra you’ll need to get it online. And you’re 70% sure you know how to get your first customer.
Are these events well defined?
1. Starting the app (about 90%)
2. Handle the infra (80%)
3. Get your first customer (70%)
3. is possibly well defined, or could be made so. The others are not. So not only are the numbers made up, the events are not even well defined in the first place! Why would we go to the effort of ensuring correct maths here?
I'm going to give OP the benefit of the doubt and assume they have sat down and figured out what "starting the app" really means, but they didn't want that definition to be the focus of the article.
Yes, the confidence interval on the model which assigns the probabilities is so large that any confidence in conclusions derived here is pretty meaningless.
Correct, it would be better read as P(app start) * P(infra ok|app start) * P(first dollar|infra ok and app start). That's just a mouthful so I elided the "givens".
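Spelled out (with A = app start, B = infra ok, C = first dollar), the elided form is just the chain rule:

    P(A \cap B \cap C) = P(A)\,P(B \mid A)\,P(C \mid A \cap B)

Only if the three events are independent do the conditionals reduce to the marginals, which is what makes 0.9 * 0.8 * 0.7 an approximation rather than an identity.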
"Software estimation is such a mess in part because it has trouble recognizing that, at least, just-in-time learning is at least non-normally distributed. Everything we know about traditional project management, from Waterfall to Gantt charts to estimation practices, are on some level based around the idea that each individual step in the chain is bell-shaped: [..]"
From experience I 100% agree with this, but it always caused a certain cognitive dissonance when thinking about project management approaches.
A project is by definition a time bound one time endeavor. It is something where the crucial parts have never been done before. The unknown is the defining characteristic. Learning should be a defining characteristic as well, yet many approaches pretend it's a small part at best.
My working theory is that the majority of "projects" that are typically used as comparison with IT projects (like construction) are not real projects at all. Real projects are rare (e.g. Burj Khalifa).
In IT the ratio is the opposite. The real projects dominate and the "construction sites" are still the minority. But we can expect this to change.
Thinking this to the end would mean that we could see a resurgence of waterfall in IT - likely under a different name.
The project is the project side of the construction project, not the construction side of the construction project =D
Typically, in construction projects, 90%[1] of the challenges are encountered and resolved in the design office by a multi-disciplinary, multi-organisation design team effort across multiple years...
[1] - Anecdotal evidence drawn from my experience as a structural engineer.
My point is that the "project" side is not really a "temporary endeavor undertaken to create a unique product" [1] in the majority of cases.
Sure, the environment, circumstances and result differ for every construction project, but in my opinion the variance doesn't justify calling it a project in most cases.
We've built millions of modern bridges that are in operation and we've been doing it for more than 3000 years. The number of environments, configurations and requirements we've not already encountered is very limited and these are the real construction projects that - no doubt - do exist.
Now, contrast this with the number of CRUD apps we've built and for how long we've been doing it. There is still a lot to learn there.
That's the reason why more bridges are on time and budget than CRUD apps, in my opinion.
I sort of understand what you're saying but I'm struggling to agree with your examples in the comparison.
Even if you're working on a "project" comprising hundreds of near identical houses, there could be massive differences in the project constraints and their solutions for adjacent plots for any given selection of houses.
There could for example be a large tree with root protection zones on site where you would have to carefully design the foundations to account for this and their future effects such as heave due to volume change potential of the underlying soil.
My point is that there are many "hidden" problems solved by the design team during the project / design phase even for seemingly insignificant or simpler "projects". In my experience of analysing and designing hundreds of buildings for over a decade every project was unique and as such treated like a "project".
I think there is a spectrum and I think most construction projects are more similar to mass production than real projects. I'm not a civil engineer and maybe I am wrong and it is not the best example.
The point I am trying to make is that there are more unknowns in most IT projects than in many other endeavors because it is still a young field. If construction is the best counterexample, I'm not sure.
"Construction site" projects in IT would be Wordpress sites/small business eCommerce etc. and there are millions of successful projects of this type out there.
Building them might be messy, but rarely fails from a technical point of view.
"there are millions of successful projects of this type out there."
I wish and believe we are getting there but I don't see it today.
Just as an example: I traveled a lot after COVID, and almost every tourist attraction has a photographer or photo site that upsells by offering you pictures. I have yet to encounter two attractions that use the same system; they are all different and terrible in different ways. They all do basically the same thing, so there is no reason not to have a standard solution. I have no doubt one will replace the existing solutions sooner rather than later. So, millions of them, but I would hardly call them successful.
Them being messy is the collective learning process that has been mostly completed for real construction projects, but is still in full swing for IT projects.
I think you two agree: there are millions of successful projects out there, but they are not standardized. But the guy who has made 1,000 of them knows exactly how long it will take him to build the 1,001st.
Whilst waterfall-like may be the approach taken, at least some (often the largest) steps are tough to estimate. The problem with waterfall is that it attracts deadlines like roadkill attracts flies.
Presenting yourself as following "agile" gives you a strong body of literature with which to fight this - while actually doing something else. We've all seen the criticisms that nobody does agile right - and for many, that's deliberate.
IMO a waterfall like process can work for a lot of projects, but you need a lot of time for estimation/derisking, and this work is hard, so often developers don't want to do it, and management/customers often don't want to pay for it. Some of the most successful projects I've worked on have started with an estimate that had only a few items with more than 4 hours attached to them. If you get estimates down to that level, that means that most of the significant technical decisions have already been made, and you've determined that your technical approach should work. If you do that, and then add a calibrated amount of padding for mistakes, you can very often hit estimates pretty much dead on (+-10%).
How much time it takes to prepare the estimate depends a lot on the nature of the work though. If there's something pretty similar the team has done multiple times, and you have access to estimates/tickets/timesheets, you can often get it done in about 5-10% of the total project time. However, for example, if it's an ML project, there may be a lot of unknowns that have to be clarified first. For example, data quality is almost always not what it is claimed to be at the outset, so this needs to be resolved before the estimate is generated.
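As a rough sketch of why a plan made of many small, independent items can land within +-10% after calibrated padding (all numbers below are invented for illustration, not from any real project):

    import random

    random.seed(0)

    # Hypothetical plan: 200 items, each estimated at 1-4 hours.
    estimates = [random.uniform(1, 4) for _ in range(200)]
    planned = sum(estimates)

    # Assume each item's actual time is its estimate times a noise factor
    # drawn from 0.7x-1.6x (a made-up calibration).
    def simulate_total():
        return sum(e * random.uniform(0.7, 1.6) for e in estimates)

    outcomes = sorted(simulate_total() for _ in range(10_000))
    p90 = outcomes[int(0.9 * len(outcomes))]

    print(f"planned {planned:.0f}h; pad by {100 * (p90 / planned - 1):.0f}% "
          f"to cover ~90% of simulated outcomes")
    # Because the per-item errors largely cancel, the padded total is tight;
    # a single big unknown item would dominate the sum and break this.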
Deadlines are good. The world operates on deadlines and IT is the exception.
We can build bridges on time and budget most of the time because we've built millions of them in modern times and have been doing it for more than 3000 years.
In software we are not there but I have no doubt that we will, and hopefully it won't take us that long.
Also made this exact association. My takeaway: when estimating unknown quantities in software development you can treat the means as infinite, thus making any project unestimatable (management hates this reasoning - a friend told me, pinky promise); otherwise, assign some non-zero probability that any mean will be 10-100x the estimate.
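For what it's worth, the "treat the mean as infinite" quip has a concrete counterpart: with a sufficiently heavy-tailed duration distribution the sample mean never settles, so averaging past projects tells you little. A toy sketch (a Pareto tail chosen purely for illustration, not anyone's actual data):

    import random

    random.seed(4)

    def pareto_sample(alpha=1.0):
        # Pareto with minimum 1 and tail index alpha; alpha <= 1 => infinite mean.
        return (1.0 - random.random()) ** (-1.0 / alpha)

    for n in (100, 10_000, 1_000_000):
        xs = [pareto_sample() for _ in range(n)]
        print(f"n={n:>9}: running average {sum(xs) / n:8.1f}")
    # The running average tends to keep growing with n instead of converging.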
I think the problem is not just software estimation, but project cost and time estimation is difficult across many domains (construction, transportation, IT, defense, and even the organization of events).
The word "mess" seems to indicate that the uncertainty is easily fixed, but if it has happened across domains for several millennia, it could indicate that there is something fundamentally challenging.
The situation worsens with factors such as project size, requirement changes, methodology (too much or too little of it), technology (especially new technologies), the nature of the delivering institution (public institutions performing worse than private ones), and organizational culture (e.g. waterfall being in many cases detrimental by making adaptations more difficult).
There are certainly bad ideas in the field of estimation, like assuming a Gaussian distribution, but the problem is far from trivial.
See for example:
* Defense: Bolten, Joseph G., et al. Sources of weapon system cost growth: Analysis of 35 major defense acquisition programs. RAND Corporation, 2008.
* Public works: Flyvbjerg, Bent, Mette Skamris Holm, and Soren Buhl. "Underestimating costs in public works projects: Error or lie?" Journal of the American Planning Association 68.3 (2002): 279-295.
* Transportation: Cantarelli, Chantal C., et al. "Cost overruns in large-scale transportation infrastructure projects: Explanations and their theoretical embeddedness." arXiv preprint arXiv:1307.2176 (2013).
* Olympic Games: Flyvbjerg, Bent, Alexander Budzier, and Daniel Lunn. "Regression to the tail: Why the Olympics blow up." Environment and Planning A: Economy and Space 53.2 (2021): 233-260.
I think I've said this before on this forum, but it bears repeating: if there's one thing that I'll never forget from my Industrial Engineering degree is that all processes can be designed around long waits, but it's the variance that kills you.
Not all problems are created the same, and what learning and doing happens depends on the kind of problem. Exploring an unknown jungle for gold is a very different problem from exploiting an existing goldmine. One problem is biased more towards learning, while the other is biased towards doing.
The managers and teams you end up producing for each type of problem are very different. The distribution of skills/knowledge/experience/creativity/curiosity etc. at the end of solving an explore problem is very different from what you end up with after solving an exploit problem.
Issues arise when you take a team that's good at Explore and make it work on Exploit, or vice versa. In our current chaotic, ever-changing environment most problems have elements of both, and those elements keep changing.
This is fascinating, but I'm not sure this is the reason (or a good excuse for?) why software estimation is a mess right now. If you look at standard industry practice, the concept of "probability" rarely enters the discussion at all when estimating.
I've personally had great success with three-point estimation. [1] As far as statistical methods go, it's pretty simple - yet it tends to work surprisingly well. To my knowledge, however, none of the popular software on the market has the tools needed to do even something as basic as this, at least out of the box.
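For reference, the PERT-style roll-up usually used with three-point estimates is only a few lines; the task names and numbers below are invented, and the independence assumption is the usual caveat:

    import math

    # (optimistic, most likely, pessimistic) hours per task -- invented examples
    tasks = {
        "auth flow": (4, 8, 20),
        "billing integration": (8, 16, 48),
        "admin dashboard": (6, 10, 24),
    }

    def pert(o, m, p):
        mean = (o + 4 * m + p) / 6   # beta-PERT weighted mean
        sd = (p - o) / 6             # rough standard deviation
        return mean, sd

    stats = [pert(*t) for t in tasks.values()]
    total_mean = sum(mean for mean, _ in stats)
    total_sd = math.sqrt(sum(sd ** 2 for _, sd in stats))  # assumes independent tasks

    print(f"expected total: {total_mean:.0f}h, "
          f"~95% upper bound: {total_mean + 1.64 * total_sd:.0f}h")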
There is a whole set of tools out there for probabilistic modeling, and if we're serious about estimation, we should start talking about them.
From the trenches, the observable reason that approaches like this aren't taken up is often because of the power dynamics at play. Stakeholders want to treat delivery teams as feature factories, and typically have the organisational power to reject estimates that aren't in the form of "this will be done tomorrow/next week/in three months". They choose to reinforce the viewpoint that an inability to provide and keep to hard delivery deadlines is a marker of incompetence, rather than a realistic assessment of the basic physics of the situation. This is - to put it mildly - not helped by situations where hard deadlines are agreed to without speaking to delivery teams at all.
There is a lack of maturity in how software delivery is commissioned in the industry, but the fundamental issue is that where Taylorist management interfaces with unpredictability in delivery, the resulting political drivers reward behaviour that doesn't improve that maturity.
"this will be done tomorrow/next week/in three months"
This type of answer is also possible if you can agree on the desired confidence. I've found that 95% confidence is sufficient for most cases, but that number will depend on context. Understanding the stakes of the deadline for stakeholders can help adjust this further.
I agree that building trust with stakeholders can be tricky, and discussions about risks can sometimes be interpreted as unwillingness to take responsibility. In that situation, the team can at least get a measure of their level of confidence, give an initial number they're somewhat comfortable with, and estimate how likely the agreed-upon number is to be met.
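In practice that can be as simple as simulating the remaining work and reading off the percentile that matches the agreed confidence; a minimal sketch, assuming a log-normal model whose parameters are made up here:

    import math
    import random

    random.seed(1)

    # Hypothetical remaining effort: log-normal with a median of ~20 working
    # days and a fat right tail (sigma chosen for illustration only).
    samples = sorted(random.lognormvariate(math.log(20), 0.5) for _ in range(50_000))

    def percentile(xs, q):
        return xs[int(q * (len(xs) - 1))]

    for q in (0.50, 0.80, 0.95):
        print(f"{int(q * 100)}% confidence: commit to {percentile(samples, q):.0f} days")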
What is your technique for estimating in the case that the tech stack is (at least in part) unfamiliar? Can this technique be applied by junior developers with only a few years experience?
For cases where the tech stack is in some ways unfamiliar, we create research stories where the goal is to do the quickest thing possible to answer questions about the tech. This is typically timeboxed by the maximum time we'd be willing to spend on it, rather than the estimated maximum time. It's still useful to try to put a maximum estimate on it if possible, because then you can also estimate the probability of success for the timebox, which leaves you better prepared for discussions with the business to help them decide whether they want to take the risk of investing time in it.
A few years of experience should be quite sufficient, although I think overall team culture is probably more important. I've usually done this type of estimation together with teams where there's already good communication, but when that's not the case, something like pointing poker to open up safe discussion and uncover uncertainties may make sense as well.
What's wrong with having a normal distribution of only non-negative values? Or is somebody specifically talking about a _standard_ normal distribution?
A normal distribution has infinite support, so there is always a (small) nonzero probability for any real value - technically even negative ones - no matter how far the mean is above zero.
However, GP is accidentally reversing the causality, which is incorrect:
We say that something follows a normal distribution if the samples we observe fit, e.g. shoe sizes or height, which are clearly non-negative.
It doesn't mean that any possible distribution value must occur in the original samples for the population to be normally distributed.
I think the OP's point is that it's bounded below at a point not distant from the mean, and hence you have a one-tailed rather than a two-tailed distribution. That non-trivially changes the distribution.
And they explained why: there is a finite amount by which you can do a task faster, but an infinite amount by which you can do it slower.
That is in contrast to heights or shoe sizes, which are (effectively) bounded on both sides while having the bounds distant from the mean.
That is just being pedantic. In reality, there are no observable tasks that take infinite time.
And the actual statistical process is probably unknowable anyway; we are very likely dealing with a sum of random variables, so the Central Limit Theorem applies regardless.
Thanks for explaining! This stretching to infinity is interesting, because it assigns a probability to outcomes even if they cannot happen, for example where observed values cannot go negative like here. That shows the difference between the "model" and the reality we try to fit it on; and that the model is a tool that just needs to fit well enough to be useful.
You say 'convenience' and I don't know if it's a strong enough word - the gap in difficulty between working with normals and anything else is like the "draw the rest of the owl" meme - it's so wide nobody ever bothers.
That doesn't look specific to the software field, though. We, mankind, have been making buildings since before we had writing systems. Do we have perfect methodologies for making any building in the expected time?
Sure, you can rationalize a bit further and replicate a house design many times with affordable local materials. But you will not replicate the same house everywhere, because other places don't have the same materials nearby, and moving them from far away might be economically prohibitive and uncompetitive with local materials. Your house design is also probably a poor fit for many environments. And people have different cultural expectations in terms of architecture due to historical factors, plus each individual comes with their own whims.
Where is the standard, scalable (both shrinking and expanding), omni-environment bridge design, hmm? All bridges have the same basic purpose, and yet I have no doubt that building most of them has been a challenging endeavor of its own.
I'd say that's just a function of how much learning you have to do while doing. If you have all the information required to do something, then the time needed probably approaches a normal distribution. But any unknown unknowns that you encounter are vastly more likely to delay than to expedite the project, making the right-side tail both fatter and longer.
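A quick simulation shows the asymmetry (parameters invented; both models share a median of 10 days): the symmetric model cheerfully produces impossible negative durations, while the skewed one puts all the surprise on the late side.

    import math
    import random

    random.seed(2)
    N = 100_000

    normal = [random.gauss(10, 4) for _ in range(N)]
    lognormal = [random.lognormvariate(math.log(10), 0.4) for _ in range(N)]

    print("share of negative durations (normal):", sum(x < 0 for x in normal) / N)
    print("worst case, normal model:    ", round(max(normal), 1))
    print("worst case, log-normal model:", round(max(lognormal), 1))
    # The log-normal's right tail runs much further out, and it never dips
    # below zero.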
>Engineering mode shuns uncertainty, because uncertainty may involve risk that corresponds to bad surprises. Discovery mode thrives under uncertainty, especially when a rare but beneficial result leads to finding something new, or a reduction of uncertainty in the face of making strategic decisions.
In summary, to understand the distinctions of Discovery and Engineering modes, one needs to have an appreciation for variation and the underlying distribution of outcomes expected while operating in each mode respectively. Without understanding the asymmetry in their outcome distributions, it would be difficult to convey how these work modes are different.
If learning durations were log-normally distributed, how would people be able to graduate from universities and finish studies, mostly in time? Or accomplish anything substantial in their limited time span?
I agree that distribution of most human tasks' duration is skewed (not necessarily distributed log-normally), but these tasks can still have a reasonable upper bound for completion. The success is not binary. Like in grading, we need to accept that some projects will get an A, and some will get only B or C, and it's still OK. Some may fail.
Exactly. To see why, notice that in a curriculum you are presented with both a problem and a solution. You are encouraged to find your own solutions to many problems, but regardless of whether you do, you are also presented with the correct (optimal) solutions. This removes inaccuracies in your thinking, which would otherwise pile up multiplicatively, yielding a log-normal distribution of the time needed to master some topic.
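A toy illustration of that multiplicative pile-up (the step count and factor range are invented): multiplying many independent error factors gives a right-skewed, roughly log-normal spread of completion times.

    import random

    random.seed(3)

    def time_to_master(steps=30):
        t = 1.0
        for _ in range(steps):
            # each learning step multiplies the remaining effort by 0.8x-1.5x
            t *= random.uniform(0.8, 1.5)
        return t

    samples = sorted(time_to_master() for _ in range(20_000))
    median = samples[len(samples) // 2]
    p95 = samples[int(0.95 * len(samples))]
    print(f"median {median:.1f}, 95th percentile {p95:.1f} (heavily right-skewed)")
    # Feedback (being shown the correct solution) keeps those factors close to 1,
    # which is one way to read why curricula mostly finish on time.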
> If learning durations were log-normally distributed, how would people be able to graduate from universities and finish studies, mostly in time?
If you took a hundred 5 year olds and set the objective to achieve the same PhD in the same field with unlimited time, guess how the time to achieve it would be distributed?
I'm sure some won't achieve it in their lifetime, so I disagree that there's a reasonable upper bound.
When simulating or modeling how long something will take (0 hours, 1 hour, 2 hours, ...), you never use a normal distribution because of the potential for negative values in the left tail. You would rule that out by default. The title itself implies that the author has this misconception.
> Pilot Plants and Scaling Up
> Chemical engineers learned long ago that a process that works in the laboratory cannot be implemented in a factory in only one step. An intermediate step called the pilot plant is necessary to give experience in scaling quantities up and in operating in nonprotective environments. For example, a laboratory process for desalting water will be tested in a pilot plant of 10,000 gallon/day capacity before being used for a 2,000,000 gallon/day community water system.
> Programming system builders have also been exposed to this lesson, but it seems to have not yet been learned. Project after project designs a set of algorithms and then plunges into construction of customer-deliverable software on a schedule that demands delivery of the first thing built.
> In most projects, the first system built is barely usable. It may be too slow, too big, awkward to use, or all three. There is no alternative but to start again, smarting but smarter, and build a redesigned version in which these problems are solved. The discard and redesign may be done in one lump, or it may be done piece-by-piece. But all large-system experience shows that it will be done. Where a new system concept or new technology is used, one has to build a system to throw away, for even the best planning is not so omniscient as to get it right the first time.
> The management question, therefore, is not whether to build a pilot system and throw it away. You will do that. The only question is whether to plan in advance to build a throwaway, or to promise to deliver the throwaway to customers. Seen this way, the answer is much clearer. Delivering that throwaway to customers buys time, but it does so only at the cost of agony for the user, distraction for the builders while they do the redesign, and a bad reputation for the product that the best redesign will find hard to live down.
Fred Brooks as usual stands the test of time. It's incredible how many times I've quoted some part of Mythical Man-Month in my career; it's also part of the recommended literature I give to every junior I've trained in my life.
I've noticed lately that it's become less common for someone in the room to have read it when I mention a passage from it, be it management or other software devs. Not sure why, but it used to be much more common that when I'd bring up the argument against a Second System, or the issue of communication channels in a larger group, at least one other person would recognise it.
For any professional software developer out there who hasn't read it, please do; it's a collection of wisdom from 50 years ago that still applies today.
It's always amusing to read Fred Brooks' own lamentations.
He's flattered that people still quote him, but also horrified that what he thought was his first, fledgling start in software project management is still quoted as the state of the art. We haven't learned anything.
Brooks lamented that we've learned so little, that his observations stay relevant, instead of being superseded by something more systematic.
In contrast, if you were learning how to write a compiler or operating system today, you (hopefully!) wouldn't pick up a book published when the Mythical Man Month came out.
> Fred Brooks as usual stands the test of time.[...]I'd bring the argument against a Second-System
I'm confused. The GP appears to be saying that one should plan to build two systems - one as a proof-of-concept, and another (which is actually presented to most customers) in which one does things "the right way" based on learnings during that process - whereas you (and Fred[0]) seem to allude that the second system tends to be bloated and over-engineered. What am I missing?
Second-system effect is the replacement of a full production application with another full-blown production application, where the second system has to replicate all (or at least most) of the features of the first system while also adding new features. Even worse are the cases where the first system still has to coexist and keep being developed while the replacement is in development; that usually leads to a never-ending game of catch-up, with the second system trying to integrate all the changes made to the first system after the second-system project started.
Hadn’t thought of it in this context before but in game development a “vertical slice” could potentially be thought of as akin to the “pilot plant” in that it’s about proving out the _processes_ for creating game content (and working out how long each piece of content should take to make) before committing to “scaling up” said content production to deliver the full game…
Game development is intensely iterative. One way to think of games is as software whose only value is its usability. It has no process workflows or business goals that it ever sacrifices usability for, because the user experience is the value proposition itself. It only sacrifices the user experience for other aspects of the user experience.
Failed and un-fun games do happen, but in general game developers put far more weight on this sort of process. There are no other goals you can claim were accomplished if the users think the game sucks. So you get these ruthless production cycles, and there's an appetite to cut things that don't work.
It is an interesting theory and a nice story; but I don't believe it. One of the key issues in programming systems is the profound uncertainty about what the system is meant to be doing. In chemical engineering you know exactly what is supposed to be built and have a clear idea of how the major reactions should work and what the tolerances are.
Both situations require design around uncertainty, but the chemical engineer is looking for practical kinks in a roughly-understood system. The programmatic system has no idea what is value add, what will stay or go or anything like that. In the extreme case you get things like Slack or Discord, where the plan wasn't even trying to build communications software to start with.
The nature and scale of the uncertainty is different. Software needs to adopt a fail-fast iterative approach with lots of feedback. Chemical engineers are looking for something else. The "pilot plant" concept wouldn't carry to software. Although the "we're going to throw this out" attitude works wonders (unless you rely on it happening, in which case that code is permanent now for some reason).
When building a plant you can only build the same plant once. When building software you can keep iterating on the same software forever. How do you know when to start over and when to keep going? There seems to be no consensus on that?
[1] https://en.m.wikipedia.org/wiki/Independence_(probability_th...