Deming's Red Bead Experiment (2002) (maaw.info)
142 points by CoffeeOnWrite on Feb 4, 2022 | 64 comments



Years ago I found a discussion of this on the web that involved a deep dive into optimal strategies for getting white beads, variations in paddle construction, root cause analysis on bead size and weight and hole depth in the paddles, and so on. It was a six sigma nightmare come to life and missed the point so profoundly I wish I could find it again to use as an example of how easily Deming is misunderstood.

Related: "A bad system beats a good person any time" does not mean "having any system, no matter how bad, is better than having even the best people and no apparent system".


Oh boy, I'd love to see that too. Unfortunately it's all too common to see this stuff in reality as well.

> Related: "A bad system beats a good person any time" does not mean "having any system, no matter how bad, is better than having even the best people and no apparent system".

I'm a big fan of sociotechnical systems [0] where the motto is to give people complex jobs in simple organizations. Unfortunately in practice you usually see the tendency to do exactly the opposite.

[0] https://en.wikipedia.org/wiki/Sociotechnical_system


“beats” in the sense of “the beatings will continue until morale improves,” right?


It certainly isn't positive.


Wait a bit and HN will probably provide you a similar discussion.


Came here expecting optimal solution to be top comment.


The key to doing this experiment well is having the right facilitator, one who brings the attitude. A good facilitator will roleplay a leader/manager/exec who praises when measures are good and berates when measures are bad. The idea of this experiment is to show how management can harm the process even when there is inherent variability, good or bad.

Here is Dr. Deming himself performing the experiment https://www.youtube.com/watch?v=7pXu0qxtWPg


What an amazing demonstration. It’s unfortunate that these lessons haven’t been learned decades later.


Thanks for sharing this, it's amazing to see it in action.


If you are in any sort of leadership position -- whether as a formal manager or simply as someone whose peers respect you -- I strongly urge you to read Deming.

There are few authors that have taught me so much about people, motivation, systems, quality, statistics, what high-leverage effort looks like, and so on.

I first picked up a book by Deming a few years ago, and not a single day has passed that I have not had use for what he taught me through his writing.

The things he says are only becoming more and more relevant with every year. I honestly think it ought to be compulsory reading in school. The world would be a much better place that way; kinder, more efficient, and less superstitious.


Thanks for the suggestion. How does what he says stack up against those who came after? I ask because one of my favourite management talks is by Russell Ackoff [1], and he mentions Dr Deming, so I assume he was influenced by, worked with, or studied under him. Given their relative ages, I wondered if Deming's work might be valuable to start with?

[1] https://www.youtube.com/watch?v=OqEeIG8aPPk


Many of Deming's findings and ideas now seem so obvious that it's hard to believe their origins are so recent. I'd bet that much of what you think about management and organizations comes from Deming.


Which book?


I started with The New Economics, then read Out of the Crisis, and finally A Theory of Sampling or whatever its name is.


Thanks.


I mentioned somewhere downthread about how the lessons of Deming allow me to operate more efficiently.

Here's a concrete example from today: under my supervision is an extremely good, but also somewhat expensive contractor.

My predecessor hinted that upper management has started to become nervous that the costs will run away, and recommended that I try to control that cost by having everyone in the company ask for my permission before they used this contractor's time. I'm not a fan of blocking other people's work on my approval. Besides, it's not like I understand the nuances of each situation well enough to make a good decision.

So instead, I asked for the historic bills from that contractor and plotted them on an XmR control chart. Sure enough -- every single one in statistical control.

It's a stable system. There's no sign of increasing costs. No unusual amounts billed. Barring special causes, I can predict the future costs of this contractor perfectly, with zero intervention -- just from the data.

Now maybe this stable system results in too much expense, and that's a conversation about common causes worth having. But I see no reason to meddle with individual decisions. It can only make the variability worse.
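For anyone curious what that check involves: an XmR (individuals/moving range) chart puts 3-sigma limits around the individual values using the average moving range. A minimal sketch in Python -- the bill amounts are made up for illustration, and 2.66 is the standard XmR scaling constant:

```python
import statistics

def xmr_limits(values):
    """Compute XmR control limits for a series of individual values.

    The constant 2.66 (= 3 / d2, with d2 = 1.128 for moving ranges of
    size 2) converts the mean moving range into 3-sigma limits.
    """
    center = statistics.fmean(values)
    moving_ranges = [abs(b - a) for a, b in zip(values, values[1:])]
    mr_bar = statistics.fmean(moving_ranges)
    return center - 2.66 * mr_bar, center, center + 2.66 * mr_bar

# Hypothetical monthly bills from the contractor
bills = [8200, 7900, 8600, 8100, 8400, 7800, 8300]
lower, center, upper = xmr_limits(bills)

# "In statistical control" here means every point falls within the limits
in_control = all(lower <= b <= upper for b in bills)
```

If every point sits inside the limits and shows no runs or trends, the variation is common-cause: intervening on individual bills would just be tampering.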


The above article explains the experiment but I didn't get the point of it until reading this: https://blogs.mtu.edu/improvement/2015/06/18/the-red-bead-ex...

It seems to me that it is a demonstration of the following:

1) Take a task that can only be minimally affected by skill or effort (drawing random beads)

2) Pretend it's a task that can be affected by skill or effort, leading to natural interventions like worker incentives, praise, etc.

And together you get that 2) doesn't affect 1) at all. And maybe people feel bad about it afterwards. This illustrates the point that you can't just use worker incentives to optimize the system, you might need to change the system (e.g. suggest new tools). (Please correct me if this interpretation is way off).

I can see why this is an important point: many managers do think it's all about employee skill and effort, and don't look at the system.

But here is my criticism: I would say it's not an experiment, but instead a demonstration or even illustration. I (and many people) can easily predict the outcome of this illustration if it was just verbally described.

Also, it would be dangerous to read 2) not working in situation 1) as meaning 2) will never work. There can be many situations where the opposite of 1) applies: where the only thing that matters is worker skill and effort, like moving bricks from one pile to another without tools (presuming that is a necessary and irreducible task). In that case 2) can be the largest lever possible.


> There can be many situations where the opposite of 1) applies: where the only thing that matters is worker skill and effort, like moving bricks from one pile to another without tools (presuming that is a necessary and irreducible task). In that case 2) can be the largest lever possible.

You're absolutely right here, of course. However, I'd argue that the "necessary and irreducible task without tools" constraint is much stricter than we see in practice anywhere in real life.

In my experience, even mundane tasks like these (I grew up on a pseudo-farm for some early years of my life, so I have seen plenty of variations of the moving-bricks task) can be optimised to the point of near elimination if performed in a system that encourages thinking.

When comparing the two neighbouring pseudo-farms where I grew up, you could easily see one operated with much higher efficiency than the other even when they had roughly the same moving-brick-like tasks to perform.


Sure -- that's fair enough, very few real world tasks are actually irreducible.

(When I first wrote my comment, I wrote just "moving bricks" but then realized that Deming was right after all: you could move bricks with tools =).)


Well, let's apply it to tech. I know lots of people getting praised for their work and others not, but they didn't get to choose their work, it was assigned. So, some got sexy/glamorous tasks, some got maintenance tasks, some ran into very hard to diagnose existing bugs that took weeks to solve while others didn't, so their velocity looks higher. Others got assigned tasks that appeared simple but had rippling effects that made the task 10x larger than it appeared. This meant it looked like they accomplished less, since the only thing they're being judged on is whether they finished this original task that was perceived as simple.

And then, managers give praise or encouragement to do better, all based on those issues.

Note: I'm not saying some employees don't have more skills than others, only that the things out of an employee's control seem similar to the Deming red bead experiment.


The other well-known experiment Deming used is the funnel: https://www.2uo.de/deming/#the-funnel-experiment
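The funnel experiment lends itself well to simulation. Aiming at a fixed target (Deming's rule 1) versus "tampering" by compensating for each miss (rule 2) roughly doubles the variance of where the marbles land. A rough sketch, with assumed parameters:

```python
import random

def funnel(rule, n=10000, sigma=1.0, seed=42):
    """Simulate n drops from a funnel aimed at target 0.

    rule 1: never move the funnel.
    rule 2: after each drop, move the funnel opposite the last error
            ("tampering" -- adjusting a stable process to each result).
    Returns the variance of the drop positions.
    """
    rng = random.Random(seed)
    position = 0.0
    drops = []
    for _ in range(n):
        drop = position + rng.gauss(0, sigma)
        drops.append(drop)
        if rule == 2:
            position -= drop  # compensate for the last miss
    mean = sum(drops) / n
    return sum((d - mean) ** 2 for d in drops) / n

v1 = funnel(1)
v2 = funnel(2)  # tampering roughly doubles the variance
```

The intuition: under rule 2 each drop carries both the current noise and the (negated) previous noise, so the variance is about twice that of leaving the funnel alone.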


A related video from around 1994: http://www.youtube.com/watch?v=C5Io2WweTxQ

(via https://news.ycombinator.com/item?id=5193898, but no comments there)


If anyone wants to know more about Deming, I can warmly recommend this blog post by Avery Pennarun: https://apenwarr.ca/log/20161226


I'd be careful about taking that particular author's interpretation of management literature at face value.

His understanding of High Output Management (a seminal book by the CEO of Intel) was so flawed that the CEO of Dropbox had to correct him.

https://news.ycombinator.com/item?id=21088425


Drew, fwiw, pointed out that Avery Pennarun (who is, frankly, phenomenal at distilling ideas in a given context) was right about a bunch of things except the TLDR.

Though Avery does hint that "output" itself is a function of values/principles execs ought to imbibe in their org:

> What executives need to do is come up with organizational values that indirectly result in the strategy they want.

> That is, if your company makes widgets and one of your values is customer satisfaction, you will probably end up with better widgets of the right sort for your existing customers. If one of your values is to be environmentally friendly, your widget factories will probably pollute less but cost more. If one of your values is to make the tools that run faster and smoother, your employees will probably make less bloatware and you'll probably hire different employees than if your values are to scale fast and capture the most customers in the shortest time.

It remains to be seen if Avery ends up building a larger company than Drew. I'm willing to bet all of $100 in my depleting bank account that they will.


Drew Houston disagrees with you and Avery. Not just the TL;DR but in the critical details.

> Contrary to what the post suggests, HOM does not say that the job of an executive is to wave some kind of magic culture or "values" wand and rubber-stamp whatever emergent strategy and behavior results from that. CEOs and executives absolutely do (and must) make important decisions of all kinds, break ties, and set general direction.

Notwithstanding that Drew is a billionaire who founded a billion dollar company and Tailscale has yet to crack a large valuation, Avery basically misinterpreted Andy Grove. That misinterpretation, "CEO as a passive referee whose job is to set the culture", is only sensible to people who've never managed a large group of people.


May be. I am in no position to argue with Drew or Avery for that matter. (:


I wonder what the limits of this are? From a naive point of view there has to be a point where training/skill/physical endurance/etc. come into play. The bead experiment seems to fit a fixed rate, assembly line style of work. While I would agree that numeric/performance ranking is mostly meaningless, everyone knows that one person you go to when no one else can fix a problem.


As you have observed already, this experiment is set up specifically to eliminate the effect of training/skill/physical endurance etc, and YET when it's performed in real life with a good facilitator, people who are unlucky start to feel like they're underperforming and need to step it up, while people who are lucky start to feel like they deserve the praise for doing well.

I've read about people who go for days after the experiment and feel bad about their subpar performance because they feel like they've let down or brought shame to their company and wonder if they couldn't have done something better.

And this is an experiment that's set up to remove any trace of individual agency whatsoever! People still beat themselves up over it.

When you experience this experiment for real, you start to forget that it's actually designed to eliminate any sort of skill.

In other words, the experiment shows how hard it is to recognise when we're judging the system and not the people in it. The experiment shows that even when you think you're seeing individual performance, it's very plausible you're not.


I see what you mean, but I also think that’s encapsulated in the idea of “ready willing workers.”

Obviously there are differences between people, and better and worse teams. But the lesson here is about how the environment factors in, and how management can accidentally arbitrarily suppress innovation or reward luck within normal bounds of success. Or hamper themselves to failure by insisting on a broken process.

Could it be the case that "everybody goes to Jim," and as a result, Jim gets good at helping people? Could it be that if everybody just went to Kim for 2 weeks, her fixes might turn out to be a better yet completely orthogonal method of solving the problem?

The Red Bead experiment is an antidote to rigid process and to the praise/blame game based on inspection of results. It's a story intended for management to hear, not an absolution of or dismissal of personal responsibility.

If you’ve hired “ready willing workers,” then looking at the results doesn’t necessarily show you who was killing it and who wasn’t.

That worker who is always “killing it” may be good at scooping up projects that always look great. That worker who is always underperforming might be maintaining essential infrastructure without which the system would fall apart.

The worker who’s killing it may be doing so by spending all their time “buttering up” a customer. The worker who appears underperforming may appear so because they spend all their time “buttering up” a customer, but someone else always lands the sale.

It’s a meditation on imperfect knowledge.


Focusing on the type of work being done is a bit of a bike shed, since the experiment isn't about the work per se, but the measurement of the work as a function of the employee alone - ie, without the context of the systems in which the employee functions.

A good example of the type of mismeasurement done in non-manufacturing contexts is the ridiculously stupid burn-down chart.


> A good example of the type of mismeasurement done in non-manufacturing contexts is the ridiculously stupid burn-down chart.

Bad management can find a misuse for any tool, I don't think burn-down charts are a particularly attractive nuisance in that regard.


I saw a link to this in a discussion of another topic, I'm glad somebody pushed it to the top level. Definitely worth the read.


Deming made me realize that there is actually management literature out there that isn't just fads and slogans.


Would there be any other similar recommendations to Deming's books? I would think that Eliyahu M. Goldratt's books sound similar (specifically, the "Theory of Constraints," alternatively presented through a fictional story in "The Goal").


I'm a fan of Womack & Jones's "Lean Thinking". This is all about Lean manufacturing, which I think is partly rooted in Deming's work. The focus is more on how to optimise the overall system than the management of individuals.

The content of that book isn't directly applicable to e.g. software companies but if you think a bit you can see quite a lot of analogous situations (e.g. warehoused inventory is incomplete projects or not-yet-shipped code, "monuments" could be inappropriate central test / build systems, etc).


If you want someone to translate it to software for you, Reinertsen's Principles of Product Development Flow is about adapting the philosophy to knowledge work.

Ward's Lean Product and Process Development is also a good take on those ideas.


Thanks - I will aim to check them out at some point. Lean Thinking made a big difference to how I approached engineering processes and I'm keen to see what else is out there.


I recommend Russell Ackoff's writings as somewhat related and more to do with how systems of people and processes work (or don't). Here's a great place to start: https://thesystemsthinker.com/a-lifetime-of-systems-thinking...


I don't get it. This "experiment" could have been replicated by a simple computer simulation, given that worker output is entirely random. The supposed moral of the story is that system design defines outcome, not individual performance, but how does that even count as "science" when you don't have a control and an experimental group? He designed a system with inherent flaws and, surprise, it has flaws. We can see there is variance in "productivity" but we have no idea how this same variance would have affected output if workers actually had agency.


That's the point.

So first, not all science requires an RCT. Dividing experimental subjects into study and control groups is one way of doing science. It's not the only way.

In this case, this is a concrete demonstration of just how much variance can emerge from a "statistically neutral" process. The systemic flaws are part of the demonstration. What appear at first glance to be identical tools, inputs, and processes are in fact subtly different. The demo shows management types that their charts and graphs cannot always be relied upon to differentiate performance levels among staff. The system itself must also be scrutinized. If Bob's ad campaigns are outperforming Alice's by 20% in the first quarter, it doesn't necessarily mean Bob is a marketing genius and Alice needs a PIP.

A computer simulation would not have nearly as powerful an effect on most people as a live demonstration using real beads. And the imperfections in the paddles are something that naturally arises when they're physically made, but would have to be tuned by the programmer building the simulation. Which would lead to questions about "just how did they decide what variances would come into play?"


I think I get it now. The point of the experiment is to ELI5 the concept of variance to management types who skipped statistics classes :) Could be useful for some bosses I had :)
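Since the thread mentions that a computer simulation makes the same point, here's a rough sketch: every "worker" runs the identical random process (the bead proportions and counts are assumed, not Deming's exact numbers), yet the counts spread enough that a naive manager could rank them.

```python
import random

def red_bead_draws(workers=6, days=4, beads_per_draw=50, p_red=0.2, seed=1):
    """Simulate the red bead experiment: every worker draws from the
    identical random process, so any differences are pure luck."""
    rng = random.Random(seed)
    return {
        f"worker {w}": [sum(rng.random() < p_red for _ in range(beads_per_draw))
                        for _ in range(days)]
        for w in range(1, workers + 1)
    }

results = red_bead_draws()

# Ranking workers by total red beads "works" mechanically --
# but it is a ranking of luck, not of skill or effort.
ranking = sorted(results, key=lambda w: sum(results[w]))
```

Running this a few times with different seeds shows the "best" and "worst" worker changing every time, which is the whole joke.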


So I'm watching a video of this right now[0], and it's even more enlightening than I figured it would be! Deming makes comments throughout the demonstration that I swear I've heard in the real world. For example, one worker—whose previous results put him on probation (he had 12 red beads)—managed to have only 6 the next day. "Looks like probation worked".

Meanwhile another worker—previously scoring 5, and getting a merit-based raise from it—did poorly with 12. The remark: "That raise went to his head. He's getting lazy".

So yeah, the value of this is in the actual doing of it.

[0] https://www.youtube.com/watch?v=7pXu0qxtWPg
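The "probation worked" remark is textbook regression to the mean, which is easy to demonstrate: select the unluckiest workers on day 1 and on average they will "improve" on day 2 without anything changing. A quick sketch with assumed bead proportions:

```python
import random

def probation_effect(workers=1000, beads=50, p_red=0.2, threshold=12, seed=7):
    """Every worker draws from the same random process on both days.
    Workers 'put on probation' (>= threshold red beads on day 1)
    improve on day 2 purely by regression to the mean."""
    rng = random.Random(seed)
    day1 = [sum(rng.random() < p_red for _ in range(beads)) for _ in range(workers)]
    day2 = [sum(rng.random() < p_red for _ in range(beads)) for _ in range(workers)]
    probation = [i for i, r in enumerate(day1) if r >= threshold]
    before = sum(day1[i] for i in probation) / len(probation)
    after = sum(day2[i] for i in probation) / len(probation)
    return before, after

before, after = probation_effect()
# "Probation worked": the group selected for bad luck lands back near
# the process mean (about 10 red beads) the next day.
```

The same logic, mirrored, "proves" that merit raises make people lazy: the lucky group also drifts back toward the mean.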


Not just variance but also the concept of essential complexity: one cannot possibly hope to control all the variables, and so one must continually seek to improve the system ("monitoring/alerting") as defects emerge ("hidden factories"), conduct experiments to evaluate predictions against actual outcomes ("a/b tests"), avoid erroneously changing systems and processes to fix "special causes" [while understanding that most causes of quality degradation are "common causes", which have to be fixed by changing systems and processes ("root-cause analysis")], drive down long-term costs rather than costs for their own sake ("ecosystem"), and take a more humanist approach to management... among a host of other concepts.

If you are curious (and have the stomach for some MBA-ness), this article (four parts) should be well worth your time: https://archive.is/tXJhw


A real world example is that I do programming on a high-end workstation laptop. My coworkers use old budget laptops.

This is not in their control — they’re victims of corporate policy.

Does it influence quality in complex and hard to quantify ways?

Most assuredly…


This experiment is something he did in classes etc so people could _experience_ the obvious idiocy in trying to manage individuals for system behaviour. His point is that ALL actual work is also dominated by system behaviour, just more subtly, and managers must focus on the systems and not worker performance.


> His point is that ALL actual work is also *dominated* by system behaviour

If that's his point, it seems obviously wrong. Some work, like in the experiment, is dominated by system behavior. For other work, the system would play a much smaller role. For example, 2 people cranking code in a startup. No matter what system you apply, if they are not good programmers, nothing of value will come out.


Something will most certainly come out. You're just not defining the system. If I define the system as "one programmer is responsible for looking at the desired product and writing specifications" and the other programmer is to translate specification into programming code, and never shall one do the other's job, I assure you, the best programmers in the world will produce shit over time.

Randomly swap in two new actors with different life experiences into the same spots to do the same work, and you'll still get shit. If, in the unlikely event, you get amazing work, it's not that the people doing it were special; it's just another outlier in the data stream. Add in the emotional toll of working as hard as possible to succeed but never being able to meet prescribed quality levels?

A system is perfectly tuned to produce the results it does. Want different results? Change the system. That is Deming's point. We have a tendency to blame variance in a system on the human actors immediately proximal, instead of paying attention to the actual significant constraints. This is an important lesson to management types, as they are to process/system what a programmer is to a computer.

The planners cast the dice for downstream long before downstream can do anything about it, and in many corporate setups, top down works just fine, but bottom up never gets any attention.


That's also missing the point somewhat. Put the best two programmers in the world in a shitty system that rewards them for the wrong things and they will produce garbage. Put mediocre programmers in a fantastic system that brings out the best in their collaboration and you might actually get to market sooner and better than the other group.


To extend your example: make those two programmers maintain 100% unit test coverage, document all decisions in Jira/Confluence, raise change requests and issue change freezes for stability, and you'll have enterprise velocity and stability. Allow them to hack away with little process and you'll have startup velocity and stability. Systems exist at all levels of organisation, just at smaller organisations that system is usually more implicit.


It's a demo experiment, like when the physics teacher swings a heavy pendulum at their own nose, or shoots a BB gun at a falling toy.


The W. Edwards Deming Institute Blog https://deming.org/blog/

Deming on various management topics https://deming.org/category/deming-on-management/

More resources on Deming's ideas https://deming.org/online-resources-on-w-edwards-demings-man...


There's a brilliant little book Four Days with Dr. Deming[0] that goes over the red bead experiment among other things. It basically follows the format of a four day seminar that Dr. Deming used to do. It's full of wisdom like this and it does a painfully good job making you recognize all the ineffective things still going on in companies today.

[0] https://www.goodreads.com/en/book/show/34987.Four_Days_with_...


My takeaway, and the one I strive to teach my kids, from this experiment is to do everything in my power to not wind up in a job like the one defined here.

Find your strengths, find something you enjoy that utilizes those strengths, and find a career where you can stand out for the combination.


I find the experiment skewed. Or more precisely, it is not meant to investigate human behaviour or psychology. It is rather precisely designed to produce a chosen result that supports a given world view. The fact that it has been run for 50 years is a strong indication of this.

IOW, the experimenter wanted to be able to arrive at the conclusion that difference in performance was unrelated to workers, and designed the experiment so it would give this result. In short, this demonstrates few things outside of a very artificially set-up situation, where the workers have no say and the job is predestined to fail.

Anyone who has worked anywhere knows very well that there are actually vast differences between two workers.


>IOW, the experimenter wanted to be able to arrive at the conclusion that difference in performance was unrelated to workers and designed the experiment so it would give this result.

That's the whole point. The point is not that we're supposed to be surprised that the workers did not affect performance -- in fact, that's the subtext of the whole thing! We know it from the start because he explains exactly how the process works and we can all see that individuals cannot affect their output.

The point is, if we are unaware that we're in such a situation, we can still find metrics to allow us to rank workers, fire low performers, give out raises, etc. When we myopically focus on such metrics, and disregard the system that makes them worthless, we're making all our decisions on random chance, even though we have a clear process, data collection, the whole thing.


That's also my whole point: this is not an experiment but an elaborate artificial argument designed to prove a point of view decided in advance. That is why I find it unsavory.


The point of the experiment is to be extreme, but after reading (a very large portion of) Deming's work, I don't think he'd disagree with your initial assertion that there are differences between workers.

The broader points he makes, related to this experiment at least: There are individual and systemic issues that influence the outcome of a process. The actual ratio will vary depending on what kinds of processes are involved.

If the job is to be a literal screw turner on an assembly line, then there is relatively little difference between the majority of people (assuming they are generally able bodied, sighted, and have decent coordination), the system (tempo, length of shift, accessibility of the thing being screwed together, tools being used) will have a much larger impact than the individual's skill. The system of the assembly line will influence the outcome more than the individual's skill (at least above a basic threshold, a supremely uncoordinated individual could flounder even with the slowest pace of work). Switch to more skilled work and you will find, increasingly, more differences in outcome based on individual performance versus the system of the work, but even there the system matters.

Look at software development offices that still favor manual build processes, version control, testing, and deployment over automation. They provide many opportunities for human error (even just simple miskeying of data) that can reduce everyone's effectiveness no matter how skilled. (Fortunately these kinds of places are increasingly rare, at least outside of US defense contractors.)

The experiment, then, is an artificial construct (like most classroom experiments) meant to illustrate a point by showing one extreme. This acts as a counterpoint to the more conventional wisdom that the individual, and not the system, is what actually matters for the outcome. The conventional wisdom, of course, being wrong in many circumstances since it tends to place too strong a weight on the individual performance and too weak a weight on the system.

It would be unsavory if he had said, "See, stop evaluating individuals; their contribution doesn't matter." But he never did say that (in anything I read, at least), and anyone who looks at this experiment and draws that conclusion would be an idiot.


you find effective demonstrations of viewpoints unsavory?


I was lucky enough to do this with the man himself at NYU. He had trouble speaking then but the class was dead silent and hung on his every word. Profound thinker.


Interesting experiment. I don't think this applies to knowledge work in the same way it does to manufacturing.


A software engineer is put on a team that has more legacy code. If management judges by # of incidents, they are underperforming.

Heck I can tell you from experience that if you want to get promoted fast, new product teams are the way to go. You get to file lots of patents, architect huge new systems, and look like a rock star.

Another example: Partner teams upstream keep pushing breaking API changes, downstream teams look bad because their services are the ones having the outage. You do your due diligence, your code is defect free, well tested. Doesn't matter, you are spending half your day putting out fires caused by someone else. Meanwhile another co-worker starts on a team where their upstream services are written to be robust against bad incoming data and have APIs that maintain back compat. Your co-worker puts out buggy poorly tested code, but the upstream services are robust enough that everything keeps chugging along.

Management doesn't see any of this. They just see your team has poor performance, and this other team has great performance. Heck maybe that other team has a higher "velocity" because they can turn out features faster.


You're right. It doesn't. But it's a matter of degree, not kind. Knowledge work has many times the variability of manufacturing (by design: if you remove variability from knowledge work, you're no longer producing anything new each time.)

In other words, this applies even more severely to knowledge work.

More concretely: in manufacturing you can have a process that yields 9–14 % defective or whatever. The variation is relatively small; say a CV of 10 %. In knowledge work, you'll be looking at processes that generate somewhere between 0.1 and 100 defective ideas for every really good idea. This variation is enormous: 1000 % or so.


In knowledge work the skill of the worker matters vastly more because "the process" mostly takes place in their head. You can optimize a workspace and tools for productivity and the reduction of errors but ultimately that has a minimal impact on "the process" that's taking place in the knowledge worker's mind.

If the process was the problem (or perfect), adding a new (or replacing) a worker would have minimal impact but we know this is not true. You could have the best documentation, training, and absolutely stellar code yet one person can turn everything to shit quite quickly! The opposite is true as well: Bringing on a fantastic new worker can make your existing team look like a bunch of inefficient laggards.

Neither of these situations can be fixed by improving processes (maybe hiring processes? Though I doubt it). It'd be like having one magic blue bean in the box that--if found--can either drastically improve or degrade the final productivity by 90%. Would the optimum process improvement then be to try to eliminate magic beans entirely? Sure seems like it (i.e. hire the lowest common denominator and don't try to optimize for the 1%). That way you reduce the likelihood of taking on the "bad 1%"--even though it reduces your chances of obtaining the perfect magic bean.



