
Could you elaborate or link something here? I think about this pretty frequently, so would love to read something!


Metric: time to run 100m

Context: track athlete

Does it cease to be a good metric? No. From here you can likely come up with many more examples of target metrics that never turn bad.


If it were a good metric there wouldn't be a few phone books' worth of regulations on what you can do before and during running 100 meters. From banning rocket shoes, to steroids, to robot legs, the 100 meter run is a perfect example of a terrible metric, both intrinsically as a measure of running speed and extrinsically as a measure of fitness.


> Metric: time to run 100m

> Context: track athlete

> Does it cease to be a good metric? No.

What do you mean? People start doping or showing up with creatively designed shoes, and you need to layer on a complicated system to decide whether that's cheating. But some of the methods are harder to detect, and some people cheat anyway. Or you ban steroids or stimulants but allow them by prescription for an unrelated medical condition, and then people start getting prescriptions under false pretexts in order to get better times. Or worse, someone notices that the competition can't set a good time with a broken leg.


So what is your argument, that it doesn't apply everywhere therefore it applies nowhere?

You're misunderstanding the root cause. Your example works because the metric is well aligned. I'm sure you can also think of many examples where the metric is not well aligned and maximizing it becomes harmful. How do you think we ended up with clickbait titles? Why was everyone so focused on clicks? Let's think about engagement metrics. Are they what we really want to measure? Do we have no preference between users being happy and users being angry or sad? Or are those things much harder to measure, if not impossible, and thus we focus on our proxies instead? So what happens when someone doesn't realize it is a proxy and becomes hyper-fixated on it? What happens if someone does realize it is a proxy but is rewarded via the metric, so they don't really care?
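Here's a toy sketch of what I mean. All the names and numbers are made up; "clicks" and "satisfaction" are hypothetical scores, not anything actually measured:

  # Toy illustration of a proxy metric diverging from the true goal.
  candidates = [
      # (headline style, expected clicks, reader satisfaction)
      ("straight news",  0.30, 0.80),
      ("mild teaser",    0.45, 0.60),
      ("outrage bait",   0.70, 0.20),
      ("pure clickbait", 0.90, 0.05),
  ]

  # Optimizing the proxy (clicks) picks the worst option for the
  # thing we actually care about (satisfaction).
  print(max(candidates, key=lambda c: c[1])[0])  # pure clickbait
  print(max(candidates, key=lambda c: c[2])[0])  # straight news

Nothing in the data changed between those two lines; only the choice of what to maximize did.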

Your example works in the simple case, but a lot of things look trivial when you only approach them from a first-order approximation. You left out all the hard stuff. It's kinda like...

Edit: Looks like some people are bringing up metric limits that I couldn't come up with. Thanks!


> So what is your argument, that it doesn't apply everywhere therefore it applies nowhere?

I never said that. Someone said the law collapses, someone asked for a link, and I gave an example to show it does break down in at least some cases, and in many cases once you think about it more. I never said all cases.

If it works sometimes and not others, it's not a law. It's just an observation of something that can happen or not.


  > I never said all cases.
You're right. My bad. I inferred that from the context of the conversation.

  > If it works sometimes and not others, it's not a law.
I think you are misreading, and that is likely what led to the aforementioned misunderstanding. You're right that it isn't a scientific law, but the term "law" gets thrown around a lot in a more colloquial manner. Unfortunately, words are overloaded and have multiple meanings. We do the same thing to "hypothesis", "paradox", and lots of other terms. I hope this clarifies the context. (Even many of the physics laws aren't as strong as you might think.)

But there are many "laws" used in the same way. They're eponymous laws[0], not scientific ones. Read: "adage". You'll also find that word used in the opening sentence of the Wiki article I linked, as well as in most (if not all) of the entries in [0].

[0] https://en.wikipedia.org/wiki/List_of_eponymous_laws


It doesn't break down - see the comments about rules above. It was the perfect example to prove yourself wrong.


I disagree with all of those examples; they misunderstand what it means for the metric to break down in the context of the law, but alas. "If you run a different race" lol.


  > in the context of the law
That's the key part. The metric has context, right?

And that's where Goodhart's "Law" comes in. A metric has no meaning without context, which is why metrics need to be interpreted. They need to be evaluated in context. Sometimes this context is explicit, but other times it is implicit. Often people will hack the metric because an implicit rule was never made explicit, and that's usually a quick way to get those rules made explicit.

Here's another way to think about it: no rule can be so perfectly written that it has no exceptions.


Could you explain what you think the difference is?

A metric is chosen, and people start to game the system by doing things that improve the metric while the original intent is lost. Increasingly specific rules/laws have to be made up to make the metric appear to work, but it becomes a lost cause as more and more creative ways are found to work around the rules.


Exactly, that's the definition. It doesn't apply to timing a 100m race. There are many such situations, simple enough and with perfect information available, where this doesn't break down: a metric is just a metric and it works great.

Which is not to the detriment of the observation being true in other contexts; all I did was provide a counterexample. But the example requires the metric AND the context.


Did you know that certain shoes are banned in running competitions?

There's a really fine line here. We make shoes to help us run faster and keep our feet safe, right? Those two are directly related, as we can't run very fast if our feet are injured. But how far can this be taken? You can make shoes that dramatically reduce the impact when the foot strikes the ground, which reduces stress on the foot and legs. But that might take away running energy, which adds stresses and strains to the muscles and ligaments. So you modify your material to put energy back into the person's motion. This all makes running safer. But it also makes the runner faster.

Does that example hack the metric? You might say yes, but I'm certain someone will disagree with you. There are always cases like this that get hairy when you get down to the details. Context isn't perfectly defined and things aren't trivial to understand. Hell, that's why we use pedantic programming languages in the first place: we're dealing with machines that have to operate devoid of context[0]. Even dealing with humans is hard because there are multiple ways to interpret anything. Natural language isn't pedantic enough for perfect interpretation.

[0] https://www.youtube.com/watch?v=FN2RM-CHkuI


It wasn't a very good counterexample.


> Does it cease to be a good metric?

Yes, if you run anything other than the 100m.


Do you have an example that doesn't involve an objective metric? Of course objective metrics won't turn bad. They're more measurements than metrics, really.


  > an objective metric
I'd like to push back on this a little, because I think it's important for understanding why Goodhart's Law shows up so frequently.

*There are no objective metrics*, only proxies.

You can't measure a meter directly; you have to use a proxy like a tape measure. Similarly, you can't measure time directly; you have to use a stopwatch. In a normal conversation I wouldn't nitpick like this, because those proxies are so well aligned with our intended measures that the lack of precision is generally inconsequential. But once you start measuring anything with precision, you cannot ignore the fact that you're limited to proxies.

The situation is not too dissimilar when we get more abstract in our goals; our measuring tools are just really imprecise. So we have to take great care to understand the meaning of our metrics and their limits, just as we would if we were doing high-precision measurements of something more "mundane" like distance.

I think this is something most people don't have to contend with because, frankly, very few people do high-precision work. And unfortunately we often use algorithms as black boxes. But the more complex a subject is, the more important an expert is. It looks like they're just throwing data into a black box and reading off the answer, but that's a naive interpretation.


This isn't what Goodhart's law is about.

Sure, if you get a ruler from the store it might be off by a fraction of a percent in a way that usually doesn't matter and occasionally does, but even if you could measure distance exactly, that wouldn't get you out of it.

Because what Goodhart's law is really about is bureaucratic cleavage. People care about lots of diverging and overlapping things, but bureaucratic rules don't. As soon as you make something a target, you've created the incentive to make that number go up at the expense of all the other things you're not targeting but still care about.

You can take something which is clearly what you actually want. Suppose you're commissioning a spaceship to take you to Alpha Centauri and then it's important that it go fast because otherwise it'll take too long. We don't even need to get into exactly how fast it needs to go or how to measure a meter or anything like that, we can just say that going fast is a target. And it's a valid target; it actually needs to do that.

Which leaves you already in trouble. If your organization solicits bids for the spaceship and that's the only target, you better not accept one before you notice that you also need things like "has the ability to carry occupants" and "doesn't kill the occupants" and "doesn't cost 999 trillion dollars" or else those are all on the chopping block in the interest of going fast.

So you add those things as targets too and then people come up with new and fascinating ways to meet them by sacrificing other things you wanted but didn't require.
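To make that concrete, here's a minimal sketch of the bid selection. Every bid and number is invented for illustration; the point is only that a single target gets gamed:

  # Toy bid selection for the spaceship example.
  bids = [
      {"name": "A", "speed": 0.20, "occupants": True,  "alive": True,  "cost": 90},
      {"name": "B", "speed": 0.35, "occupants": True,  "alive": False, "cost": 60},
      {"name": "C", "speed": 0.50, "occupants": False, "alive": False, "cost": 30},
  ]

  # Target only speed: bid C wins, a fast but empty and useless probe.
  print(max(bids, key=lambda b: b["speed"])["name"])  # C

  # State the targets you took for granted and the winner changes.
  viable = [b for b in bids
            if b["occupants"] and b["alive"] and b["cost"] <= 100]
  print(max(viable, key=lambda b: b["speed"])["name"])  # A

And of course, the next round of bids will just sacrifice whatever attribute that dict doesn't list.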

What's really happening here is that if you set targets and then require someone else to meet them, they will meet the targets in ways that you will not like. It's the principal-agent problem. The only real way out of it is for principals to be their own agents, which is exactly the thing a bureaucracy isn't.


I agree with you, in a way.

I've just taken another step to understand the philosophy of those bureaucrats. Clearly they have some logic, right? So we have to understand why they think they can organize and regulate from the spreadsheet. Ultimately it comes down to a belief that the measurements (or numbers) are "good enough" and that they have a good understanding of how to interpret them. With many bureaucracies, that becomes the belief that no interpretation is needed. But we also see that behavior with armchair experts who try to use data as evidence for their conclusion rather than interpret the data and conclude from that interpretation.

Goodhart focused on the incentive structure of the rule, but that does not tell us how this all happens or why the rule is so persistent. I think you're absolutely right that there is a problem with agents, and it's no surprise that when people introduce the concept of "reward hacking" they reference Goodhart's Law. Yes, humans can typically see beyond the metric and infer the intended outcome, but they can ignore this because they don't care, fixating on the measurement because that's what gives them the reward. Bureaucracies no doubt amplify this behavior, as they are well known to be soul-crushing.

But we should also be asking ourselves whether the same effect can apply in settings where we have the best of intentions and all the agents are acting in good faith, trying to interpret the measure instead of just gaming it. The answer is yes. Idk, call it Godelski's Corollary if you want (I wouldn't), but this relates to Goodhart's Law at a fundamental level. You can still have metric hacking even when agents aren't aware of it or intending to do it. Bureaucracy is not required.


In a sense you can do the same thing to yourself. If you self-impose a target and try to meet it while ignoring a lot of things that you're not measuring even though they're still important, you can unintentionally sacrifice those things. But there's a difference.

In that case you have to not notice it, which sets a much lower cap on how messed up things can get. If things are really on fire then you notice right away and you have the agency to do something different.

Whereas if the target is imposed by a far-off hierarchy or regulatory bureaucracy, the people on the ground who notice that things are going wrong have no authority to change it, which means they carry on going wrong.

Or put it this way: The degree to which it's a problem is proportional to the size of the bureaucracy. You can cause some trouble for yourself if you're not paying attention but you're still directly exposed to "hear reason or she'll make you feel her". If it's just you and your boss who you talk to every day, that's not as good but it's still not that bad. But if the people imposing the target aren't even in the same state, you can be filling the morgue with bodies and still not have them notice.


  > In a sense you can do the same thing to yourself.
Of course. I said you can do it unknowingly too.

  > The degree to which it's a problem is proportional to the size of the bureaucracy.
Now take a few steps further and answer "why". What are the reasons this happens, and what are the reasons people think it is reasonable? Do you think it happens purely because people are dumb? Or because they're smart but the outcome is unintended? I think you should look back at my comment, because it handles both cases.

To be clear, I'm not saying you're wrong. We're just talking about the concept at different depths.


I don't think the premise that everything is a proxy is right. We can distinguish between proxies and components.

A proxy is something like this: you're trying to tell whether hiring discrimination is happening, or trying to minimize it, so you look at the proportion of each race in some occupation compared to their proportion of the general population. That's only a proxy because there could be reasons other than hiring discrimination for a disparity.

A component is something like this: a spaceship needs to go fast. That's not the only thing it needs to do, but space is really big, so going fast is kind of a sine qua non of a useful spaceship, and it's the direct requirement rather than a proxy for it.

Goodhart's law can apply to both. The problem with proxies is they're misaligned. The problem with components is they're incomplete. But this is where we come back to the principal-agent problem.

If you could enumerate all of the components and target them all then you'd have a way out of Goodhart's law. Of course, you can't because there are too many of them. But, many of the components -- especially the ones people take for granted and fail to list -- are satisfied by default or with minimal effort. And then enumerating the others, the ones that are both important and hard to satisfy, gets you what you're after in practice.

That works as long as the person setting the target and the person meeting it are the same person. When they're not, the person setting the target can't take anything for granted, because otherwise the person meeting the target can take advantage of that.

> What are the reasons this happens and what are the reasons people think it is reasonable? Do you think it happens purely because people are dumb? Or smart but unintended.

In many cases it's because there are people (regulators, corporate bureaucrats) who aren't in a position to do something without causing significant collateral damage, because they only have access to weak proxies. They cause the collateral damage anyway because we required them to act regardless, when we shouldn't have been trying to get them to do something they're in no position to do well.


  > I don't think the premise that everything is a proxy is right.
I said every measurement. That is a key word.

I know we're operating at a level that most people never encounter, but you cannot, in fact, measure a meter. You can use a calibrated reference tool like a ruler to try to measure distance, but that's a proxy: you aren't measuring a meter, you're measuring with a tool that estimates a meter. You can get really precise and use a laser. But now you're actually doing a time-of-flight measurement, where a laser pulse bounces off something and you measure the time it takes to come back. Technically you're always getting 2x the distance, and either way you're not measuring distance directly: you're detecting a light pulse (which is going to have units like candela or watts) and timing it, then converting those units to meters. You can take this further and recognize the limits of each of those estimates, which is an important factor if you're trying to determine the sensitivity (and thus error) of your device.
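A minimal sketch of that time-of-flight arithmetic, assuming an idealized detector and ignoring everything that makes real instruments hard:

  # We never measure the meter itself; we measure a round-trip time
  # and convert. Timing error becomes distance error the same way.
  C = 299_792_458.0  # speed of light, m/s

  def distance_from_round_trip(t_seconds):
      # The pulse travels out and back, hence the factor of 2.
      return C * t_seconds / 2

  print(distance_from_round_trip(66.7e-9))  # ~10.0 m from a ~66.7 ns round trip
  print(distance_from_round_trip(1e-9))     # a 1 ns timing error is ~15 cm

Every term in that conversion carries its own error, which is exactly the sensitivity analysis I mentioned.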

So I think you really aren't understanding this point. There is no possible way to directly measure even the most fundamental scientific units (your best chance is probably going to be the mole, but quantum mechanics is going to fuck you up).

  > The problem with proxies is they're misaligned. The problem with components is they're incomplete.
If you pay close attention to what I'm talking about then you might find that these aren't as different as you think they are.

  > If you could enumerate all of the components and target them all then you'd have a way out of Goodhart's law.
Which is my point. It isn't just that you can't because they are abstract; you can't because the physical limits of the universe prevent you from doing so even in the non-abstract cases.

I am 100% behind you in that we should better define what we're trying to measure. But this is no different from talking about measuring something with higher precision. Our example above moved from a physical reference device to a laser and a stopwatch. That's a pretty dramatic shift, right? It uses completely different mechanisms. So abstract what you're thinking just a little so we can generalize the concept. I think if you do, we'll be on the same page.

  > In many cases
I think you misunderstood my point here. Those were rhetorical questions, and the last sentence tells you why I used them. They were not questions I needed answered. Frankly, I believe something similar is happening throughout our conversation, since you are frequently answering questions that don't need answering and telling me things I have directly acknowledged. It's creating a weird situation where I don't know how to respond, because I don't know how you'll interpret what I'm saying. You seem to think that I'm disagreeing with you on everything, and that just isn't true. For the most part I do agree.

But to get on the same level, I need you to address why these things happen. Keep asking why until you don't know. That point exists at some depth, right? It's true for everyone, since we're not omniscient gods. My conclusion certainly isn't comprehensive, but it does find an interesting and critical part where we run into something you'd probably be less surprised by if you looked at my username.



