Sorry...but I don't get what you're postulating...that the doctors and hospitals whose reputations are at stake...never mind the lives of their patients who are affected by policy decisions...would be totally ignorant of that basic statistical caveat? That it's possible that the top-of-the-line hospital just happened to cure a couple of really lucky patients, and everyone who has lauded that hospital has never heard of regression analysis?
I guess it's possible...but it's probably more likely that the New Yorker is not trying to be a reference on statistical methods. In any case, the statistical caveat you mention is arguably addressed in this catch-all paragraph that briefly describes the problem of quality-of-care statistics:
> In recent years, there have been numerous efforts to measure how various hospitals and doctors perform. No one has found the task easy. One difficulty has been figuring out what to measure. For six years, from 1986 to 1992, the federal government released an annual report that came to be known as the Death List, which ranked all the hospitals in the country by their death rate for elderly and disabled patients on Medicare. The spread was alarmingly wide, and the Death List made headlines the first year it came out. But the rankings proved to be almost useless. Death among the elderly or disabled mostly has to do with how old or sick they are to begin with, and the statisticians could never quite work out how to apportion blame between nature and doctors. Volatility in the numbers was one sign of the trouble. Hospitals’ rankings varied widely from one year to the next based on a handful of random deaths. It was unclear what kind of changes would improve their performance (other than sending their sickest patients to other hospitals). Pretty soon the public simply ignored the rankings.
> Even with younger patients, death rates are a poor metric for how doctors do. After all, very few young patients die, and when they do it’s rarely a surprise; most already have metastatic cancer or horrendous injuries or the like. What one really wants to know is how we perform in typical circumstances. After I’ve done an appendectomy, how long does it take for my patients to fully recover? After I’ve taken out a thyroid cancer, how often do my patients have serious avoidable complications? How do my results compare with those of other surgeons?
(the author himself is a surgeon and has written a lot about the problems of reliably measuring quality-of-care performance)
To be clear, I'm not at all suggesting that doctors do this out of malice - only that effective statistical analysis is really, really hard, and it's not obvious that surgeons (regardless of their international reputation) would have the background required to properly analyze these studies in a coherent framework, especially when that analysis isn't immediately relevant to a patient's outcome in an operating theatre.
I'm not suggesting that they're ignorant of basic statistical facts, but I am definitely suggesting that they're not immediately aware of the subtle assumptions implicit in many of the statistical models they use.
For example, given the large number of hospitals being ranked and the small number of deaths per hospital, it's not only possible that the "number 1" hospital in any particular field is there because of a statistical fluke - it's actually quite likely.
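To make that concrete, here's a quick back-of-the-envelope simulation (all the numbers are made up for illustration, not taken from the article): even if every hospital were identical in quality, the annual rankings would still crown a "number 1", and the title would almost never stay put from one year to the next - exactly the volatility described in the quoted paragraph.

```python
import random

# Hypothetical parameters, chosen only to illustrate the point:
# 100 hospitals, all with the SAME true mortality rate, each treating
# 250 comparable patients per year.
random.seed(1)
N_HOSPITALS = 100
PATIENTS_PER_YEAR = 250
TRUE_MORTALITY = 0.03  # identical for every hospital by construction
YEARS = 20

def observed_deaths():
    """Simulate one year of deaths at one hospital (pure chance, no skill)."""
    return sum(random.random() < TRUE_MORTALITY for _ in range(PATIENTS_PER_YEAR))

repeat_winner = 0
prev_best = None
for year in range(YEARS):
    deaths = [observed_deaths() for _ in range(N_HOSPITALS)]
    best = min(range(N_HOSPITALS), key=lambda h: deaths[h])
    if prev_best is not None and best == prev_best:
        repeat_winner += 1
    prev_best = best
    print(f"year {year:2d}: 'best' hospital = #{best:3d} "
          f"({deaths[best]} deaths vs. ~{TRUE_MORTALITY * PATIENTS_PER_YEAR:.1f} expected)")

print(f"\nLast year's #1 kept the top spot {repeat_winner} out of {YEARS - 1} times,")
print("even though every hospital is identical by construction.")
```

Run it a few times with different seeds and the "best" hospital jumps around essentially at random, which is why a single year's ranking tells you very little about underlying quality.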
The (mis)use of statistics certainly isn't limited to medicine, but medicine is one of the places where misinterpreting it has the biggest impact.