> We don't really need truly objective assessments. As an analogy, imagine we randomly added or subtracted 3 minutes to the finishing times of all the runners in the Boston marathon. We wouldn't have a truly objective assessment of each runner, but it's still a useful guide about who is likely to beat whom in the next marathon.
I don't think the analogy fits. There's no single metric (like runner speed/time) to evaluate competence.
> This problem can be alleviated to a large degree by risk-adjusting the outcomes: For each patient, estimate the most likely outcome and compare the actual outcome to the estimate.
Agree.
> I don't see this as an insurmountable obstacle. For example, you could have an independent body randomly sample the reported outcomes to check that they are accurate, and apply some kind of penalty if they are not.
That would be a good solution. Part of the problem is finding a truly independent body. Medicine is overrun with various governance and regulatory bodies (several of which have been shown to be little more than rent collectors). And again, there's the problem of deference to eminence combined with small inbred communities within each speciality or sub-specialty (who would likely be the only people with the training to evaluate their peers reliably).
> I don't think the analogy fits. There's no single metric (like runner speed/time) to evaluate competence.
Let me suggest such a metric: A T-value. That is, the degree to which the actual outcome deviates from the expected outcome.
For example, say you have a hospital that does stem cell transplants. For each patient, before the treatment, you assess the "chance that the patient will die within 1 year" based on that patient's age, sex, BMI, heart and lung function, type of disease, time since last relapse, quality of donor match, etc. From this you estimate that 19.6% of the hospital's patients who are treated over a particular period will die within 1 year, with a standard deviation of 3.7%. The actual mortality rate for this group of patients turns out to be 26.2%. So the T-value is (26.2 - 19.6) / 3.7 = +1.78; this is the 'single metric' used to evaluate the competence of the hospital.
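To make the arithmetic concrete, here is a minimal Python sketch of that calculation. The function names are made up for illustration, and `expected_rate_and_sd` rests on an assumption that isn't in the comment above: that each patient's outcome is an independent yes/no event with the estimated pre-treatment probability. A real risk model might derive the standard deviation differently.

```python
import math


def expected_rate_and_sd(death_probs):
    """Expected mortality rate and its standard deviation for a patient group.

    Assumption (hypothetical, not from the thread): each patient's outcome is
    an independent Bernoulli event with the given pre-treatment probability.
    """
    n = len(death_probs)
    mean = sum(death_probs) / n
    # Variance of a sum of independent Bernoulli variables is sum of p * (1 - p);
    # dividing the standard deviation by n converts it from a count to a rate.
    sd = math.sqrt(sum(p * (1 - p) for p in death_probs)) / n
    return mean, sd


def t_value(observed_rate, expected_rate, sd):
    """How many standard deviations the actual outcome is from the expected one."""
    return (observed_rate - expected_rate) / sd


# The figures from the stem cell transplant example above.
print(round(t_value(0.262, 0.196, 0.037), 2))  # 1.78
```

A positive value means more deaths than expected (worse than predicted), and a negative value means fewer.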
> there's the problem of deference to eminence combined with small inbred communities within each speciality or sub-specialty (who would likely be the only people with the training to evaluate their peers reliably).
Keep in mind that expertise is only required for estimating the pre-treatment "chance that the patient will die within 1 year". Determining whether a patient is alive after a year and calculating the T-value can be done by anyone. And even when estimating expected mortality, there is plenty of evidence that a simple algorithm can actually beat the experts; Daniel Kahneman devotes a whole chapter to this point in 'Thinking, Fast and Slow' [1].
Overall, I like the idea of a metric such as the one you describe. However, a change in the one-year mortality rate is unlikely to be a reliable indicator of the efficacy of most treatments. Most medical interventions have more modest effects than a definite improvement in short-term survival.