Simpson's Paradox is Back

scythe · on May 1, 2014

Simpson's Paradox is deeper than conventional analyses suggest. I'm going to use the pilots'-late-arriving-flights data:

    name  | delay%
    alice | 30%
    bob   | 20%

So Bob's flights are delayed less often, he's the better pilot, right? Yet:

    name  | night | day
    alice | 7/25  | 1/5
    bob   | 3/10  | 3/20

So now Alice looks like the better pilot!

But wait, what if the pilots are responsible for scheduling their own flights? Bob's individual batting averages might be somewhat worse than Alice's, but he's making better decisions about when to fly.

But wait, what if Alice and Bob fly out of the same airport(s), and they've agreed to let Bob fly during the day... (this is not meant to be an accurate representation of air traffic control)

When faced with a Simpson-like situation, a correct analysis usually requires considering the chain of causation, in particular whether any stratification of the data depends on the independent variable being tested. If the stratification is a result of the test variable, it usually isn't a good one. In the Berkeley example, the stratification would have been invalid only in the highly unlikely situation that applicants were automatically assigned to departments with gender taken into consideration -- so the stratification was valid.

DanBC · on May 1, 2014

    name   |  delay%
    alice  |  30%
    bob    |  20%

So Bob's flights are delayed less often, he's the better pilot, right? Yet:

     name | night | day
    alice | 7/25  | 1/5 
      bob | 3/10  | 3/20

ronaldx · on May 1, 2014

In the past, I've considered Simpson's paradox to be nothing more than an amusing quirk of statistics. But, the parent example has helped me to see that Simpson's paradox is frighteningly inevitable.

For example, a conclusion like:

"People taking a particular drug have worse outcomes."

doesn't reflect on the efficacy of the drug at all - because people choosing to take the drug are presumably in greater need.

Same in the piloting example above - Alice is scoring worse only because she has the more difficult assignments (perhaps because she is indeed the better pilot).
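A minimal sketch of how that can play out, in Python. The counts below are hypothetical, invented only so that sicker patients are more likely to take the drug; `data`, `bad_rate`, and `pooled_rate` are illustrative names.

```python
# Hypothetical (bad outcomes, patients) counts, invented for illustration only.
# Assumption: most drug takers are severe cases, most non-takers are mild ones.
data = {
    "drug":    {"mild": (1, 10),   "severe": (40, 100)},
    "no drug": {"mild": (15, 100), "severe": (5, 10)},
}

def bad_rate(group, severity):
    """Share of bad outcomes within one severity stratum."""
    bad, n = data[group][severity]
    return bad / n

def pooled_rate(group):
    """Share of bad outcomes with the severity strata lumped together."""
    bad = sum(b for b, _ in data[group].values())
    n = sum(n for _, n in data[group].values())
    return bad / n

for severity in ("mild", "severe"):
    print(severity, f"drug={bad_rate('drug', severity):.1%}",
          f"no drug={bad_rate('no drug', severity):.1%}")
print("pooled", f"drug={pooled_rate('drug'):.1%}",
      f"no drug={pooled_rate('no drug'):.1%}")
```

With these made-up numbers the drug group does better within each severity level and worse once the levels are pooled, purely because of who ends up taking the drug.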

tshaddox · on May 1, 2014

It really is a particularly pathological special case of "correlation does not imply causation." Often the key is the presence of a confounding variable.

steveridout · on May 2, 2014

I don't see how the second table makes Alice look like the better pilot. She's better than Bob in the night, but worse than Bob in the day, and overall worse. Could someone explain?
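That reading can be checked directly from the counts in the night/day table. A minimal sketch in Python (the names `counts`, `rate`, and `overall` are made up for this check, not taken from the thread):

```python
from fractions import Fraction

# (delayed flights, total flights), copied from the night/day table above.
counts = {
    "alice": {"night": (7, 25), "day": (1, 5)},
    "bob":   {"night": (3, 10), "day": (3, 20)},
}

def rate(pilot, period):
    """Exact delay rate for one pilot in one time slot."""
    delayed, total = counts[pilot][period]
    return Fraction(delayed, total)

def overall(pilot):
    """Exact delay rate with night and day flights pooled."""
    delayed = sum(d for d, _ in counts[pilot].values())
    total = sum(t for _, t in counts[pilot].values())
    return Fraction(delayed, total)

for pilot in counts:
    for period in ("night", "day"):
        print(f"{pilot:5} {period:5} {float(rate(pilot, period)):.1%}")
    print(f"{pilot:5} total {float(overall(pilot)):.1%}")
```

The arithmetic agrees with the reading above: Alice is delayed less often at night (7/25 vs. 3/10) but more often during the day (1/5 vs. 3/20). The pooled counts also give her 8/30, roughly 26.7% rather than the 30% in the first table, so the numbers as posted don't produce a full reversal.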

cbellet · on May 1, 2014

I don't understand how Simpson's paradox is different from missing an explanatory variable and confusing correlation vs. partial correlation.

In Wikipedia's article header chart, what I see is the projection on a plane of a 3D problem, where the 3rd dimension has been overlooked. http://en.wikipedia.org/wiki/Simpson's_paradox

In Bob vs. Alice, I also see that the night/day flight dummy wasn't accounted for, hence the so-called paradox.

vbs_redlof · on May 2, 2014

It's just a special case of omitted variables with categorical variables. So instead of parameter estimates being biased up or down x amount (to the extent covariates are correlated with error terms), with Simpson's paradox the mean effect is completely wrong due to improper grouping. This often leads to flipping signs on estimated parameters -- 'surprising' results that get papers published.

My favourite explanation: http://vudlab.com/simpsons/
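The sign flip from an omitted categorical variable can be sketched in a few lines of plain Python. The six data points are hypothetical, chosen so each group trends downward while the groups sit at different levels. `ols_slope` and `demean` are illustrative helpers. Demeaning by group gives the same slope as a regression that includes a dummy for the group.

```python
from statistics import mean

def ols_slope(xs, ys):
    """Slope of the least-squares line of ys on xs."""
    mx, my = mean(xs), mean(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    return sxy / sxx

def demean(values, groups):
    """Subtract each group's own mean from its values."""
    by_group = {}
    for v, g in zip(values, groups):
        by_group.setdefault(g, []).append(v)
    centers = {g: mean(vs) for g, vs in by_group.items()}
    return [v - centers[g] for v, g in zip(values, groups)]

# Hypothetical data: within each group, y falls by 1 for every unit of x.
x = [1, 2, 3, 6, 7, 8]
y = [6, 5, 4, 11, 10, 9]
group = ["a", "a", "a", "b", "b", "b"]

pooled = ols_slope(x, y)                                 # group ignored
within = ols_slope(demean(x, group), demean(y, group))   # group accounted for

print(f"pooled slope: {pooled:+.3f}")
print(f"within-group slope: {within:+.3f}")
```

On this toy data the pooled slope comes out positive and the within-group slope negative, which is the improper-grouping effect described above.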

mendicantB · on May 1, 2014

You're correct. It isn't different. Simpson's paradox is actually a key indicator of a confounding variable.

tshaddox · on May 2, 2014

The more complicated examples of Simpson's paradox tend to involve important causes being ignored. But it's not always an issue of causality, like in the example of two Wikipedia contributors. That example doesn't really have a hidden cause, it's just the use of percentages where total articles is clearly the more useful metric.

tokenadult · on May 1, 2014

This blog post by the same author looks pretty good too. "What Can Go Wrong: My Favorite Example" (27 April 2014)

http://matloff.wordpress.com/2014/04/27/what-can-go-wrong-my...

farcical · on May 1, 2014

Tying those two posts together:

Linked post: "Everything is significant in large datasets." Certainly true, and why people should be suspicious if they see a p-value without an effect size.

Original post: "To me, one of the most unfortunate aspects of log-linear analysis as it is commonly practiced is that it is significance testing-centric, rather than based on point or interval estimation."
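A rough numerical sketch of both points, in Python. It assumes a simple z-test on a mean with known standard deviation, which is not the log-linear setting of the posts; the effect of 0.01 standard deviations and the sample sizes are hypothetical.

```python
import math

def two_sided_p(z):
    """Two-sided p-value for a standard normal test statistic."""
    return math.erfc(abs(z) / math.sqrt(2))

def z_test(effect, sd, n):
    """Return z, p-value, and an approximate 95% interval for the mean effect."""
    se = sd / math.sqrt(n)
    z = effect / se
    ci = (effect - 1.96 * se, effect + 1.96 * se)
    return z, two_sided_p(z), ci

# Same tiny (hypothetical) effect, increasingly large samples.
for n in (100, 10_000, 1_000_000):
    z, p, (lo, hi) = z_test(effect=0.01, sd=1.0, n=n)
    print(f"n={n:>9,}  z={z:5.2f}  p={p:.3g}  95% CI=({lo:+.4f}, {hi:+.4f})")
```

The effect size never changes, only the p-value does. The interval estimate makes it visible that the effect is tiny even when it is "significant".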

I like this fellow.

mathattack · on May 2, 2014

I like him too. It's dangerous, as I could see pissing away a weekend reading everything he ever wrote. :-)

craigching · on May 1, 2014

From the article:

> We’ll use the log-linear model methodology. Again see my open-source textbook if you are not familiar with this approach

Anyone have pointers to the textbook? I found one on parallel programming, but that doesn't seem like the one he's talking about here.

toki5 · on May 1, 2014

It's near the top:

"Much of the material here will be adapted from my open-source textbook on probability and statistics [0]. I’ll use R code to perform the analysis."

[0] http://heather.cs.ucdavis.edu/probstatbook

jey · on May 1, 2014

Probably this: http://heather.cs.ucdavis.edu/~matloff/probstatbook.html

jey · on May 1, 2014

Matloff has a blog? Awesome.

clarkenheim · on May 1, 2014

I cannot be the only person who thought that this article would have something to do with Groundskeeper Willie!