
Great. Now you can go on https://intrade.com/, make some bets and clean up.



I already did.


With great statistical modelling comes great cash windfalls. You're right, though: it's pretty straightforward to model, so it's been baffling to see how many people have flatly claimed that the models are wrong.


Something I don't quite understand, but maybe someone here can help me with.

As I understand it, the poll averaging works because of the central limit theorem. But with so few data points, maybe a dozen polls at most in each state, and often half that, why does it still seem to work? I thought you'd need a few dozen data points at least.

What am I missing?


Let's take two examples - Ohio (a contested state with a lot of polls) and Alabama (only one poll since 1st August, but not contested).

In Ohio there were 44 polls in the month preceding the election, with a mean of 51.4% Obama, 48.6% Romney. If the confidence interval for each poll is 4%, then the interval for the average of all 44 polls is 4% / sqrt(44) = 0.6%, which is easily enough to make a confident forecast of an Obama victory in Ohio (my model was > 98% confident in its Ohio forecast).
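A minimal sketch of that square-root arithmetic, assuming (as the comment does) that every poll has the same 4-point margin and that only sampling variation matters. The function name `average_margin` is just for illustration.

```python
import math

def average_margin(per_poll_margin: float, n_polls: int) -> float:
    """Margin of error of the mean of n_polls independent polls that each
    have the same margin of error (sampling variation only, no bias)."""
    return per_poll_margin / math.sqrt(n_polls)

# Ohio figures from the comment above; the 4-point per-poll margin is assumed.
ohio_margin = average_margin(4.0, 44)
lead_over_50 = 51.4 - 50.0  # Obama's polled share minus 50%, in points

print(f"margin of the 44-poll average: {ohio_margin:.2f} points")
print(f"Obama's share is {lead_over_50:.1f} points above 50%")
print("outside the margin" if lead_over_50 > ohio_margin else "within the margin")
```

The 1.4-point gap above 50% is more than twice the 0.6-point margin of the average, which is why the Ohio call looks safe under these assumptions.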

In Alabama, the last poll was on Aug 16th, and it was 40% Obama, 60% Romney. Even with a margin of error of 4%, this state was clearly going to go to Romney (my model had 99.9% confidence in this forecast).
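A rough sketch of why one poll can be enough when the gap is this wide. It uses a plain normal approximation and treats the 4% figure as a 95% margin (standard error = 4 / 1.96); both are assumptions. It also ignores that the poll was three months old and any systematic poll bias, so it is more confident than a real model should be. `prob_below_half` is a hypothetical helper name.

```python
from statistics import NormalDist

def prob_below_half(share: float, margin: float) -> float:
    """Probability the true share is under 50%, given a poll estimate `share`
    (percent) with a 95% margin of error `margin` (points).
    Assumes a normal sampling distribution with standard error margin / 1.96."""
    se = margin / 1.96
    return NormalDist(mu=share, sigma=se).cdf(50.0)

# Alabama: one August poll with Obama at 40%, assumed 4-point margin.
p_romney = prob_below_half(40.0, 4.0)
print(f"P(Obama share below 50%) = {p_romney:.7f}")
```

The 10-point gap is about 4.9 standard errors, so the chance of Obama actually being above 50% comes out well under one in a million under this simplified model.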

This pattern is repeated for almost every state - the states with few polls are not contested, and the hotly contested states are a focus for pollsters, so there are a lot of polls there. The only really difficult states were Florida, Colorado and North Carolina, where the candidates polled so similarly that even with 30-40 polls you didn't have sufficient sample size to make strong forecasts.
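A back-of-the-envelope way to see why near-ties need so many polls: if the average's margin is m / sqrt(n), getting it down to a gap of g points takes roughly n = (m / g)^2 polls. This is a sketch assuming independent polls with a 4-point margin each; the gap values below are hypothetical, since the thread gives no numbers for Florida, Colorado or North Carolina.

```python
import math

def polls_needed(share_gap: float, per_poll_margin: float = 4.0) -> int:
    """Smallest n for which per_poll_margin / sqrt(n) <= share_gap, where
    share_gap is how far the leading candidate's polled share sits above 50%
    (percentage points). Assumes independent polls with equal margins."""
    return math.ceil((per_poll_margin / share_gap) ** 2)

# Hypothetical gaps: a comfortable 1.4-point gap (like Ohio) versus near-ties.
for gap in (1.4, 0.5, 0.3):
    print(f"gap {gap} points -> about {polls_needed(gap)} polls")
```

A half-point gap already calls for around 64 polls, more than the 30-40 that the closest states actually got, which fits the point above. Because the count scales with 1/g^2, halving the gap quadruples the number of polls required.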


It doesn't work like that. Polls are not independent. If poll A is out by a large amount, then it's much more likely that poll B is also out. Also, the confidence interval assumes that the sample was a fair sample, which is highly unlikely. It's very difficult to get a true random sample of the voting populace. It's hard enough to get a random sample of the populace, and a random sample of those who vote is even harder.

That's why Nate Silver said that Romney had about a 10% chance of winning. Statistically it was much lower, but there was a very real chance of systematic polling errors.


Polls are not independent. If poll A is out by a large amount, then it's much more likely that poll B is also out.

That's a bias issue - I was only addressing sampling variation. If you want your model to address bias as well, of course you can build that in, given some priors on what the distribution of the bias is likely to be. I didn't bother in my model (which is why I was forecasting a 98-99% chance of an Obama victory).

We could discuss potential poll bias as well, but I thought that was a bit too much for this short comment.
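One simple way to build that in, sketched under stated assumptions: model a single bias shared by every poll as a normal prior with mean zero and standard deviation `bias_sd` (the 1.5-point value below is invented for illustration, not taken from any real model). Sampling error shrinks like 1/sqrt(n), but the shared bias does not average away, so the two variances simply add. `win_probability` is a hypothetical helper name, and treating the 4% margin as 1.96 standard errors is also an assumption.

```python
import math
from statistics import NormalDist

def win_probability(mean_share: float, per_poll_margin: float, n_polls: int,
                    bias_sd: float = 0.0) -> float:
    """P(true share > 50%) for an average of n_polls polls.

    Sampling error (per-poll margin / 1.96, divided by sqrt(n)) shrinks with
    more polls; a bias common to every poll, with standard deviation bias_sd
    points, does not, so the two variances are added."""
    sampling_se = (per_poll_margin / 1.96) / math.sqrt(n_polls)
    total_se = math.sqrt(sampling_se ** 2 + bias_sd ** 2)
    return 1.0 - NormalDist(mu=mean_share, sigma=total_se).cdf(50.0)

# Ohio figures from the thread; the 1.5-point bias prior is a made-up example.
no_bias = win_probability(51.4, 4.0, 44)
with_bias = win_probability(51.4, 4.0, 44, bias_sd=1.5)
print(f"sampling error only: {no_bias:.6f}")
print(f"with shared bias:    {with_bias:.3f}")
```

With the bias term, adding more polls barely helps once the sampling part is small, which is the effect the parent comment describes.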


Each poll's reported result is also an average.
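To spell that out: each respondent is effectively a 0/1 variable and the reported share is their mean, so the central limit theorem already applies within a single poll of several hundred people. Below is a sketch of the textbook 95% margin of error for a proportion from a simple random sample (which, as noted upthread, real polls only approximate). The sample sizes are illustrative, not from any particular poll.

```python
import math

def poll_margin(sample_size: int, p: float = 0.5, z: float = 1.96) -> float:
    """95% margin of error, in percentage points, for a proportion estimated
    from a simple random sample: 100 * z * sqrt(p * (1 - p) / n).
    p = 0.5 gives the widest (most conservative) margin."""
    return 100 * z * math.sqrt(p * (1 - p) / sample_size)

for n in (400, 600, 1000):
    print(n, round(poll_margin(n), 1))
```

A sample of about 600 people gives roughly the 4-point margin assumed earlier in the thread.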


I was keen to bet on Intrade, but the ToS and such are long and confusing, and it wasn't clear how easy it is to get one's money out after a win. What was your experience like?


I actually made more taking directional bets on BetFair, which generally had narrower spreads and more liquidity.

I made some money buying Obama on InTrade and selling on BetFair, at a time when Obama was at 65% on one and 75% on the other, but I didn't start doing it until Monday because it was a lot of faff to set up the InTrade account.
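A simplified sketch of why that spread is close to free money. It assumes both markets can be treated as binary contracts that pay 1 if Obama wins and are priced in probability terms. BetFair actually quotes back/lay decimal odds, and fees, commission, currency conversion and margin requirements are all ignored here. `arbitrage_payoffs` is a hypothetical helper.

```python
def arbitrage_payoffs(buy_price: float, sell_price: float,
                      contracts: int) -> tuple[float, float]:
    """Profit in each outcome from buying a binary contract at buy_price on one
    exchange and selling the same contract at sell_price on another.
    Each contract pays 1 if the event happens, 0 otherwise. Ignores all fees."""
    # Long leg: pay buy_price now, receive 1 per contract if the event happens.
    # Short leg: receive sell_price now, pay 1 per contract if the event happens.
    if_win = contracts * ((1 - buy_price) - (1 - sell_price))
    if_lose = contracts * (-buy_price + sell_price)
    return if_win, if_lose

win, lose = arbitrage_payoffs(0.65, 0.75, 100)
print(f"profit if Obama wins: {win:.2f}, if he loses: {lose:.2f}")
```

Both outcomes come out at sell price minus buy price per contract (0.10 here), so under these assumptions the profit does not depend on who wins.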

I haven't tried to get my money out yet. I'll let you know how that goes.


Props.


Putting just over $2000 on Obama several days ago would've gotten you $800.



