Argumentum ad populum my friend.

It is absolutely worth making the questions clearer, because this is the entire point of the study. Do people who do not agree with the consensus on something lack knowledge, or do they simply not agree with said knowledge? Conflating the two in your own study does nothing but set yourself up to observe a tautology (those who disagree with a consensus, disagree with a consensus), which a cynic may interpret as not entirely accidental.

And no, I'm not referencing statistical power. If I haven't made myself clear, I think this study and their numbers are both going to fall well into the endless black hole that is the replication crisis, which is especially pronounced in social psychology - a field with an aggregate replication success rate in the 20-percent range. What I am saying is that you can't make your broad claim based even on what I suspect are deeply massaged figures.



> Argumentum ad populum my friend.

Sorry, no. Overwhelming agreement along with readily self-observable characteristics is good enough. You can claim that the kidney is actually the largest organ of the human body, but everyone else agreeing that it's the skin is good enough-- especially when we can look at pictures from other people we trust, arrange to check a cadaver ourselves if we really care, etc.

I mean, I guess it's possible Big Skin (tm) has rigged all the measures and faked everything. /s

> What I am saying is that you can't make your broad claim based even on what I suspect are deeply massaged figures.

One submeasure on one subpopulation having an outcome that does not support the claim but is also not inconsistent with that claim doesn't invalidate the claim.

You are talking about:

* A non-statistically significant finding

* Showing an opposite slope, but of tiny magnitude compared to the slopes in the other direction

* On one subpopulation

* On one submeasure.


When Galileo made his case for a heliocentric universe, there was no silver bullet he offered. He simply felt it to be more probable than the geocentric model, in spite of what seemed, given the evidence of the time, to be overwhelming evidence to the contrary. Would you thus claim he lacked knowledge (of the geocentric universe) if he chose to respond false to the question "True/False: Everything in the universe revolves around the Earth"? Measuring agreement is different from measuring knowledge.

As for your hypothesis, what would you propose as a null hypothesis to test it? It seems to me that it would be "Somebody who opposes a scientific consensus will score about the same in a test of general knowledge as somebody who supports it." And the climate change example confirms that null hypothesis, which would thus reject your hypothesis.


I'm getting less and less interested in galloping around talking about different minutiae. I basically disagree with your entire argument. But:

> And the climate change example confirms that null hypothesis, which would thus reject your hypothesis.

???! It fails to reject the null hypothesis. This seems to be a rather large and fundamental error. You don't ever "confirm[ the] null hypothesis"-- and you certainly don't in a way that lets you exclude other explanations.

e.g. I flip a coin twice. It is heads once and tails once. I perform a statistical test and this outcome is consistent with random chance. In no way does this let me reject the idea that the coin is unfair.

(Indeed, doing a trial of 4 flips and getting 4 heads in a row is consistent with random chance; such a trial would not be likely to increase my belief that the coin is fair, even if it is not strong statistical evidence that the coin is unfair).
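To put the coin example in concrete terms, here is a minimal sketch using SciPy's binomial test (assuming SciPy is available; the numbers are just the flips described above):

    # Sketch: failing to reject "fair coin" is not evidence the coin is fair.
    from scipy.stats import binomtest

    # 1 head in 2 flips: perfectly consistent with a fair coin.
    print(binomtest(k=1, n=2, p=0.5).pvalue)   # 1.0

    # 4 heads in 4 flips: p = 0.125, still not significant at 0.05,
    # yet it hardly increases our confidence that the coin is fair.
    print(binomtest(k=4, n=4, p=0.5).pvalue)   # 0.125

    # A heavily biased coin (say, 75% heads) also produces these outcomes
    # quite often, so "consistent with chance" cannot rule it out.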

Back to our case: we have a whole lot of strong results in one direction, and one ambiguous near-zero result in one submeasure in one subpopulation. Perhaps that subpopulation really is different. It's an interesting thing to measure again. But if you measure enough things, you can expect to find some of these in your result set purely by chance.


With all due respect, that feels like a dodge. I decided to look at the result data for this survey [1]. I was unsurprised to find, in a simple skim of only the first survey, at least 2 comments mentioning the issue we're discussing. Asking e.g. a creationist "True/False: The Earth is about 4.5 billion years old." and "True/False: Scientists claim that the Earth is about 4.5 billion years old." are questions that are going to get very different answers. One measures assent to the consensus, the other measures knowledge of it. At least 2 people manually typing the issue out in a relatively small sample (for this sort of arbitrary specificity) means this is an extremely significant issue.

My argument, in its most specific form, is that I do not believe this study measures, or was intended to measure, what the authors claim it does. This issue is quite subtle, but I'm inclined to believe in malice over incompetence when these authors have spent years working these numbers [2]. The current study is a rehash of a study they did two years ago. The max assent group does indeed demonstrate reduced knowledge / increased self-perceived knowledge. And demographic biases abound. The desired trend varies by an order of magnitude in countries outside the US.

So this study took the most extreme sample from a subpopulation whose results they were already familiar with, stripped out the part of that subpopulation which challenged their hypothesis, added in a number of questions that were more likely to result in noise than anything else (the share of people taking surveys for pennies apiece on Amazon Mechanical Turk who can explain the difference between e.g. UVA and UVB radiation is certainly near zero, so true/false just gives you noise and randomness), and then published. Another fun sample bias: literally every single person who took fewer than 3 minutes to complete the survey (again based on survey 1) and bothered to fill in some (generally random) answers indicated max dissent. An interesting issue in itself, but a further problem in this study given relatively few people indicated max dissent, so they're measuring people literally racing through the survey for their $0.85 likely without reading it beyond looking for the "Please select #2 as the answer to this question." questions.

[1] - https://osf.io/x23c8/?view_only=29a92a9a707547149f210e5bf76a...

[2] - https://osf.io/t82j3/


> With all due respect, that feels like a dodge.

I note you've completely moved on from the huge fallacy in the statistical claim you made, choosing instead to circle back to one of the other topics we've already discussed earlier.

We all know about the Gish gallop. Pick one. You basically committed data interpretation murder in the grandparent comment.

If I fail to engage with every single point of yours, it doesn't make those points right. Especially when they're repeated after having already been addressed!

> means this is an extremely significant issue.

OK, so there are some questions on the whole test which might inadvertently measure religiosity-- if people don't realize they're supposed to respond with the scientific belief. (Of course, once they're betting on how they will score on that test, it's hard to understand what else they might believe would be the rubric for said test.) Even so, this would hardly explain things showing up in the subscores with the same slopes, unless you think the bulk of the test is contaminated.

You're the one pulling the CSVs and running analyses-- is that question missed more by people who have anti-consensus views?

> An interesting issue in itself, but a further problem in this study given relatively few people indicated max dissent, so they're measuring people literally racing through the survey for their $0.85 likely without reading it beyond looking for the "Please select #2 as the answer to this question." questions.

Look, it's just easy to eyeball the graphs and see that removing the '7's does nothing to the slope of the lines.
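If you want to check that rather than eyeball it, a rough sketch of the "drop the 7s and refit" comparison might look like the following. The file name and column names are hypothetical placeholders, not the actual OSF schema:

    # Sketch of the "drop the 7s" check; file and column names are hypothetical.
    import numpy as np
    import pandas as pd

    df = pd.read_csv("survey1.csv")  # hypothetical filename

    def slope(frame):
        # OLS slope of objective knowledge on opposition (1-7 scale assumed).
        return np.polyfit(frame["opposition"], frame["objective_knowledge"], 1)[0]

    print(slope(df))                          # full sample
    print(slope(df[df["opposition"] < 7]))    # max-dissent group removed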


The "betting" was for $0.50, which was not lost if they failed. The only cost was not accepting an alternative $0.25 immediate bonus. There's this major issue of sampling only people doing online surveys for pennies a piece, and it is possible that this specific sample might meaningfully change their behavior for $0.50 (or $0.25, depending on how you look at it), but I wouldn't really assume as much. If they really wanted that extra quarter or two, they could have simply looked everything up online.

Looking briefly over the data, anti-consensus views do seem to correlate well with questions that were basically just proxies for measuring that dissent again. For instance, one question was of the 'man was alive at the same time as the dinosaurs' type. The correct answer on the 1-7 scale is a 1. People who rejected the evolution consensus scored 4.6 on it, with a mode of 7. The mean of all respondents was 3. The score of people who indicated max opposition to e.g. nuclear power was 2.85. Those samples were my first picks and not cherry picked, beyond intentionally choosing a group that would likely have very different biases (opposition to evolution vs opposition to nuclear) to compare against.
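For what it's worth, the kind of group-mean check I'm describing is roughly the following. File and column names are hypothetical placeholders; the actual OSF files would need to be mapped to whatever the variables are really called:

    # Sketch of the group-mean comparison; file and column names are hypothetical.
    import pandas as pd

    df = pd.read_csv("survey1.csv")  # hypothetical filename

    # Responses to the dinosaurs item: everyone, max evolution dissenters,
    # and (as a contrast group) max opponents of nuclear power.
    max_evolution_dissent = df[df["evolution_opposition"] == 7]
    max_nuclear_dissent = df[df["nuclear_opposition"] == 7]

    print(df["humans_with_dinosaurs"].mean())
    print(max_evolution_dissent["humans_with_dinosaurs"].mean())
    print(max_evolution_dissent["humans_with_dinosaurs"].mode())
    print(max_nuclear_dissent["humans_with_dinosaurs"].mean())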

Imagine you had simply framed the question slightly differently, while testing the exact same knowledge: "Most scientists do not believe humans were alive at the same time as the dinosaurs." I suspect you would have gotten radically different responses. And this issue (in various incarnations) is something I think is absolutely destroying the 'soft sciences.' Quite small changes in an experiment or survey can yield radically different results, and that can be exploited to "prove", more or less, whatever you want.

p.s. The reason I moved on is because it was simply poor phrasing in a casual anon chat.


> p.s. The reason I moved on is because it was simply poor phrasing in a casual anon chat.

It wasn't poor phrasing. Being unable to reject the null hypothesis (on one subtest, even) doesn't mean there's no effect. If you test something 15 different ways, and you get a statistically significant effect with consistent slope on 12 of them, a non-statistically significant effect with a slope in the same direction on 2, and a non-statistically significant effect with a tiny slope in the opposite direction on 1... that looks pretty universal. It's an interesting question for followup research whether those 3 things are actually different in some way, or whether it was luck.
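As a rough illustration (effect size and noise level entirely made up), a quick simulation shows how often a single true effect, estimated on 15 noisy subtests, throws up at least one small wrong-sign slope purely by chance:

    # Rough simulation: one true effect, measured on 15 noisy subtests.
    # The effect size and noise level are made up purely for illustration.
    import numpy as np

    rng = np.random.default_rng(0)
    true_slope = 0.3
    noise_sd = 0.2          # sampling error of each subtest's slope estimate
    n_subtests, n_sims = 15, 10_000

    estimates = rng.normal(true_slope, noise_sd, size=(n_sims, n_subtests))
    frac_with_a_sign_flip = np.mean((estimates < 0).any(axis=1))
    print(frac_with_a_sign_flip)
    # With these (made-up) numbers, well over half of such studies would show
    # at least one subtest with a small slope in the "wrong" direction.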


You look at a billion and one swans. For the first billion you see nothing but white swans, yet the final one you look at is a black swan. You now cannot say swans are universally white. All it takes is a single exception to break a rule.

This is one of the many issues I have with social science - lack of falsifiability. Social science effectively lacks any degree of falsifiability simply because everything is based on vague probabilistic distributions which vary dramatically based on subtle nuances, like asking the same question in two different ways, or testing on different populations in cases where the hypothesis is claimed to generalize - as here.

If there were any notion of falsification, it would all be false. In many ways it's quite analogous to astrology which, if you are unaware, was indeed considered a science, studied in universities for hundreds of years. Its end, somewhat ironically, came not from scientific objection to the fact that it suffered the same issues as the social sciences today, but from the Church, which deemed it heretical for its claims of de facto divination.


> You look at a billion and one swans. For the first billion you see nothing but white swans, yet the final one you look at is a black swan.

This analogy doesn't fit our circumstance.

You take two photos each of 7 swans, a close-up and a wide-angle one.

For all of the 7 wide-angle photos, the swan is obviously white.

For the 7 close-up photos: you get 4 good close-up photos of white swans. Then there are 3 bad photos. 2 of them look probably white, and 1 looks like the swan is maybe dark grey.

All the evidence you have is consistent with all the swans being white. Looking to see if you can find those 3 swans and take a better close look at them might be a nice bit of followup.


So you have 7 photos, only 4 of which clearly confirm what you want to say and 1 of which likely contradicts it, and you're happy to take this as a universal? I also think your analogy doesn't hold for another reason. Look at the last study.

The authors are intentionally juking their numbers. They "learned" from their first study and removed every single sample that contradicted what they want to 'prove', magnified those that confirmed it, and even under this form of "science" they still failed to show statistical significance for multiple categories.

It'd be like noticing there were far more "dark" swans in Eastland and so going out of your way to make sure that on your next safari you limit yourself to Westland. If you genuinely believed all swans were white, and wanted to test that hypothesis, you'd have done the exact opposite. There is only one conclusion to make.


> So you have 7 photos, only 4 of which clearly confirm what you want to say and 1 of which likely contradicts it, and you're happy to take this as a universal? I also think your analogy doesn't hold for another reason

I'm sorry; I disagree and I don't think you understand the statistics.

> Look at the last study.

This is another new goalpost. I am not willing to discuss further.


Imagine for a moment that this study was trying to demonstrate that e.g. climate change was primarily caused by solar variation a la Willie Wei-Hock Soon [1]. And he provided the exact same quality of evidence, and the exact same "issues." There is zero chance you would now be claiming this to be a "universal" in spite of the exact same quality, or lack thereof, of evidence.

I was not shifting any goalposts, but simply referencing more evidence on the fundamental issue: that this study was intentionally biased and misleading, with the goal of "proving", by any means necessary, a conclusion the authors had zero interest in genuinely testing.

As I suspect we'll be wrapping up, I simply want to thank you for the interesting and lively discussion on this issue. It's rare two people can disagree on the internet for more than a few posts without things devolving into 'You're a poopy face.' 'No, you!'

[1] - https://en.wikipedia.org/wiki/Willie_Soon


> Imagine for a moment that this study was trying to demonstrate that e.g. climate change was primarily caused by solar variation a la Willie Wei-Hock Soon [1]. And he provided the exact same quality of evidence, and the exact same "issues."

I suspect we don't agree on what the "exact same quality of evidence" would be.

But the prior probability matters, too. A trial where a die looks slightly-biased-towards-6 (with weak significance) after 3 prior trials where it looked strongly-biased-towards-1 (with high significance) doesn't do much to change our mind about the previous findings.

Base rate fallacy, yada yada yada.
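A toy Bayes calculation of the die example, with all probabilities made up purely for illustration:

    # Toy Bayes update for the die example; every number here is made up.
    # H = "die is biased towards 1", with a strong prior from the 3 earlier trials.
    prior_h = 0.95

    # Likelihood of observing a weakly 6-leaning trial under each hypothesis.
    p_obs_given_h = 0.10      # a 1-biased die can still yield a 6-leaning sample
    p_obs_given_not_h = 0.30  # somewhat more likely if the die is not 1-biased

    posterior_h = (p_obs_given_h * prior_h) / (
        p_obs_given_h * prior_h + p_obs_given_not_h * (1 - prior_h)
    )
    print(posterior_h)  # ~0.86: the weak contrary trial barely moves the needle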



