
Ignoring your attempt at predictably headed ad hominem, you again cannot make your 'conclusion' from the study. They used a non-representative sample, removed the results of those who fully agreed with the consensus, and just generally did everything trying to massage their numbers into the conclusion they wanted to make.

Ironically, the numbers don't even support that if you do use the subscales. In all studies the overall effect from opposition was a fraction of a single point difference. In most studies opposition to the consensus in climate change was also predictive of a higher than average level of field-specific knowledge.



> Ignoring your attempt at predictably headed ad hominem

???? ad hominem? Where? I'm saying it's very difficult to quiz about the actual controversy itself, because then we get very close to the point of disagreement and risk confounding. The research explicitly mentions this aspect of design.

> They used a non-representative sample, removed the results of those who fully agreed with the consensus, and just generally did everything trying to massage their numbers into the conclusion they wanted to make.

I don't love use of Mechanical Turk for social sciences. It's still an interesting finding. Of course, more and higher quality research should be used to confirm the effect and gain additional nuance.

> In all studies the overall effect from opposition was a fraction of a single point difference.

The overall effect from opposition was a fraction of a single point difference per unit of opposition.


You're definitely correct there on my misreading of the tables. To further clarify, I also decided to see precisely what "points" meant rather than continuing to just skim. They chose to rate true/false answers on a -3 to +3 scale driven by the respondent's certainty. This means one question wrong, out of 34, was able to drive up to a 6 point difference at max certainty. And so, somewhat serendipitously, my point, with some 'modification', remains that opposition was in no case able to explain even a single missed field-specific question, at least not at max certainty.
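A quick back-of-the-envelope of that scoring as I read it (just the numbers from above, nothing taken from the paper's own materials):

    # 34 true/false items, each scored -3 (confidently wrong) to +3 (confidently right)
    items = 34
    max_score = items * 3       # 102 if every answer is correct at full certainty
    one_flip = 3 - (-3)         # flipping a single max-certainty answer moves the score by 6
    print(max_score, one_flip)  # 102 6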

As for the replicability - again, this study arbitrarily removed people fully in line with the consensus. That makes it fairly safe to say that replication is a nonstarter. But I'd also add that another red flag for social science papers is when they collect a large number of variables that end up having no relevance to the published conclusion. A large number of variables is a key resource in p-hacking. And this paper was collecting all sorts of data that went completely unused and had nothing whatsoever to do with what they ultimately chose to publish.


> The effect of opposition on the binarized full set of 34 objective knowledge items variable:

> Estimate Std. Error df t value Pr(>|t|)

> (Intercept) 25.90592 0.43454 13.11332 59.617 <2e-16 **

> opposition -0.66479 0.06842 2130.90896 -9.717 <2e-16 **

Looks like each point of opposition corresponds to about two-thirds of one additional incorrect true/false question. So the people who were most opposed scored >3 questions worse on a 34-question test, on average.
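Rough arithmetic behind that reading (the 1-7 opposition scale, i.e. a 6-unit range end to end, is my assumption here, not something I'm pulling from the table):

    # ~0.665 more incorrect binarized answers per unit of opposition (from the table above)
    slope = 0.66479
    opposition_range = 6                        # assumption: a 1-7 scale, so 6 units end to end
    questions_worse = slope * opposition_range
    print(round(questions_worse, 2))            # ~3.99 of 34 questions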

> again, this study arbitrarily removed people fully in line with the consensus.

Median filtering and trimming saturated measures are common in research like this-- hopefully designed into the original protocol. I do agree it would be nice to see their preregistration.

But, weird things happen at the tails-- truncation effects, etc.


As mentioned, I was more interested in the impact on the subtopic specific results because of what we discussed. Giving a quiz on American history to judge your knowledge of French history is obviously not a reasonable idea, even if the skill-set overlap would probably give at least some weak correlation. In the field-specific binarized case (which removes the -3 to +3 noise), the impact is 0.09 points. So that translates to a fraction of a question difference in results.

And while I'm aware of trimming extreme outliers, regardless of the side they end up on, I'm unaware of any study entirely removing a segment from its sample, from one side only, a segment critical to your entire hypothesis, and doing so without any explanation, let alone justification, whatsoever. One thing I'd note is that the effect is small enough that if this culled group had a knowledge-specific score below the mean, then it's likely that their entire conclusion would be invalid.

One other issue we have not discussed is that the methodology not only produced a very non-representative sample, but also indirectly tested something else. The surveys were done on the internet, and all of the general knowledge questions (besides the covid ones, which were just...) have answers which can be looked up in a matter of seconds.


> I was more interested in the impact on the subtopic specific results because of what we discussed.

I think they're both interesting. Broad overconfidence in performance in basic science combined with low performance is an interesting characteristic of a population. The fact that this is represented in those with contrarian views is interesting.

> So that translates to a fraction of a question difference in results.

The subscale finding just shows that this same phenomenon appears to also extend to the subjects they are contrarian about. And it's about the same magnitude of effect, because there are very few questions on the subscale.

On the big test, a participant does about 2% worse per unit of disagreement. On the subscale, they do about 1.8% worse per unit of disagreement. The effect appears to have the same magnitude and is statistically significant in both cases.
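The percentages come from something like this; the 5-item subscale length is my assumption (the paper's subscales are short), not a figure from the paper:

    # Per-unit effect expressed as a share of the available questions
    full_test = 0.66479 / 34   # ~0.020 -> about 2% worse per unit of disagreement
    subscale  = 0.09 / 5       # ~0.018 -> about 1.8% worse per unit (assuming a 5-item subscale)
    print(round(full_test, 3), round(subscale, 3))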

This is an invitation to do studies on specific fields of disagreement with better samples and more questions on the subscale. But this early research casts a wide net across many types of anti-consensus view. Study 5 is a small, possibly flawed step in this direction.

I'd be particularly interested in a two-variable analysis that attempts to model how one's overall objective performance and level of disagreement together predict objective performance in the specific field where they disagreed.
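Something along these lines is what I have in mind -- the column and file names here are hypothetical placeholders, not the paper's actual variable names:

    import pandas as pd
    import statsmodels.formula.api as smf

    # Hypothetical columns: overall_score, disagreement, field_score
    df = pd.read_csv("responses.csv")   # placeholder file name
    model = smf.ols("field_score ~ overall_score + disagreement", data=df).fit()
    print(model.summary())              # does disagreement still predict field_score
                                        # once overall performance is controlled for?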

> outliers, ...

This is all moving the goalposts when we were originally discussing the question set. I've already agreed that sociology and psychology research via Turk is problematic and faces many confounds and selection problems.


Something I recently considered was that many of the questions, the "good" questions, do ultimately test ideology more than knowledge. The obvious example would be something like "Humans share the majority of their genes with chimps." This is of course factually true, but it seems much more likely to test ideology than scientific knowledge, in part because of how it is phrased. Imagine the question were framed as "Scientists claim that humans share the majority of their genes with chimps." Now you are no longer testing agreement with the scientific view, but knowledge of it.

And perhaps a bigger issue is that there were literally zero questions that would play the same "trick" in reverse for somebody who ideologically agrees with one of the topics but otherwise lacks much knowledge. Testing this would have been quite easy by simply throwing in questions that sound like they fit the consensus but have an important aspect that makes them false. An example would be "Evolution is a process of improving a species which, over time, may lead one species to become an entirely new one." That is, of course, false - but somebody of low knowledge and high consensus agreement would likely consider it to be true.

My personal hypothesis would be that extreme views tend to be associated with high confidence and relatively low knowledge. And this could be extremes of agreement or disagreement with a topic. As Bertrand Russell put it, "The fundamental cause of the trouble is that in the modern world the stupid are cocksure while the intelligent are full of doubt."


> This is of course factually true

If it's factually true, it's not testing ideology.

> but it seems much more likely to test ideology than scientific knowledge, in part because of how it is phrased

Is there anyone credible who disagrees with the statement here?

I think a bigger issue is that it may be testing not knowledge but careful, critical reading. I can score well on this test, but my flippant quick answers are not very accurate. I am not surprised those that are worse at critical reading might be more likely to have anti-consensus views.

> My personal hypothesis would be that extreme views tend to be associated with high confidence and relatively low knowledge.

That doesn't seem to be the case here; while we don't know the "1" full agreement results, the rest seems to be a pretty dang linear trend line.


Nobody is forced to accept a fact if they do not want to. There's a large gulf between awareness of knowledge, including facts, and agreement with such. This is an especially important nuance in this quiz, where the precise implications/meanings of knowledge vs ideology are a critical part of the study.

Imagine that there was a secular scholar of Islam, and he was quizzed on Islamic ideology in a similar fashion, such that his answers were obliged to imply belief in it. Assuming he is being as accurate as he can in his responses, he would end up scoring as completely ignorant of Islam, since he marked all the beliefs as false.

I disagree there's a clear trend here, even without the 1s, because there were clear exceptions such as e.g. climate change where, even by their metrics, anti-consensus views were predictive of greater knowledge. So you end up having to, at a minimum, limit the stated claim to specific fields.


> Nobody is forced to accept a fact if they do not want to. There's a large gulf between awareness of knowledge, including facts, and agreement with such. This is an especially important nuance in this quiz, where the precise implications/meanings of knowledge vs ideology are a critical part of the study.

OK, well, here they expressed disagreement with stuff that is not just the scientific consensus, but the overwhelming agreement. If you want to phrase it as disagreeing rather than knowledge being lacking-- I think this is unhelpful. I do not think the nuance you're driving at here is worth complicating the questions or instructions.

> because there were clear exceptions such as e.g. climate change

No.

All 7 areas of controversy did worse on the entire test. It was not statistically significant for climate change and evolution. It was highly statistically significant for 4 of the remaining 5.

6 of the 7 did worse on their own subscales (statistically significant for 4 of these 6), with effect sizes of -.83, -.65, -.28, -.82, -.88, -.55; climate change had a non-statistically-significant finding of an effect with slope .03. I do not consider this a counterexample.

You've been really advocating for study power and significance of results... when it favors your argument.


Argumentum ad populum my friend.

It is absolutely worth making the questions more clear, because this is the entire point of the study. Do people who do not agree with the consensus on something lack knowledge, or do they simply not agree with said knowledge? Conflating the two in your own study does nothing but set yourself up to observe a tautology (those who disagree with a consensus, disagree with a consensus), which a cynic may interpret as not entirely accidental.

And no, I'm not referencing statistical power. If I haven't made myself clear, I think this study and its numbers are both going to fall well into the endless black hole that is the replication crisis, which is especially pronounced in social psychology - a field with an aggregate replication success rate in the 20s. What I am saying is that you can't make your broad claim based on even what I suspect are deeply massaged figures.


> Argumentum ad populum my friend.

Sorry, no. Overwhelming agreement along with readily self-observable characteristics is good enough. You can claim that the kidney is actually the largest organ of the human body, but everyone else agreeing is good enough-- especially when we can look at pictures from other people we trust, arrange to check a cadaver ourselves if we really care, etc.

I mean, I guess it's possible Big Skin (tm) has rigged all the measures and faked everything. /s

> What I am saying is that you can't make your broad claim based on even what I suspect are deeply massaged figures.

One submeasure on one subpopulation having an outcome that does not support the claim but is also not inconsistent with that claim doesn't invalidate the claim.

You are talking about:

* A non-statistically significant finding

* Showing an opposite slope, but of tiny magnitude compared to the slopes in the other direction

* On one subpopulation

* On one submeasure.


When Galileo made his case for a heliocentric universe, there was no silver bullet he offered. He simply felt it to be more probable than the geocentric model, in spite of what seemed, given the evidence of the time, to be overwhelming evidence to the contrary. Would you thus claim he lacked knowledge (of the geocentric universe) if he chose to respond false to a question "True/False: Everything in the universe revolves around the Earth"? Measuring agreement is different than measuring knowledge.

As for your hypothesis, what would you propose as a null hypothesis to test it? It seems to me that it would be "Somebody who opposes a scientific consensus will score about the same in a test of general knowledge as somebody who supports it." And the climate change example confirms that null hypothesis, which would thus reject your hypothesis.


I'm getting less and less interested in galloping around talking about different minutiae. I basically disagree with your entire argument. But:

> And the climate change example confirms that null hypothesis, which would thus reject your hypothesis.

???! It *fails to reject* the null hypothesis. This seems to be a rather large and fundamental error. You don't ever "confirm[ the] null hypothesis"-- and you certainly don't in a way that lets you exclude other explanations.

e.g. I flip a coin twice. It is heads once and tails once. I perform a statistical test and this outcome is consistent with random chance. In no way does this let me reject the idea that the coin is unfair.

(Indeed, doing a trial of 4 flips and getting 4 heads in a row is consistent with random chance; such a trial would not be likely to increase my belief that the coin is fair, even if it is not strong statistical evidence that the coin is unfair).
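To make the coin example concrete, here's a sketch using scipy's two-sided binomial test:

    from scipy.stats import binomtest

    # 4 heads in 4 flips of a supposedly fair coin
    result = binomtest(k=4, n=4, p=0.5)
    print(result.pvalue)   # 0.125 -- not significant at 0.05,
                           # yet hardly confirmation that the coin is fair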

Back to our case: we have a whole lot of strong results in one direction, and one ambiguous near-zero result in one submeasure in one subpopulation. Perhaps that subpopulation really is different. It's an interesting thing to measure again. But if you measure enough things, you can expect to find some of these in your result set purely by chance.


With all due respect, that feels like a dodge. I decided to look at the result data for this survey [1]. I was unsurprised to find, in a simple skim of only the first survey, at least 2 comments mentioning this issue we're discussing. Asking a creationist, for example, "True/False: The Earth is about 4.5 billion years old." versus "True/False: Scientists claim that the Earth is about 4.5 billion years old." is going to get very different answers. One measures assent with consensus, one measures knowledge of it. The fact that at least 2 people manually typed this issue out, in a relatively small sample and on a point this arbitrarily specific, means this is an extremely significant issue.

My argument, in the most specific form, is that I do not believe this study measures, or was intended to measure, what the authors claim it does. This issue is quite subtle, but I'm inclined to believe in malice over incompetence when these authors have spent years working these numbers [2]. The current study is a rehash of a study they did two years ago. The max-assent group does indeed demonstrate reduced knowledge / increased self-perceived knowledge. And demographic biases abound. The desired trend varies by an order of magnitude in countries outside the US.

So this study took the most extreme sample from a subpopulation whose results they were already familiar with, stripped out the subpopulation of that subpopulation which challenged their hypothesis, added in a number of questions that were more likely to result in noise than anything else (the proportion of people taking surveys for pennies apiece on Amazon Turk who can explain the difference between e.g. UVA and UVB radiation is certainly near zero, so true/false just gives you noise and randomness), and then published. Another fun sample bias: Literally every single person who took fewer than 3 minutes to complete the survey (again based on survey 1) and bothered to fill in some (generally random) answers indicated max dissent. An interesting issue in itself, but a further problem in this study given relatively few people indicated max dissent, so they're measuring people literally racing through the survey for their $0.85 likely without reading it beyond looking for the "Please select #2 as the answer to this question." questions.

[1] - https://osf.io/x23c8/?view_only=29a92a9a707547149f210e5bf76a...

[2] - https://osf.io/t82j3/


> With all due respect, that feels like a dodge.

I note you've completely moved on from the huge fallacy of a statistical claim that you made, choosing to circle back to one of the other topics we've already discussed earlier.

We all know about the Gish gallop. Pick one. You basically committed data interpretation murder in the grandparent comment.

If I fail to engage with every single point of yours, it doesn't make those points right. Especially when they're repeated after having already been addressed!

> means this is an extremely significant issue.

OK, so there are some questions on the whole test which might inadvertently measure religiosity-- if people don't realize they're supposed to respond with the scientific belief. (Of course, once they're betting on how they will score on that test, it's hard to understand what else they might believe the rubric for said test would be.) Even so, this would hardly explain things showing up in the subscores with the same slopes, unless you think the bulk of the test is contaminated.

You're the one pulling the CSVs and running analyses-- is that question missed more by people who have anti-consensus views?

> An interesting issue in itself, but a further problem in this study given relatively few people indicated max dissent, so they're measuring people literally racing through the survey for their $0.85 likely without reading it beyond looking for the "Please select #2 as the answer to this question." questions.

Look, it's just easy to eyeball the graphs and see removing the '7's does nothing to the slope of the lines.


The "betting" was for $0.50, which was not lost if they failed. The only cost was not accepting an alternative $0.25 immediate bonus. There's this major issue of sampling only people doing online surveys for pennies a piece, and it is possible that this specific sample might meaningfully change their behavior for $0.50 (or $0.25, depending on how you look at it), but I wouldn't really assume as much. If they really wanted that extra quarter or two, they could have simply looked everything up online.

Looking briefly over the data, anti-consensus views do seem to correlate well with questions that were basically just proxies for measuring that dissent again. For instance, one question was a 'man was alive at the same time as dinosaurs' type item. The correct answer on 1-6 is a 1. People who rejected the evolution consensus scored 4.6 on it, with a mode of 7. The mean of all respondents was 3. The score of people who indicated max opposition to e.g. nuclear power was 2.85. Those samples were my first and not cherry-picked, beyond intentionally picking a group that would likely have very different biases (opposition to evolution vs opposition to nuclear) to compare against.
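For transparency, the kind of thing I ran was roughly this -- the file and column names are my own placeholders for the OSF study-1 data, not the exact headers:

    import pandas as pd

    df = pd.read_csv("study1.csv")             # placeholder for the OSF data file
    dino = "humans_with_dinosaurs"             # the 'man was alive with dinosaurs' item

    print(df[dino].mean())                                        # all respondents
    print(df.loc[df["evolution_opposition"] == 7, dino].mean())   # max opposition to evolution
    print(df.loc[df["nuclear_opposition"] == 7, dino].mean())     # max opposition to nuclear power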

Imagine you had simply framed the question slightly differently, while testing the exact same knowledge: "Most scientists do not believe humans were alive at the same time as the dinosaurs." I suspect you would have gotten radically different responses. And this issue (in various incarnations) is something I think is absolutely destroying the 'soft sciences.' Quite small changes in an experiment or survey can yield radically different results, and that can be exploited to "prove", more or less, whatever you want.

p.s. The reason I moved on is because it was simply poor phrasing in a casual anon chat.


> p.s. The reason I moved on is because it was simply poor phrasing in a casual anon chat.

It wasn't poor phrasing. Being unable to reject the null hypothesis (on one subtest, even) doesn't mean there's no effect. If you test something 15 different ways, and you get a statistically significant effect with consistent slope on 12 of them, a non-statistically significant effect with a slope in the same direction on 2, and a non-statistically significant effect with a tiny slope in the opposite direction on 1... that looks pretty universal. It's an interesting question for followup research whether those 3 things are actually different in some way, or whether it was luck.


You look at a billion and one swans. For the first billion you see nothing but white swans, yet for the final one you look at you see a black swan. You now cannot say swans are universally white. All it takes is a single exception to break a rule.

This is one of the many issues I have with social science - lack of falsifiability. Social science effectively lacks any degree of falsifiability simply because everything is based on vague probabilistic distributions which vary dramatically based on subtle nuances like asking an identical question two different ways, or testing on different populations in cases where the hypothesis is suggested to generalize - as here.

If there were any notion of falsification, it would all be false. In many ways it's quite analogous to astrology which, if you are unaware, was indeed considered a science, studied in universities for hundreds of years. Its end, somewhat ironically, came not from scientific objection to the fact that it suffered the same issues as the social sciences today, but from the Church, which deemed it heretical as de facto divination.


> You look at a billion and one swans. For the first billion you see nothing but white swans, yet for the final one you look at you see a black swan.

This analogy doesn't fit our circumstance.

You take two photos each of 7 swans, a close-up and a wide-angle one.

For all of the 7 wide-angle photos, the swan is obviously white.

For the 7 close-up photos: you get 4 good close-up photos of white swans. Then there are 3 bad photos. 2 of them look probably white, and 1 looks like the swan is maybe dark grey.

All the evidence you have is consistent with all the swans being white. Looking to see if you can find those 3 swans and take a better close look at them might be a nice bit of followup.


So you have 7 photos, only 4 of them clearly confirming what you want to say and 1 likely contradicting it, and you're happy to take this as a universal? I also think your analogy doesn't hold for another reason. Look at the last study.

This study is intentionally juking its numbers. They "learned" from their first study and removed every single sample that contradicted what they want to 'prove', magnified those that confirmed it, and even under this form of "science" they still failed to show statistical significance for multiple categories.

It'd be like noticing there were far more "dark" swans in Eastland and so going out of your way to make sure that on your next safari you limit yourself to Westland. If you genuinely believed all swans were white, and wanted to test that hypothesis, you'd have done the exact opposite. There is only one conclusion to make.


> So you have 7 photos, only 4 of them clearly confirming what you want to say and 1 likely contradicting it, and you're happy to take this as a universal? I also think your analogy doesn't hold for another reason

I'm sorry; I disagree and I don't think you understand the statistics.

> Look at the last study.

This is another new goalpost. I am not willing to discuss further.



