Strategies for making reproducible research the norm (elifesciences.org)
97 points by Tomte on Nov 25, 2023 | 91 comments


It seems remiss to leave out the publish or perish dynamic that requires anyone pursuing a career in academia to churn out publications with little to no regard for their quality. Especially in light of the evidence in recent years that peer review is a poor filter for fraud.

Disincentivising an endless stream of trash publications will both cut down on the BS and narrow the field of research one might try to reproduce.

I don't have strong feelings about the optimal number of papers that should be published, but my sense is that we are running an order of magnitude above that number.


Why not just require independent reproduction to even publish?


This would also require a reduction in the "publish or perish" pressure, because it would significantly lengthen the time from study design to publishing.

Furthermore, who's going to be doing that independent reproduction? How are you going to incentivize that?

If you require independent reproduction for every paper in a field where that's even possible, that's going to mean either you need to double the rate of running experiments in that field (unlikely), or double the number of researchers in that field (also unlikely) if you want to maintain anywhere close to the current rate of papers being published. (And again, even if you could double the number of researchers, you still need to provide enough of an incentive to reproduce that literally everyone is doing so for half of the studies they run. And you'd somehow have to ensure that you don't end up with fun, interesting studies getting 15 different groups reproducing them, while less-flashy studies that are still important fundamental science languish for years without anyone caring to try.)

Given how much of "publish or perish" is institutional inertia and culture, the idea that it could be systemically "nudged" to allow for this kind of shift is essentially a fantasy. Far too many of the Old Guard who are already in charge of the tenure committees would say, "Well, I had to publish 327 papers to be given tenure, and kids these days have so many newfangled things that let them write papers faster; why shouldn't we require them to publish 327 papers, too??"


> If you require independent reproduction for every paper in a field where that's even possible, that's going to mean either you need to double the rate of running experiments in that field (unlikely), or double the number of researchers in that field (also unlikely)

You don't have to quite double the number of researchers or experiments. It typically takes less work to replicate the final results than it took to reach good results in the first place. Presumably a lot of the replication would be outsourced to CROs.

Though since the CROs would be focused on replicating results alone, any theoretical errors or errors in experiment design wouldn't be discovered. Replicating results will mostly catch fabricators; it won't catch the bad experimental design that matters far more to the replication crisis.


To add to your "who's going to be doing that independent reproduction?" point, the whole purpose of the journals is to communicate to the rest of the scientific community that there's an interesting result here that might warrant further examination. Sure, in the absence of the ability to disseminate your findings via the usual channel, you might be able to instruct your grad students to reproduce, or beg one of your regular collaborators to reproduce, but then the independence of the experiments is dubious. And otherwise, academics aren't exactly searching for unpublished papers to replicate so the original can be published.


Most science isn't easily reproduced. It often takes a lot of time and money to get the right equipment. The worst case is supercolliders, where there is only one in the world capable of running the experiment, so we have to trust the staff. Even simple cases still take significant effort. There isn't much low-hanging, "you could reproduce it tonight in your basement" science left.


The funny thing is that the supercollider people try much harder to reproduce results whenever possible than social or nutritional science groups do. E.g. the Higgs boson detection was only made public once it had been found by two independent teams looking at different data (CMS and ATLAS). Astronomy is another good example where reproduction is the norm.


Yeah, particle physics and astronomy are the height of reproducibility; nothing compares to them. The fact that people can get very antsy when observation misses theory by just 10^-9% shows a great scientific environment.


Yeah, ok, but most science doesn't need a supercollider. Most of the erroneous science is in the social sciences, psych especially, and there really isn't any equivalent of a supercollider in psychology. The only possible equivalent is a massive study of 200+ subjects, but once you're at that scale you can be pretty confident that your statistics have converged anyway. The real issue is low-sample-size psychology studies (<100, although most are <30), which are by and large relatively easy to reproduce.


I don’t think all the problems that cause lack of reproducibility come from small sample sizes. For example, P-hacking is a thing (intentionally or not) and larger sample sizes don’t solve that. Experiment registration can help so you can track negative results but that doesn’t help if there’s no reproduction attempt (ie you could just have gotten lucky). There’s also straight up fraud you have to deal with.

The point is, OP is right that it's expensive. The computer industry that we're in claims to be data driven, but I've observed so many poor-quality studies being used to drive decisions (no reproduction, poor sample sizes, skewed samples where the subjects are employees, etc.) that I'm pretty jaded. And these are smart people, making decisions that can impact the financial outcome.


It very much depends on the field. Sure there are some where reproduction is a massive time and resource overhead.

But we don't even seem to be picking up the low hanging fruit - I used to work as an algorithm researcher, and reproduction there (for anyone who set up their experiments logically) is as easy as running a script and waiting x hours for the results to land. Yet reproduction studies were still a novel concept in that field, rather than a standard part of the submission workflow.
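As a rough sketch of what that workflow can look like (the file names and tolerance below are hypothetical, not taken from any particular lab), a reproduction run in that kind of field is essentially:

    # reproduce.py -- hypothetical end-to-end reproduction script for an
    # algorithm paper: rerun every experiment from its config and compare
    # the resulting metrics against the values reported in the paper.
    import json
    import subprocess

    TOLERANCE = 0.01  # allow 1% relative deviation from the reported numbers

    with open("experiments.json") as f:       # list of experiment configs (assumed layout)
        experiments = json.load(f)
    with open("reported_results.json") as f:  # metrics claimed in the paper (assumed layout)
        reported = json.load(f)

    for exp in experiments:
        # each experiment is assumed to write its metric to results/<name>.json
        subprocess.run(["python", "run_experiment.py", "--config", exp["config"]],
                       check=True)
        with open(f"results/{exp['name']}.json") as f:
            got = json.load(f)["metric"]
        want = reported[exp["name"]]
        ok = abs(got - want) <= TOLERANCE * abs(want)
        print(f"{exp['name']}: reported={want} reproduced={got} "
              f"{'OK' if ok else 'MISMATCH'}")

Once experiments are configuration-driven like this, a reproduction study is a batch job plus a diff, not a research project.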


That would also incentivize the paper writers to be exacting in describing their methodology and lead to higher quality papers.

If a paper needed to clear the bars of

- can be reproduced by reading the paper and/or a few meetings with the original team

- data is independently reproducible to within the usual statistical parameters

It would presumably increase the signal:noise of research.
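One minimal way to make the second bar concrete (a sketch only, assuming both the original paper and the replication report an effect estimate with a standard error):

    # Hypothetical compatibility check between an original and a replicated
    # effect estimate: is the difference within ordinary sampling error?
    import math

    def compatible(orig_mean, orig_se, rep_mean, rep_se, z_crit=1.96):
        """True if the two estimates differ by less than ~95% sampling error."""
        diff = rep_mean - orig_mean
        se_diff = math.sqrt(orig_se ** 2 + rep_se ** 2)
        return abs(diff) <= z_crit * se_diff

    # Original reports an effect of 0.42 (SE 0.10); the replication finds
    # 0.10 (SE 0.09): the difference exceeds sampling error, so it fails.
    print(compatible(0.42, 0.10, 0.10, 0.09))  # False

Real replication assessments use more careful criteria (prediction intervals, equivalence tests), but the basic idea is the same.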


I like this.

Requiring reproduction is several bridges too far.

But requiring reproducibility seems completely reasonable!


If it’s not worth reproducing it’s not worth publishing imo


Until grants explicitly grant money for an independent lab to reproduce the research they are paying for (effectively almost doubling the grant amount), no one is going to spend their hard-won funds on reproducing someone else's research. At least not in a timely manner.

What such a requirement would do is dramatically scale back research, and highly incentivize researchers to lie, or the "independent labs" to fudge in order to keep drawing funding from their collaborators.


> Until grants explicitly grant money for an independent lab to reproduce the research they are paying for (effectively almost doubling the grant amount)

I think a lot of money would be saved by preventing researchers from wasting resources on dead ends, trying to build upon results that can't be reproduced. So not exactly double.


I completely agree, but unfortunately it's not the same money that is in the initial grant, it's someone else's money. So by granting more money granting agencies can theoretically save some other granting agency's money, but not necessarily their own. Meanwhile that other granting agency will get a lot more bang for their buck (and quantitatively more prestigious results) than the first granting agency.

Bad incentive structures.


I agree we need to allocate at least 1/4 maybe up to 1/3 of all grant money to reproduction of research. While that seems wasteful it’s actually a huge savings and prevents people from wasting energy on false research


Is it a jobs program or are we investing money to get to the truth of things?


Granting agencies, whether government or NGO, typically want certain kinds of applied science results. I think "truth" is more assumed than the explicit goal.


So you just take their word for it? Pretend that reading about it is equivalent to seeing it?


Many scientists reading an article will seek to extend the work in the article, not duplicate it. If the extension fails then they may spend the effort to replicate the original results.

Even reproduction doesn't solve the problem of theoretical errors: https://news.ycombinator.com/item?id=36230450


Reproduction strikes me as the first significant step in extending and debugging. But that's just me.


Engineering and Science go hand in hand, but one is one and the other is the other.


That's the current system.

I suggest an improvement.


Nitpick but "a few meetings with the original team" is what we have today, ya?


Is it even clear that it would be worth it? Or do you propose a drastic reduction in papers to go with it, i.e., only things valuable enough to have someone replicate it are published?


If research is not worth replicating, then how could it be anything but a useless contribution meant only to bolster someone’s publish or perish career?

It just seems that the entire scientific publishing industry is there to support a jobs and prestige program. Any science is just a side effect that somehow justifies the whole racket, from NSF budget to postdoc dinner table.


Not strictly true: it only needs to not matter enough at the time. But, yes, arguably a fair number of publications are largely irrelevant beyond a few people.

Societies fund many things for many reasons, not sure science is worth singling out here.


I don't mind the funding, but inclusion in a journal dresses it up with the blessing of a Scientific Result if the news ever cites it.

We could have a separate kind of journal for not-yet-reproduced results, and somehow ensure that the prestige is zero (or equivalent to just posting the study on a blog).


We could, but I doubt it would stop the news from picking stuff up. They often go with press releases anyway, not a journal citation. You would need incentives on the news side then to stop that.


Reproducibility in some fields is difficult/impossible for a lot of reasons (funding, candidates, etc).

Suppressing publishing at the same time as actively trashing the ability to even study seems to me like a recipe for disaster far worse than publishing bad papers.


> Reproducibility in some fields is difficult/impossible for a lot of reasons (funding, candidates, etc)

It's not impossible if we're actually interested in the truth.

> Suppressing publishing at the same time as actively trashing the ability to even study seems to me like a recipe for disaster far worse than publishing bad papers.

It's not clear that a study that cannot be replicated is worth the paper it's printed on in the current environment, where replication rates are 50% or less across the board. Other strategies that change how we approach individual studies are less onerous (preregistration, open data), so maybe in that world individual studies would be worth it, but even in that world replication is the only surefire way to validate results.


Not being replicated isn't the same as not being replicable. There might simply be no interest at the time. But that is not the same as the results not being right, or not being valuable at some point.

Replication is not a be-all and end-all, because things could replicate even though the understanding is wrong (i.e., the observation in the publication is right but everything else is wrong). Even the replication itself can be wrong, if certain materials share an unknown contaminant, for example. It addresses some issues, but not all.

For some areas it is also not clear what replication even means. What is it for theology, for example?


> But that is not the same as the results not being right, or not being valuable at some point.

I can't imagine anyone who truly understands the current replication failure rates thinking that a single non-replicated study is valuable for anything other than informing what replications should be attempted.

Just think about it: replication rate is generally less than 50%. That means you'll have a better chance of determining the truth on any question posed by a study by flipping a coin than by actually reading the study.

> Replication is not a be-all and end-all, because things could replicate even though the understanding is wrong (i.e., the observation in the publication is right but everything else is wrong).

The observations are the only things that we have to get right. Interpretations can be subject to decades of debate (some QM debates are ongoing a century later), but if the data isn't reliable then you're just wasting time debating falsehoods. Replications are critical to ensuring we have reliable empirical data.

> Even the replication itself can be wrong, if certain materials share an unknown contaminant, for example.

Indeed, and literally the only way to figure out that such variables exist and are affecting the results is by independent replications. The more replications, the better. One replication is a bare minimum threshold to demonstrate that the process of gathering the data is at least repeatable, in principle.

> For some areas it is also not clear what replication even means. What is it for theology, for example?

It makes perfect sense in any context where you're gathering empirical data. For instance, if you're surveying people's interpretation of "free will" [1], then the process via which you probe their views should be replicable. This means being clear about the specific phrasing of the questions asked, the environment in which they were asked, the makeup of the cohort, and so on.

[1] https://www.researchgate.net/publication/274892120_Why_Compa...


This is just back to the old problem: most research is irrelevant to anyone beyond a few researchers and largely inconsequential to the world. This means, there will be no money for replication for most things.

Anything truly critical will (eventually) go through some replication/control of sorts (but it can take a long time).

You can either shut down most of research and then place your bets on what to keep and replicate, or you run broad but with a lot of incorrect stuff in it.

If you go for the former, you run the risk that you keep the wrong things, though. You have to have a way to quantify the direct and indirect costs of all the bad research and see if that trades off against a much smaller research surface. Not sure that is the case - empirical data matters much less for a lot of big decisions than people often make out.


> This is just back to the old problem: most research is irrelevant to anyone beyond a few researchers and largely inconsequential to the world. This means, there will be no money for replication for most things.

If it's inconsequential, then wouldn't that money be better spent on replications or other research that is consequential? I'm not really clear on what you're suggesting. Although maybe I wasn't really clear on what I've been suggesting.

Edit: to clarify, there are multiple ways to reorganize research. Consider an approach similar to physics, where there's an informal division between theoreticians and experimentalists. What if we have two different kinds of publications in social sciences, one that's proposing and/or refining experimental designs to correct possible sources of bias, and another type of publication that is publishing the results of conducting experiments that have been proposed. The experimentalists simply read proposals and apply for grants to conduct experiments, and multiple groups can do so completely independently. Conducting the experiment must strictly adhere to the proposed experimental design, no deviations can be permitted as is so common in social science when they find uninteresting results, otherwise this breaks the reliability of the results. A proposal should probably undergo a few rounds of refinement before experimentalists should feel confident in conducting the experiment, but I think the overall approach could work.


> I can't imagine anyone who truly understands the current replication failure rates thinking that a single non-replicated study is valuable for anything other than informing what replications should be attempted.

Sounds like a good idea to have a system of academic publishing that incentivises people to produce replications and similar studies then (and an academic norm of quoting multiple studies that support or oppose hypotheses)

And all that making novel research unpublishable until someone else decides to dedicate their time to replicating the experiment from your little-known working paper would achieve is to limit incentives to experiment, especially in fields where it's perfectly possible to publish statistical reexaminations of existing data (often flawed in other ways) instead.


> I can't imagine anyone who truly understands the current replication failure rates thinking that a single non-replicated study is valuable for anything other than informing what replications should be attempted.

I work in biology. At a panel of biology startup founders I heard one mention that she got a lot of her research ideas from papers studying bacteria which were published nearly a hundred years ago.

In biology you first seek to extend published results. Only if the extension attempt fails would you spend effort trying to replicate it (assuming you just don't abandon the pathway entirely).


And that's why 50% of results in biology fail to replicate. I personally don't find that acceptable. Both of those options should be valued roughly equally IMO.


Who's going to pay for that?


Isn't the main problem that academics are measured by the number of papers they publish, and reproductions of existing studies aren't published by the main journals, so there is little incentive to try to reproduce findings? I never thought this was a problem of ability.


You're also leaving out the biggest issue. Journals generally don't want to publish negative results. If you spend time researching [shocking possibility] and it turns out that [shocking possibility] isn't true, you're not getting published. That motivates everything from HARKing [1] to outright data manipulation. By contrast, if negative results were seen as valuable, none of this would be an issue.

On the other hand, it really is the case that there's just not much of any value in learning that [shocking possibility] is, as everybody would naturally expect, indeed not the case. And filling up limited journal space with such discoveries would seem to be counter-productive, at best. And when you have limited space/funding for researchers, one guy who keeps proving everything everybody knows to be false, to be false, is always going to be perceived as less valuable than one making [shocking discovery] [... which ends up being proven false years later].

[1] - https://en.wikipedia.org/wiki/HARKing


> If you spend time researching [shocking possibility] and it turns out that [shocking possibility] isn't true, you're not getting published

But this simply isn't true in physics where negative results are very common. This is at least an existence proof that this can work, people just have to get their heads straight on what research means.


By "journal space" you of course mean journal prestige that isn't unlimited. The point of science journals is gatekeeping.


The biggest problem is honestly obtained incorrect results. If you run 1000 experiments across 1000 labs, a few will, statistically, fail to notice a mistake and get a wrong result. That wrong result is then published because it is surprising.


I think there are some strong arguments against this. The first is numerical. Fields like social psychology are seeing replication rates as low as the twenties. And not just for low-hanging fruit, but for journals like "The Journal of Personality and Social Psychology", which has one of the highest impact factors across all psychology journals and a 23% replication success rate! [1] This [2] is a Google search for site:nytimes.com "Journal of Personality and Social Psychology". It's interesting seeing how many [shocking discovery]s, many of which end up being shared on this site, come from this particular journal.

Furthermore, I think you can often see poorly done science in the papers themselves. They will use suggestive wording in surveys, unreliable sources for sampling such as Amazon Mechanical Turk, and, maybe the biggest tell of all, measure a large number of unnecessary variables. That does very little to further your experiment, but absolutely ensures you can p-hack your way to a statistically significant result. Another is ignoring confounding issues so patently obvious that one can't reasonably appeal to Hanlon's razor.

[1] - https://en.wikipedia.org/wiki/Replication_crisis#In_psycholo...

[2] - https://www.google.com/search?q=site%3Anytimes.com%20%22Jour...
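The many-unnecessary-variables tell is easy to quantify. A purely illustrative simulation (assuming numpy and scipy are available): an experiment with no real effect that measures 20 unrelated outcomes will still produce at least one "significant" result most of the time.

    # Simulate a null study: two groups of 30 subjects, 20 independent outcome
    # variables, no true effect anywhere. Count how often at least one outcome
    # clears p < 0.05 purely by chance.
    import numpy as np
    from scipy import stats

    rng = np.random.default_rng(0)
    n_sims, n_outcomes, n_per_group = 2000, 20, 30

    lucky_studies = 0
    for _ in range(n_sims):
        a = rng.normal(size=(n_per_group, n_outcomes))  # group A, pure noise
        b = rng.normal(size=(n_per_group, n_outcomes))  # group B, pure noise
        pvals = stats.ttest_ind(a, b, axis=0).pvalue    # one t-test per outcome
        if (pvals < 0.05).any():
            lucky_studies += 1

    print(lucky_studies / n_sims)  # roughly 0.64

Analytically, the chance of at least one false positive here is 1 - 0.95^20 ≈ 64%, which is what the simulation reproduces; making each group larger does not shrink that number at all.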


It can also be hard to judge whether replication failed because the result is bogus or replication failed because the replication team is themselves incompetent.


Did they follow the exact same steps claimed to result in something? Did it result in that thing?

The replication team being incompetent sounds like a cop-out. In that case the researcher did a bad job and it's on them to explain better. No excuses: if it can't be reproduced it isn't taken seriously, no exceptions.


You have two groups of people. Either is equally likely to be incompetent.


How do you know which? Both will point fingers at the other.


So just make sure someone unaffiliated has to be able to reproduce whatever research has been conducted. Tough luck if it doesn’t make it, that’s why you do your best to ensure you’ve verified it’s a real result.


Unfortunately this is only part of the problem. Even studies on ML that use public datasets, which are the kinds of studies that when code is shared should be very easy to reproduce, are often surprisingly hard to repeat. Sometimes only parts of the code are published, the code has a lot of bugs (who knows why? Added intentionally?), the code is very badly documented, or the exact libraries are not specified properly.

And this is in a field where everything is based on code, where in principle reproducibility is easy. Go into materials science or chemistry and try to synthesize something following a published paper and you get all sorts of problems. Different equipment, different temperature, not all steps documented, ... Reproducing experimental findings can take you months.
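A cheap partial mitigation for the "exact libraries are not specified properly" part (a sketch, not the whole answer; the helper name and paths are made up) is to snapshot the software environment next to every result:

    # record_env.py -- hypothetical helper: store the interpreter version,
    # exact package versions, and the git commit alongside a results file,
    # so a later reproduction attempt knows what it is trying to match.
    import json
    import os
    import platform
    import subprocess
    import sys

    def snapshot_environment(path="results/environment.json"):
        info = {
            "python": sys.version,
            "platform": platform.platform(),
            # exact versions of every installed package
            "packages": subprocess.run([sys.executable, "-m", "pip", "freeze"],
                                       capture_output=True, text=True).stdout.splitlines(),
            # which revision of the analysis code produced the results
            "git_commit": subprocess.run(["git", "rev-parse", "HEAD"],
                                         capture_output=True, text=True).stdout.strip(),
        }
        os.makedirs(os.path.dirname(path), exist_ok=True)
        with open(path, "w") as f:
            json.dump(info, f, indent=2)

    if __name__ == "__main__":
        snapshot_environment()

It doesn't fix undocumented steps or buggy code, but it removes one common source of "works on my machine".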


It still largely comes down to incentives from what I've seen. A lot of the time all anyone (from the researcher to the reviewer) cares about is the paper. Journals don't check that code actually works, and a lot of researchers don't spend time preparing their code. They feel there's no need, since they've now got a new article on their CV. It's true that they may not have the skills and experience to produce good code they can share (depending on the area), but often 1) there's no time to prep code since they've got 3 other projects going on and a crazy work pace, 2) the code is seen as something incidental and secondary - what matters to them is the figures and results, 3) some groups want to milk a topic for a few papers so they're guarding their code and data. Luckily at least plenty of journals demand access to data or even making it public.


In fact, there are even more incentives for researchers to make reproducing their work as hard as possible. For example, what if someone tried to reproduce it and found contradictory results? In both cases (the reproducer made a mistake, the original authors made a mistake) it's additional hassle from which the original authors can basically only suffer and never gain.


This is just you confirming that tons of research is essentially fraudulent. If it can be contradicted it absolutely should be, that is how fields progress and weed out bad ideas.


Page limits certainly don't help!


Another issue is that making things reproducible costs you time and that is exactly what most researchers do not have. For example, many ML papers have code that is just a barely working Jupyter notebook. To make it reproducible you would have to create a reproducible environment, package the data, and prepare scripts that would rerun all the experiments you have done. That can take several weeks, but it will not increase the chance of acceptance for your paper at all.


More precisely, making things reproducible after the fact costs you significant time - there are tools for reproducible setups that take maybe an hour (at most) to set up upfront, after which it takes very little effort to do your work within that framework and keep things reproducible (e.g. Julia has DrWatson, DataDeps, etc.; I'd be surprised if Python doesn't have equivalents).

The problem is knowing upfront which of your work would need to be reproducible, or having the discipline to do all your hacking starting from such reproducible setups.
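In Python the everyday equivalent is less a single package than a habit: pin the environment (a lock file or a pip freeze) and drive every run from a config with fixed seeds. A minimal sketch of that upfront scaffold (project layout and names are invented):

    # experiment.py -- minimal upfront-reproducibility scaffold: every run is
    # driven by a config file, seeds all randomness, and writes its outputs
    # (plus the exact config used) to a fresh run directory.
    import json
    import pathlib
    import random
    import sys

    import numpy as np

    def run(config_path):
        config = json.loads(pathlib.Path(config_path).read_text())

        # seed every source of randomness the project uses
        random.seed(config["seed"])
        np.random.seed(config["seed"])

        # ... the actual experiment would go here; placeholder result:
        result = {"accuracy": float(np.random.rand())}

        out_dir = pathlib.Path("runs") / config["name"]
        out_dir.mkdir(parents=True, exist_ok=True)
        (out_dir / "config.json").write_text(json.dumps(config, indent=2))
        (out_dir / "result.json").write_text(json.dumps(result, indent=2))

    if __name__ == "__main__":
        run(sys.argv[1])  # e.g. python experiment.py configs/baseline.json

The discipline is the hard part, as you say; the tooling overhead really is small once it's in place on day one.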


But Julia and Python tools aren't enough. The whole environment has to be reproducible. So many python libraries themselves take shortcuts which work on the current Ubuntu or current state of the web, but will fail to build later by the time someone tries to reproduce the result. Shipping a container just hides the implicit dependencies and assumptions. People need to be packaging for Guix en masse for reproducibility to be feasible. Until then, "reproducibility" is just another lie people are telling themselves and others to try and get ahead in their rat races.


So you say "julia and python tools aren't enough" but then proceed to only talk about Python and say a bunch of stuff that is completely inapplicable the Julia.

Do you know much about how reproducibility is approached in Julia? Maybe hold off on calling it a lie if you're not experienced in what you're talking about.


I have asked about Julia's reproducibility story on the Guix mailing list in the past, and at the time Simon Tournier didn't think it was promising. I seem to recall Julia itself didn't have a reproducible build. All I know now is that the GitHub issue is still not closed.

https://github.com/JuliaLang/julia/issues/34753


"reproducible build" in this sense has nothing to do with scientific reproducibility. That issue is about hash-verifiability for the sake of security, and how some autogenerated random paths included in the binary affects that.

Scientific reproducibility requires only that versioned binaries be functionally equivalent if they have the same version, which is quite independent of this and certainly exists in Julia.

Would love a link to the Guix mailing list discussion, if you can dig it up.


I agree with your first sentence, but saying people are fooling themselves and being overoptimistic (by telling themselves lies) is very different from "calling it a lie" (i.e. intentionally deceiving others). That seems like an unnecessarily negative interpretation of what they said. Even if you disagree with it, that does not deserve such a harsh response.


Maybe the cause is funding sources that fund researchers publishing too often, and not funding other researchers to double check their work


No. Several weeks is the time it takes to learn and master Docker.

About two hours is the cumulative time one must spend on the Dockerfile for a 3-week project.

But it requires institutions insisting on reproducibility, and fostering best practices to make it even easier for the researchers to be compliant.

I get it that reproducibility can be quite hard for biology. But ML cannot be taken as an example of a hard problem.


I agree that docker is great. But docker solves only one of the problems mentioned above (env) and even that solution does not work for some teams that run their experiments in GPU clusters where docker is not supported.


Perhaps the core issue is that academia is a textbook case of Goodhart's law. If/when reproducibility becomes a target, the academic system will likely make an equally bad mess of it as it has with its current targets.


If you fail to reproduce some important research then I think that would absolutely get published. (see the recent superconductor drama)

So if you feel some impactful work is suspicious... I think disproving it would absolutely be incentivised.

If you show it's actually correct... well, then usually it's not that hard to push the envelope a bit further and say something new. That happens all the time.


Yes, but in the vast majority of cases, it's hard to tell just by reading a paper if there's been dishonesty somewhere in the pipeline.

Also, the LK-99 example is an exception, not the norm: the chances of receiving significant attention for a replication study are near zero in almost all other cases.


I just don't think it's really relevant. If the research is impactful, then it'll be replicated (at least in part) when the next person tries to build on the results. If it doesn't replicate then they'll probably end up discovering something new/different - and that'll lead to its own paper.

Even in the ideal world, you effectively almost never end up with a replication paper. Either it replicates and you add on your own novel research. Or it doesn't replicate and you discover something new

You can in theory end up with a super dull null result that disproves someone else's claim. But even then, when you set out on the project your aim is to add something new on top of what's been already done. This happens all the time.


It seems to me, instead of funding a new college or traditional research institute, some benefactor ought to fund a "research reproduction institute", dedicated to identifying and reproducing suspicious publications.


Sort of. Yes and no. There has to be a metric to assess researchers' performance. Otherwise we won't know what research is worthwhile. When the rules of the game are known, players will find their way to cheat, or at least bend the rules to their advantage.

So, for example, suppose negative results become as valuable: well, they are easier to produce. They are also less valuable as stepping stones for further research. Given that, you'd still need a metric that compares publishing positive results to negative results. Even if you declare them to be equally important, the shared understanding will be that they aren't, and one would be more important than the other. And here we are back to square one.

There are some minor things that can be done in the near future. For example, results produced with code must come with the code that produced those results. A lot of research bodies resist this because they want to commercialize their code, or their code may inadvertently contain the organization's secrets and therefore needs more auditing... but, at the end of the day, it needs to be made clear that this is a necessary and unavoidable price to pay.

Data sharing is even more problematic. Besides confidentiality concerns, data is always a bargaining chip in the game of getting collaborators (and grants). Should it be made public, it loses its value to those who collected it. Right now, the trend is: if you managed to collect a worthwhile dataset, you'll cover yourself foot to head with NDAs, contracts of all kinds, etc., and sit on it, exploiting it for a series of studies. And if anyone wants to do research on the same subject, you will only invite them if they bring grants or equipment.

But you cannot really verify results w/o having the data available. Even if you have the code.

---

It's really sad to see how research is doing wrt programming, in part because of the above, but I don't think the programs outlined in OP will have a noticeable effect. They don't paint a convincing picture in terms of incentives, i.e. they don't answer the question of why researchers would want to do any of that RepRes and OS training. Even in computationally heavy research today you often find that all the computation work is outsourced by the researchers and they themselves have no clue what their code is doing.

The above are all arguments for why the current approaches (or yours) are ineffective. But I don't claim to know what needs to be done.


I don't know if we will get a healthy research community by attempting to police unhealthy communities out of existence. This will only lead to research activity being further stifled by useless bureaucracy that fails to restrain most bad research while slowing down what actually matters, quality research. Instead, I think we need to focus our attention on outstanding communities and how to distinguish them from less outstanding ones.

We don't need to be threatened by the mere existence of fake journals that publish articles with "counterfeit consciousness" in them. No one actually reads that stuff. It only exists to feed badly-managed communities with terrible incentives. And the solution to such bad communities is to create good ones and showcase their work. Not waste our time obsessing over the fear that somewhere, someone is not a great researcher.

Poor research is the norm and should be ignored. Good research is the exception that should be recognized and nurtured.


I'm a little confused by your post. This paper is about increasing training about reproducible research and open science at all levels of education, with the goal that awareness of them will increase participation and reduce the incidence of pitfalls that lead to faulty research. Is this not exactly showing people how to conduct "good research that should be recognized and nurtured"? Is this really any different than requiring training in proper statistical methods?


Nevermind, it looks like I got mixed up responding to a different comment or article. Sounds dumb but it has happened to me a couple times over the years...


Here's a thought: Teach it in elementary school, so that it becomes a habit, and benefits those who are not training to become professional scientists. It doesn't even need statistics or "code" since reproducible science predates those things.

If it becomes a habit, and a natural expectation, then teaching the specifics pertaining to any specific field at an advanced level should be easier. And maybe it would encourage the public to demand better science reporting.

Also, mark the topics in the introductory textbooks that are not based on reproducible research, so that students know the actual state of the field that they're studying.

I do industrial R&D, and my stuff doesn't get published at all, but I benefit from doing work that is at least "open and reproducible" within my organization. It actually improves the quality of my work.


No. Teaching it in elementary school doesn't do shit. Everybody knows this stuff.

The reason why a lot of people don't make reproducible research is the same reason why communism tends not to work.

In order to make it work you need to put incentives in place. For example. No journal publishes a paper unless the work has been reproduced by someone from a different institution.


Idea: create a journal that is GitHub based.

It generates any data plots/tables through CI, which means you need to have your data available and consumable, and containerised code. Its PR reviewers are scientists doing peer review.

Then you get public review and comment, reproducible environments, and open data.
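The mechanical core is just a script that rebuilds every figure and table from the raw data, which the journal's CI would run on each pull request. A sketch of what authors might be asked to provide (file names and columns here are invented):

    # build_artifacts.py -- regenerate every figure and table in the paper
    # from the raw data in data/, so CI can check that the published plots
    # really follow from the published data.
    import pathlib

    import matplotlib
    matplotlib.use("Agg")  # no display available in CI
    import matplotlib.pyplot as plt
    import pandas as pd

    DATA = pathlib.Path("data")
    OUT = pathlib.Path("figures")
    OUT.mkdir(exist_ok=True)

    df = pd.read_csv(DATA / "measurements.csv")  # the paper's raw data

    # Figure 1: main result
    fig, ax = plt.subplots()
    df.groupby("condition")["outcome"].mean().plot.bar(ax=ax)
    ax.set_ylabel("mean outcome")
    fig.savefig(OUT / "figure1.png", dpi=200)

    # Table 1: summary statistics, written out for the paper's build step
    df.groupby("condition")["outcome"].describe().to_csv(OUT / "table1.csv")

A reviewer's comment on the PR is then attached to the exact code and data revision it refers to, which is most of what the proposal buys.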


I like the idea a lot. I don't expect a lot of feedback/reviews from other scientists, though; there just aren't that many, and they are likely busy with something else.


You could still have assigned reviewers like in a conventional journal, which maybe if done right could stimulate broader discussion

JOSS has a review process like this, but I don’t think I’ve heard of any journals focused on primary scientific results using this kind of GitHub based review process

https://joss.readthedocs.io/en/latest/submitting.html#the-re...


Thanks! I was thinking an actual journal that would have a subscription.

I wonder if it could also get people to sponsor replication studies, and publish them linked alongside.


I was recently interested in road traffic speed prediction.

The problem with academic research into predicting speeds is not as much the lack of reproducibility, but the lack of measures that make sense.

Hundreds of papers compare a short-term prediction (say 30 minutes ahead) using mean absolute error or similar measures. This completely ignores the fact that vehicle speeds are mostly constant and only vary significantly during rush hours or other disruptions, at which times the process of breakdown is rather chaotic.
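To illustrate with made-up numbers: a do-nothing baseline that predicts "the speed stays whatever it is right now" already gets a flattering daily MAE, because speeds really are constant most of the day, so the headline metric barely rewards getting the rush-hour breakdown right.

    # Illustrative only: 24h of 5-minute speeds on a hypothetical road that is
    # free-flowing at ~100 km/h except for a two-hour morning rush at ~30 km/h.
    import numpy as np

    rng = np.random.default_rng(0)
    steps = 24 * 12                                         # 5-minute intervals
    speed = 100 + rng.normal(0, 2, steps)                   # free flow all day...
    speed[7 * 12:9 * 12] = 30 + rng.normal(0, 10, 2 * 12)   # ...except 07:00-09:00

    # "Prediction" 30 minutes ahead: just assume the current speed persists.
    horizon = 6                                             # 6 x 5 min = 30 min
    pred, actual = speed[:-horizon], speed[horizon:]

    mae_all = np.abs(pred - actual).mean()
    rush = slice(7 * 12 - horizon, 9 * 12 - horizon)        # targets inside the rush
    mae_rush = np.abs(pred[rush] - actual[rush]).mean()
    print(f"MAE over the whole day:   {mae_all:.1f} km/h")   # looks respectable
    print(f"MAE around the rush hour: {mae_rush:.1f} km/h")  # where it actually matters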

From what I gather this will not change easily, because it is considerable academic fun to improve on a number for existing benchmarks, instead of redefining the game.

I'd love to help academics in this field who need some guidance on practical applications!


it's not entirely important (theoretically) that research be reproducible, but it would be important that the attempt at reproduction is made in a timely way. In many cases today, it's not so easy to set up the experiments in the first place to attempt it.


> it's not entirely important (theoretically) that research be reproducible

Well that's confusing, because reproducibility is a core tenet of science. How else could you tell whether your results were a statistical fluke or due to some flaw in the study design?


12. Create a journal for reproduced papers only.



We need a publicly funded body whose purpose is to issue grants for firms to reproduce scientific findings. A scientific reproduction corps. Extra funding can be awarded for finding flawed/fraudulent research. This would create a community of organizations that counter-balance industry incentives and instill trust in our scientific process.


The important thing to change would be that the consumers of research need to care about reproducibility. Currently, that is only the case in very few instances and there reproducibility is typically checked quickly.


the root of the issue is trust in published science. if people focus too much on reproducibility, then you will get people trying to game the system to publish reproducible but still misleading/cherrypicked science


> An alternative approach is to encourage students to conduct replication studies, evidence synthesis, or meta-research as part of graduate theses.

Why not require it?



