Hacker News

Unfortunately this is only part of the problem. Even ML studies that use public datasets, which should be very easy to reproduce once the code is shared, are often surprisingly hard to repeat. Sometimes only parts of the code are published, the code has a lot of bugs (who knows why? Added intentionally?), the code is very badly documented, or the exact library versions are not specified.

And this is in a field where everything is based on code, where in principle reproducibility is easy. Go into materials science or chemistry, try to synthesize something following a published paper, and you run into all sorts of problems: different equipment, different temperatures, undocumented steps, ... Reproducing experimental findings can take you months.



From what I've seen, it still largely comes down to incentives. A lot of the time, all anyone (from the researcher to the reviewer) cares about is the paper. Journals don't check that the code actually works, and a lot of researchers don't spend time preparing their code. They feel there's no need, since they've already got a new article on their CV. It's true that they may not have the skills and experience to produce good code they can share (depending on the area), but often 1) there's no time to prep code since they've got 3 other projects going on and a crazy work pace, 2) the code is seen as something incidental and secondary - what matters to them is the figures and results, and 3) some groups want to milk a topic for a few papers, so they're guarding their code and data. Luckily, plenty of journals at least demand access to the data, or even require making it public.


In fact, there are even more incentives for researchers to make reproducing their work as hard as possible. For example, what if someone tried to reproduce it and found contradictory results? In either case (the reproducer made a mistake or the original authors did), it's additional hassle from which the original authors can basically only lose, never gain.


This is just you confirming that tons of research is essentially fraudulent. If it can be contradicted, it absolutely should be: that is how fields progress and weed out bad ideas.


Page limits certainly don't help!



