Sorry, but it's very easy to verify that these claims are crap by replicating their study, i.e. doing a simple BLAST search of the insert sequence against the virus database. Here's the result for the first insert: https://blast.ncbi.nlm.nih.gov/Blast.cgi?CMD=Get&RID=398N4CS...
Although there are some hits against HIV, there are equally good hits against bacteriophages, i.e. viruses that only infect bacteria and are completely unrelated to any virus that infects humans or animals.
Furthermore, the E value is around 170, which means the matches are statistically completely insignificant: hits this good are expected by chance alone. Such a high E value corresponds to a p-value very, very close to 1 (https://www.ncbi.nlm.nih.gov/BLAST/tutorial/Altschul-1.html).
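To put a number on that: the tutorial linked above gives the relation between the two as p = 1 - exp(-E), i.e. the probability of seeing at least one hit with that score or better purely by chance. A minimal Python sketch of the conversion (the function name is my own, it is not part of BLAST):

```python
import math

def evalue_to_pvalue(e_value: float) -> float:
    # p = 1 - exp(-E): chance of at least one hit scoring this well at random.
    # expm1 keeps precision for small E, where 1 - exp(-E) would lose digits.
    return -math.expm1(-e_value)

for e in (170.0, 1.0, 0.001):
    print(e, evalue_to_pvalue(e))
```

For E around 170 the result is indistinguishable from 1 in floating point, while for small E the p-value is roughly equal to E itself.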
These guys that published such a paper are either completely clueless or nefarious in trying to stir up conspiracy theories.
Read the paper: they actually only match 2 inserts. The other two inserts were modified by the authors in such a way that they are made to match (Table 1).
Both inserts 1 and 2 also match Streptococcus phage, but a bacteriophage match would of course not be as bold a claim as an HIV match.
Also, be aware that because of the scientific interest in HIV, hundreds of HIV strains have been sequenced, and the virus is known for its high mutation rate (especially in these two proteins, gp120 and gag, as they are under pressure to mutate in order to evade the immune system). So in such a large library of protein sequences one is bound to find a match for a short 6-letter (amino acid) sequence. That's why E values exist: to make a statement about statistical significance.
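A rough back-of-the-envelope sketch of why a short match is nearly guaranteed in a big database. Everything here is an assumption for illustration: uniform, independent amino acids (real frequencies are not uniform), exact matches only (BLAST also scores near-matches, which makes chance hits more likely), and a hypothetical database size that I picked only because 20^6 = 64,000,000, not because I checked the real one.

```python
import math

def expected_chance_matches(query_len: int, db_residues: int, alphabet: int = 20) -> float:
    # Expected number of positions where one fixed peptide of length query_len
    # matches exactly, assuming uniform independent residues and ignoring
    # sequence boundaries (both simplifications).
    return db_residues / alphabet ** query_len

def prob_at_least_one(expected: float) -> float:
    # Poisson approximation: P(at least one chance hit) = 1 - exp(-expected).
    return -math.expm1(-expected)

# Hypothetical database size, chosen only for illustration.
hypothetical_db_residues = 64_000_000
hits = expected_chance_matches(6, hypothetical_db_residues)
print(hits, prob_at_least_one(hits))
```

Under these assumptions a 6-residue peptide is expected to turn up about once by chance, with a better than even chance of at least one hit. Doubling the query length to 12 drops the expectation by a factor of 20^6, which is why hit length matters so much.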
A large HIV database inflating matches is indeed a big concern. But dismissing matches with one mismatch sounds arbitrary: these segments were not arbitrarily selected, they are real insertions on top of SARS.
Large (but <100) E values are sometimes considered weak evidence of some evolutionary process if you are querying against a huge database with closely related sequences. However, given the length of their first 2 hits, I'd tend to think this is random chance. The last 2 are more interesting.
And the fact that there's no known CoV with any of these inserts is quite intriguing.
The last two (look at Table 1) are interesting in the sense that it's almost scientific misconduct, akin to photoshopping a picture in a scientific paper.
They BLASTed the inserts but apparently couldn't find any matches to HIV, so they just changed them until they found something.
>These guys that published such a paper are either completely clueless or nefarious in trying to stir up conspiracy theories.
In your opinion, could this kind of subterfuge have been picked up by Dr. Eric Feigl-Ding, or someone of similar calibre, before he broadcast this preprint for wider consumption? Thanks.
I have not heard of this guy before (I work in microbial genomics; from a quick search, he seems to be in the field of health economics), so I can't comment on that.
Occam's razor says no. The coronavirus spike protein is responsible for receptor binding and entry into the cell. Different strains with different hosts bind to different receptors, so they have differences in their spike protein sequences. Mutations in the spike protein are expected in the evolution of coronaviruses.