
Hi, I'm lead author of the study, so I can give some background on this. This is certainly valid criticism and we could have addressed our reasoning for using Startpage in more detail, but it just didn't fit in the paper anymore.

The reason we used Startpage is simply that it's much easier to scrape. We started off checking only Startpage (as a proxy for Google) and DuckDuckGo (as a proxy for Bing), since they are both simple to scrape and produce stable rankings. Plain Bing, on the other hand, is a lot trickier. You often get different results for the same query, and you get blocked much more easily if you send too many requests. The only stable way to use Bing is via the (certainly not very cheap) API, though even that wouldn't necessarily guarantee the same user experience as the web frontend.

We did notice, however, that DDG (despite being mostly Bing) deviated quite a bit in its results, so we started scraping Bing as well for a fairer comparison. As for Startpage, we checked it against Google initially and found the results to be virtually identical, except for a few minor rank differences here and there (which are probably just geo personalisation). The differences may have become larger now that Startpage also taps Bing to a certain extent, though when I do spot checks, the results are still sufficiently similar to Google's and sufficiently different from what actual Bing gives you. Most prominently, Startpage/Google give you a lot more YouTube results, which we did another small spin-off study on (to be published at CHIIR this year https://downloads.webis.de/publications/papers/bevendorff_20...). Moreover, we could also measure certain immediate effects of Google's ranker updates in Startpage, which weren't as apparent in Bing. So we are confident that Startpage is a reasonable proxy for Google, though it's certainly something we will keep in mind for follow-up studies.
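
One way to put numbers on "virtually identical, except for a few minor rank differences" is to compare the top-k overlap and the average rank shift between two result lists for the same query. The following is only a minimal Python sketch under that assumption, not the method used in the study; the function names and the example URLs are made up for illustration.

```python
def top_k_overlap(ranking_a, ranking_b, k=10):
    """Jaccard overlap of the top-k URLs of two rankings (1.0 = same set)."""
    a, b = set(ranking_a[:k]), set(ranking_b[:k])
    if not a and not b:
        return 1.0  # two empty result lists are trivially identical
    return len(a & b) / len(a | b)


def mean_rank_shift(ranking_a, ranking_b):
    """Mean absolute rank difference over URLs that appear in both rankings.

    Returns None if the rankings share no URL.
    """
    pos_b = {url: i for i, url in enumerate(ranking_b)}
    shifts = [abs(i - pos_b[url]) for i, url in enumerate(ranking_a) if url in pos_b]
    return sum(shifts) / len(shifts) if shifts else None


# Hypothetical result lists for one query (placeholder URLs, not real data).
proxy = ["a.example", "b.example", "c.example", "d.example"]
reference = ["a.example", "c.example", "b.example", "d.example"]
other = ["x.example", "a.example", "y.example", "z.example"]

print(top_k_overlap(proxy, reference, k=4), mean_rank_shift(proxy, reference))
print(top_k_overlap(proxy, other, k=4), mean_rank_shift(proxy, other))
```

A proxy that tracks its reference engine closely would show an overlap near 1.0 with small rank shifts, while the comparison against a different engine would show a much lower overlap.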



Hey,

I'm sure that using these sites was the right thing for your research for all kinds of practical reasons, and you don't need to justify that :) My actual complaint was about the misleading title, which you did not address.

Note that the clickbait title actually hijacked your research; there's like three comments out of 240 here that are about your paper. So while it probably feels good to have received all this attention, it's just bogus engagement. I guess it's pretty meta for a paper about search result spam to do blackhat social media engagement optimization though.


I wouldn't say it's misleading or wrong. It's certainly a bit catchy, but it also aligns with a greater discussion that's going on at the moment. Regardless of the title, most people probably wouldn't or couldn't discuss scientific literature in-depth anyway, but search engine spam is something many can relate to, be it from Google or otherwise. It's certainly a topic we will be hearing about a lot more in the future. Besides, we didn't even anticipate this to be picked up by non-scientific journals that quickly, though the title may have helped. ;-)



