In my experience it's not that better quality results are found, but rather that low quality results are skipped. There's a lot filtered by default + it's easy to click "block this domain" when you run into yet another stackoverflow copy. It means that when you're searching for code-related things, you often get small relevant blogs in position 3+ rather than SEO spam.
For example, searching for "current time on JavaScript" on Google, I get SO, MDN, and basically a lot of SEO spam sites. The same search on Kagi https://kagi.com/search?q=current+time+in+JavaScript&r=au&sh... surfaces an actually interesting blog in position 5, a link to moment.js on GitHub, and further down posts about accuracy and about the Temporal API proposal, etc.
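(For what it's worth, the query itself has a one-liner answer in plain JavaScript, no library needed — a minimal sketch; the example outputs are illustrative, not exact:)

```javascript
// Current time as a Date object and as a Unix timestamp in milliseconds.
const now = new Date();
console.log(now.toISOString());        // e.g. "2024-01-15T09:30:00.000Z"
console.log(Date.now());               // milliseconds since the Unix epoch

// Locale-aware formatting without any library:
console.log(now.toLocaleTimeString("en-AU"));
```

Which is partly the point: the answer is trivial, so the interesting results are the ones about accuracy and the Temporal proposal, not the thousandth SEO rewrite of `new Date()`.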
Might these stats be biased by the demographics that use Kagi? I don't use Pinterest, and never really encounter it on a daily basis, but some of my friends really like the site. Tbf, they might use stackoverflow more, but seeing Hacker News on the top 5 doesn't seem to reflect the average web user's usage...
I used pinterest and still hate(d) when it popped up on search.
Without too much conjecture, I think the problem is that search-related web crawlers and users have very different experiences with Pinterest. To the web crawler, the information is there and easily accessible. To a user, it might be behind a login, or part of an image description, or a sidenote, or whatever. The page doesn't exactly load with what you are looking for front and center.
Additionally, I don't use it through a browser, I use it in an app, so I'm not logged in on my browser.
I only recently discovered Pinterest as a useful site, but only because a friend convinced me to create an account. It only becomes usable with an account, and it's even fun, but most people are probably like I was: they don't want to create an account and are annoyed by Pinterest hijacking the "save image" function, redirecting you when you don't have an account, and nagging you with a login wall. It really only becomes great if you give in (only took me 5 years or so).
There was a time when Google image search was dominated by Pinterest results, but clicking through never took you to the photo. And most of the photos were rehosted copies of the original, so you wouldn't get that either.
I wonder if it is also biased in the sense that only certain people customise these things, and only for the most common sites. You would never see my favourite car forum listed here, but bumping it in search results is where I see value in the feature.
Of course they are biased. Just look at how NYTimes is both blocked and upvoted on Kagi. Some people like a source and others don’t, it’s that simple really.
It's telling that the top sites on that list accurately map to companies that have been the most "successful" at blitzing the incentive structures of the current internet economics model.
I don't understand why people hate Pinterest in image search results. If the image is relevant to my search then I don't care who's hosting it. Can somebody explain to me what the problem is here? https://0x0.st/HO57.webm
(I don't have a pinterest account, if that makes a difference.)
I don't like pinterest because the images have no metadata. If I see something I'm interested in--like a piece of furniture--I have no way of knowing how to get more info on it.
Same thing with those displays that rotate earthporn with no info on the location. So annoying to see spectacular things and not know what they are.
I wouldn't say that's the problem I've observed. Around 2015 or so I remember consistently useful answers from Quora. It was Yahoo Answers but better, because people could justify with qualifications. I actually had an account briefly.
Then, I'm not exactly sure when, but definitely by 2018, every answer I ever see on Google is either incorrect, answering the wrong question, or in broken English. Commonly all three at once.
I don't recall ever seeing a product placement, although admittedly I stopped clicking a long time ago
My educated guess is that due to some likely SEO-related concern they deleted/archived/unlisted their old good-quality answers in favour of newer, more SEO-friendly answers.
It may not even be their fault; it's possible that Google's algorithm just doesn't drag up older answers from their website, but given my experience with the decision-making in the brief time I was a user there (e.g. removing the ability to add a description to questions), I suspect not.
If this were the main cause, the expected result would be lots of highly-upvoted bad answers, as opposed to lots of scarcely-upvoted bad answers that somehow rank highly in Google searches.
Quora has plenty of potential for making consistent profit without whoring itself out, but consistent profit isn't enough in the post-Friedman world that we live in.
I had to quit doing so, because I discovered that it didn't just exclude listed domains, but performed a totally different search. Locations or local results were largely missing, when I excluded some domains.
That's curious, please expand on this. Do you really mean it performed a search for a different thing? If so, have you figured out how it differed?
Or did it just have to perform a non-cached search and thus not only excluded said domains, but could also reorder the other results based on the current relevance, rather than cached relevance from the past that is being served to everyone else who doesn't exclude said domains?
When I search for “pizza hut”, I get: the info panel that shows Wikipedia intro and company social media profiles on the right, locations results and integrated map view as the second result. When I search for “pizza hut -site:pinterest.com”, I get none of those. In addition, results are listed in a different order.
Due to linkrot, a lot of images that lived on independent websites are now only available on Pinterest, which scraped and cached them before the original site went dark. Clicking through to the original links these days often leads to 404s.
This seems the most obvious value add feature for search results at both an individual level and for reviewing overall moderation.
I wonder what possible logic there could be to not allow it? The only one I can think of is that they don't want brigading to create a wider system block, but that seems easy enough to resolve.
Eventually someone was going to create an easy-to-share, subscribable list that individuals could add to their personal Google domain block list. Think EasyList.
At that point they would be bleeding ad revenue as all the nasty, fake, abusive, spammy websites would be insta blocked.
Imagine being able to add a list and all of a sudden half the SEO blogs are excluded from results. Assuming Google even allows it, they would then have to work even harder to find relevant content to your search query. They can't rely on throwing a huge wall of semi-relevant results that you have to wade through, generating ad impressions as you go along.
Counterpoint: That feature has very little utility to all but a tiny fraction of users. Those users can readily find other means (e.g. extensions) to achieve the same thing. In the interest of simplicity, it was the right call to remove this. I imagine it was pitched for its ability to gather feedback on search quality, but the type of people using the feature aren't representative.
> when you run into yet another stackoverflow copy
OMG. Why doesn't Google filter out the likes of geeksforgeeks, for instance? How is it possible that it always comes before the genuine SO answer?
Even without offering the possibility to filter out a domain (which they had, and later removed), how does the ranking algorithm not see those horrible, zero value clones??
I can't tell you what they are, but there are probably internal Google incentives to filter and internal Google incentives to not filter, and the ones to not filter are probably stronger.
My theory is that google went from ads in search results to ads on visited pages. By buying doubleclick etc they are suddenly incentivised to drive traffic to ad-supported websites.
Almost all the interesting factual websites are not ad-monetized. The SO spam etc. are all scrapes of the factual websites with ads injected. If Google simply deprioritized ad-supported websites the search results would be much cleaner, but the part of Google that sells the ads on sites instead of in search results would throw a fit.
We could test this. Take a few hundred search queries, strip the pages that display Google ads, and see if the remainder of the search result is better or worse.
We'd need to get some humans in to rank the results, but that's not a big problem. "How well does this web page answer this query, on a scale of 1-10?"
With a collection of ranked pages, we can answer other questions as well. I'd be interested in running the same test but for google analytics, not google ads, as I think there might be a misaligned incentive there too.
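The first filtering step of that experiment is easy to sketch. A crude version — assuming a hypothetical, deliberately incomplete list of Google ad-serving hostnames — just checks whether a fetched page's HTML references any of them:

```javascript
// Hypothetical, incomplete list of Google ad-serving hostnames.
const AD_HOSTS = [
  "pagead2.googlesyndication.com",
  "googleadservices.com",
  "doubleclick.net",
];

// Returns true if the page's HTML references any known ad host.
function servesGoogleAds(html) {
  const lower = html.toLowerCase();
  return AD_HOSTS.some((host) => lower.includes(host));
}
```

Real pages often load ads indirectly through tag managers, so a check like this undercounts; it's a starting point for the experiment, not the experiment itself.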
It's worth bearing in mind that the stackoverflow clones may actually answer the query just as well as the original site - that is, it might be our definition of "a good result" that's out of whack (because we have an unnecessary bias towards the original source). I doubt this, but again it's something that's testable.
I don't doubt it, but obviously something's going wrong between the human-generated training data and the SERP, else why are we getting utter crap back?
(Or, as I said, it's our idea of what constitutes a good result that's wrong).
But the same websites show up in e.g. DDG (through Bing). As far as I know, neither DDG nor Microsoft makes a dime from ad-supported websites like Google would, so why are these results not nuked similarly to what Kagi is doing?
Aha. Couldn't help but scratch my own itch. I wonder if DDG has a deal with Google where they get a cut of the ad profit if they are mentioned as a `ref` in the doubleclick ad request.
:path: /pagead/viewthroughconversion/796001856/?random=1695374589838&cv=11&fst=1695374589838&bg=ffffff&guid=ON&async=1&gtm=45be39k0&u_w=2704&u_h=1756&url=https%3A%2F%2Fwww.geeksforgeeks.org%2Fc-plus-plus%2F
&ref=https%3A%2F%2Fduckduckgo.com%2F. <<<< What does this do?
&hn=www.googleadservices.com&frm=0&tiba=C%2B%2B%20Programming%20Language%20-%20GeeksforGeeks&auid=68284397.1695374483&data=event%3Dgtag.config&rfmt=3&fmt=4
Hence providing the same incentives to keep shitty sites like geeksforgeeks in the results.
I guess also geeksforgeeks is incentivized to report these references, so that search engines and other linking services will continue to show their links.
To reproduce:
1. Go to duckduckgo.com and do a search that will turn up a geeksforgeeks page
2. Click on the link
3. Watch the network tab as requests are made to googleads.g.doubleclick.net and check the path
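To see what's in that request without squinting at the raw path, the query string can be decoded with `URLSearchParams` (values abbreviated from the capture above):

```javascript
// Query string from the captured doubleclick request (abbreviated).
const query =
  "url=https%3A%2F%2Fwww.geeksforgeeks.org%2Fc-plus-plus%2F" +
  "&ref=https%3A%2F%2Fduckduckgo.com%2F";

const params = new URLSearchParams(query);
console.log(params.get("url")); // → https://www.geeksforgeeks.org/c-plus-plus/
console.log(params.get("ref")); // → https://duckduckgo.com/
```

So the ad request reports both the page being visited and the referring search engine.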
Most other search engines train with a target of Google, or with some form of reward that is bootstrapped on Google rankings. It makes Bing results implicitly have the same behavior as Google's. DDG and others just use the Bing API, so Google's incentives pass on through.
That doesn't make much sense to me. Google's interests are not Microsoft's or DDG's interests, and to hold up Google as some sort of ground truth for what the optimal search results for a given query are is, as proven by Kagi, highly deluded and also quite subjective.
If true, however, it does go to show that Google really is a monopolist in the search space as well... and substantiating this claim would go a long way toward proving that.
Adblockers are not a defense against this, as those results are genuine search results.
I run uBlock origin (of course), am extremely aware that geeksforgeeks exist and is utter shit, and yet I get fooled now and again, which makes me very angry at that website, Google, myself, and the world in general...
If I ran a seal-clubbing business I'd have to club seals to make money.
The whole argument is that those sites don't exist to provide a good service yet sadly need to show ads to keep the lights on.
I’m just wondering to whom those ads get shown… not arguing that anyone should turn off their adblocker and keep them running
They are working hard to trick people into clicking on their links, but won’t most people who click those links be running an ad blocker? Are unsophisticated web users searching for questions answered on stack overflow?
This is my experience as well. Low quality junk is often not present, and if it does show up, it's two mouse clicks to never see that domain again.
Also, the ability to promote high quality domains helps even more with this (though I have found one needs to be careful with pinning domains, as it can lead to irrelevant results being shown first because they share some of the same keywords).
I never got why these even ever appear in Google search results (or any search results, really). It feels like it would be super trivial to identify sites that are scraped copies of other sites. Granted, without foreknowledge, the engine doesn't know which is the original. But at the very least this can be determined by a human once, and then the problem goes away forever for that particular site.
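One textbook way to flag scraped copies is w-shingling plus Jaccard similarity. A minimal sketch — the tokenization and window size here are my own assumptions, not anything a search engine has published:

```javascript
// Break text into a set of overlapping w-word "shingles".
function shingles(text, w = 3) {
  const words = text.toLowerCase().split(/\s+/).filter(Boolean);
  const set = new Set();
  for (let i = 0; i + w <= words.length; i++) {
    set.add(words.slice(i, i + w).join(" "));
  }
  return set;
}

// Jaccard similarity: |intersection| / |union|. Near 1 means near-duplicate.
function jaccard(a, b) {
  if (a.size === 0 && b.size === 0) return 1;
  let inter = 0;
  for (const s of a) if (b.has(s)) inter++;
  return inter / (a.size + b.size - inter);
}
```

Two pages scoring close to 1 are almost certainly copies of one another; which one is the *original* still needs another signal (crawl date, links, a human), which is the commenter's point.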
Funny that you mention this game. bg3.wiki, the community wiki, had a lot of trouble with SEO. It got ignored or pushed down in the search results for a very long time, while the awful Fextralife wiki, which includes a Twitch view-botting iframe on every page, was always first.
At this point it's just safer to treat any content newer than about a year ago as highly suspect. Bots and fake content have been around for years, but things changed when ChatGPT and the copycats went live.
The blue ribbon chef was said to be the cream of the cream, so the restaurant owner was happy for him to have white card over the place. He arranged an outside the work of fatty liver, a main course of rooster of wine with eat all, and as the blow of mercy: burned cream; the full menu was a feat of strength! He made sure to wish the diners good appetite. However, when the owner visited from her foot on the ground she turned into a terrible child and demanded mouth amusers and crescents. She hated the decorative objects of art made of chewed paper.
(When we steal from French, we don't translate it to English, it becomes English).
It's Google's fault. They are the ones who make this a viable business model. They pay out the ad money, and they pollute their search results with this garbage.
100% Google who are destroying this part of the internet.
How can the search engine not be able to tell who the original is? Originals always exist earlier, not to mention SO.com's domain rank is way higher than those spam sites that have existed for fewer years.
Is this after you've done a lot of blocking (or other customisation)? For me the top Kagi results are mostly similar to the Google ones, and when I scroll down a bit Kagi doesn't save me from articles with openings like
> Time is an important part of our life and we cannot avoid it. In our daily routine, we need to know the current date or time frequently.
SO and MDN are the good results, together with the blog Kagi gave in 5th place. On Google you get the first two, then a lot of spam, and not the other good results.
That's still an example of better quality results that should be quantifiable: that's ranking. We have things like precision@n, NDCG@n, etc., where it should be straightforward to show a metric for some smaller n where Kagi beats Google, since it doesn't show some set of irrelevant/low quality results interspersed.
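For the record, those metrics are a few lines each. A sketch with binary relevance judgments (graded relevance just changes the gain term):

```javascript
// Precision@n: fraction of the top-n results judged relevant.
function precisionAtN(relevant, n) {
  return relevant.slice(0, n).filter(Boolean).length / n;
}

// DCG@n: relevant results earn more the higher they rank (log2 discount).
function dcgAtN(relevant, n) {
  return relevant
    .slice(0, n)
    .reduce((sum, rel, i) => sum + (rel ? 1 / Math.log2(i + 2) : 0), 0);
}

// NDCG@n: DCG normalized by the ideal (best possible) ordering.
function ndcgAtN(relevant, n) {
  const ideal = [...relevant].sort((a, b) => b - a);
  const idealDcg = dcgAtN(ideal, n);
  return idealDcg === 0 ? 0 : dcgAtN(relevant, n) / idealDcg;
}
```

The hard part isn't the arithmetic, it's the relevance judgments — which is exactly where "good result" gets subjective, as discussed elsewhere in the thread.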
I get those in Google as well. But tbh, I don't care. If I'm looking for "current time in JavaScript", I don't care if the answer comes from stackoverflow or any of its clones. It's not like I want to interact with that site somehow. I just want answers. If I want interaction, I obviously go to stackoverflow directly.
It might matter that I'm using Ad-blockers, so maybe if I didn't, those sites would feed me obnoxious popups and malware, but as it stands, I don't see any difference...
I just did exactly this search on Google. The first result was this -[0] which is exactly spot on. Not sure if it is because I use Brave browser which also blocks ads on websites.
The result on Google is indeed correct, but I was posting a trivial example that was supposed to show the variety of answers/sources not the accuracy of the top one. For that, they're fairly similar, although Kagi seems to prefer the higher signal-to-crap ratio.
Wow. I just loaded it and then turned off the adblocker and reloaded it. It's like you need another search engine just to find the content in the page hidden amongst all those ads.
I can't believe some people actually use the internet like that all the time.
Yes, people are. And it's the least technically literate people, with "outdated" machines and bad connections, who slog through the web like this. They don't know what to trust and often fall prey to deceptive tactics.
My goodness, I thought you were exaggerating. I've been using ad blockers for so long, I forgot the web had this many ads. Or has it just gotten worse over time?
Well, I still remember times when 50% of google results weren’t ads.
Interestingly, Bing almost doesn’t display search ads, and the search results are becoming even better than Google. I haven’t had a need to use google for a few months now.
I wonder if adblockers have contributed to this. In theory we users can reward non-horrible advertisers by whitelisting their ads, but in practice we tend to block as much as possible. The remaining ad-viewing audience will be partly composed of people who are ethically opposed to adblocking or are held back by a lack of tech knowledge, but it will also be relatively insensitive to ads (both in the sense of being able to put up with a lot, and in the sense of requiring a lot to attract their attention).
How can you be ethically opposed to something that ruins your experience? It's obviously their choice; I just can't imagine browsing without adblock. I'm ethically opposed to pages filled with crap, I guess.
Those who go through the effort to make good web content, and who pay the costs for a web server deserve to be paid. So ethically I should not block ads.
I block them anyway because ads also have an ethical contract with me that they have broken. They need to not take up too many resources on my computer, not make noise when the website otherwise has no noise content, not install malware, and be for legitimate products not scams. Probably more as well, but the above are things I regularly caught ads doing before I got an ad blocker.
If Chrome, Edge, Safari all came with uBlock by default, what percentage do you think would be "ethically opposed" enough to disable the extension? How many would turn it right back on?
I think it depends on the site. I remember early 00s where many download sites would have ads with a download button, or pop-ups that blasted sound like YOU JUST WON. Now I think that sorta thing has been normalized to even non shady sites. My primary use of ad blocker is so that I don't get random autoplaying videos.
And by that, you are telling Google you liked that result (by clicking on it), even if in the end ad revenue is not increased by your visit.
Maybe Google considers signals coming from ad-blocking browsers less important; that I don't know.
But with an extension I can have a personal garbage block list, or hide/collapse website previews without removing the result completely, and it works on other search engines.
Btw, can you hide text preview on Kagi instead of removing the domain completely (in case you're not certain the website is garbage and sometimes want to check the results, but just want them less visible)?
Maybe for you that is better, but I want word negation to work, and "verbose" to actually require words I specify to be on the page. Word stemming would be good as an option.
Sure, I'd like Booleans to work again, and intitle:.
That said, Google could probably make an inferred search interpretation work well if they wanted to return results that were good for the user rather than return results that optimise their ad revenue.
Why stop there? The best mind-reading search engine is one that doesn't even let me type queries, it tells me what I need to know before I even know I need to know it. The fact that all search engines still have query fields tells me they all still suck at reading my mind.