Google, please do something with your ads and SEO-spam (mdubakov.medium.com)
126 points by tablet on Nov 25, 2022 | 94 comments


What really ticks me off are all these sites clearly scraping content from Stack Overflow that somehow end up near the top of the search results. How is there not an automated fix for this? Lemme help you out with this, Googlers: if site content A == site content B, pagerank = -1.
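To make the "content A == content B" check concrete, here is a minimal sketch of one common near-duplicate test: word shingles compared with Jaccard similarity. This is not how Google works as far as anyone in this thread knows; the function names, the shingle size `k=5`, and the `0.8` threshold are all assumptions chosen for illustration.

```python
from __future__ import annotations

import re


def shingles(text: str, k: int = 5) -> set[tuple[str, ...]]:
    """Return the set of overlapping k-word windows ("shingles") in text."""
    # Lowercase and keep only word characters so case and punctuation
    # changes do not hide a copy.
    words = re.findall(r"\w+", text.lower())
    if len(words) < k:
        # Text shorter than one window: treat the whole text as one shingle.
        return {tuple(words)} if words else set()
    return {tuple(words[i:i + k]) for i in range(len(words) - k + 1)}


def jaccard(a: set, b: set) -> float:
    """Jaccard similarity |A & B| / |A | B|; 0.0 when both sets are empty."""
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


def looks_like_copy(page_a: str, page_b: str, threshold: float = 0.8) -> bool:
    """Hypothetical rule: flag the pair when shingle overlap meets the threshold."""
    return jaccard(shingles(page_a), shingles(page_b)) >= threshold
```

The sketch only answers "are these two pages nearly the same text?". Deciding which of the two is the original, and what to do with the ranking, is a separate problem.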


They have a fix already and applied it to the Wikipedia clones. They intentionally ignore other content scrapers because they drive ad impressions.


> because they drive ad impressions.

Could you explain why?

If Google replaces SEO trash with decent results, the ads they will show on the Search page will remain unchanged. How would the impressions suffer? Or rather, since Google charges per click not per impression for search ads, how would ad clicks suffer?

In fact, I thought low-quality organic search results are purely a negative for Google profits, since users (over a very long run) move their search elsewhere (Tiktok, Reddit, etc.).


It's about the ads on the SEO trash pages. Those pages are typically loaded with ads which increases the likelihood you click on them.


I wish google had a feature to permanently ban domains from search results.


They had a personal block list at one point... but removed it.


There are browser add-ons that can do this (for example, uBlacklist).


Kagi has that feature.


Why not just create your own search box and put -site:domain.xyz in the query? Maybe there is a length limit, but at least you can ban some.
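A rough sketch of that idea: append one `-site:` operator per blocked domain and build the search URL. The domain list and the function names are hypothetical, and the sketch makes no attempt to handle whatever query length limit the search engine may enforce.

```python
from urllib.parse import urlencode

# Hypothetical personal block list; replace with the domains you want hidden.
BLOCKED_DOMAINS = ["domain.xyz"]


def build_query(terms: str, blocked: list[str]) -> str:
    """Append a -site: exclusion for every blocked domain to the search terms."""
    exclusions = " ".join(f"-site:{domain}" for domain in blocked)
    return f"{terms} {exclusions}".strip()


def search_url(terms: str, blocked: list[str],
               base: str = "https://www.google.com/search") -> str:
    """Return a search URL whose q parameter carries the terms plus exclusions."""
    return f"{base}?{urlencode({'q': build_query(terms, blocked)})}"


print(search_url("react node.js", BLOCKED_DOMAINS))
```

A custom search box, browser keyword search, or bookmarklet could call something like this so the exclusions never have to be typed by hand.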


There is a chrome addon called uBlacklist that auto-hides domains (configurable) in the google search results.



Cue a HN post “wtf google, my archive website is getting marked as spam.”


I don't want your archive website near the top of my search results when the original is readily accessible.


Won't stop the outrage post.


The actual outrage should be when a small site posts something and then that content gets stolen and posted verbatim as an answer on stackoverflow.

There's even the possibility that the Google crawler will have seen the SO page before discovering the original small site.


The suggestion is not to penalize all websites that share some minor amount of content with SO but to remove websites that copy tens of thousands of comments from SO and do not provide any other value. It’s pretty easy to identify and implement this distinction.


You have any examples of that? Sounds far-fetched to me, since SO posts:

(a) are in response to very specific questions

(b) can be easily reported to SO for plagiarism

(c) make no $$ for the poster, reducing the incentive to plagiarize


You misunderstand the parent comment.

They’re saying there are full sites which just scrape SO content. These are not hosted in StackExchange and therefore are not subject to their reporting system, and generally are filled with ads which provide the money and therefore incentive.


> You misunderstand the parent comment.

I don’t think so. The parent seems to talk about StackOverflow answers plagiarizing some small blogs, not StackOverflow itself being scraped.

Not an issue I’ve witnessed though.


I see the parent comment clearly stating small blogs lifting content from SO and SEO-optimising it.

This happens a lot. For instance, a Google search for “maps vs list comprehensions” shows an ad-based website result featured at the top, despite having re-hashed content from the SO post on the same topic.

Clearly, Google’s SEO favors recently-published articles over quality.

I ran a search “react node.js”, and a digital marketing blog post is highlighted in the “Featured by Google” despite being low-quality. So, it’s just spam at the moment to milk views by content-marketers.

Also, it’s hard to plagiarise from small blogs when SO is the more active forum for day-to-day solutions. I don’t see that happening a lot.


> I see the parent comment clearly stating small blogs lifting content from SO and SEO-optimising it.

Are we talking about the same comment?

> when a small site posts something and then that content gets stolen and posted verbatim as an answer on stackoverflow.

This is about SO answers stolen from small blogs, not SO answers stolen and copied to SEO optimized pages. Unless we’re not reading the sentence the same way somehow?


Sorry. I took the top-most comment as parent.

Having said that, SO answers getting stolen from small blogs is a rarity as I wrote in my previous reply.


It wouldn't have to be malicious or uncredited, SO answers often quote manuals, books, etc.


That's a good one too.


These can be hidden using uBlock origin. I’ve got a list you can import (just use the raw link). The selector isn’t perfect in Google (sometimes leaves some unattached content) but the link is always removed. Feel free to submit a PR if you find any other SO clones - there seems to be no end to them.

https://github.com/levymetal/filter-lists/blob/main/filters/...


Thanks! I didn’t realize uBlock can remove specific domains from Google search, but looking at filter definitions it makes sense.


buuh bye Investopedia


Is Investopedia a clone? What is the original?


So both sites lose? That'll make manipulation easier than ever before.

Or does the older site win? Frequently scraped / already popular sites will love that.


It kinda sounds like you think that Google doesn't have the resources to have a human decide that stackoverflow isn't itself a spam clone.

This is a huge problem for a very small number of sites and a non-existent problem for everything else. The list of authoritative, egregiously-copied sources could easily be maintained by one person. The only difficulty comes from determining when to trigger the system.


They have the resources, they don’t have the willingness. Google’s MO is never hiring a human to do a job well, when a computer can do the same job badly.


Derank the newer page, you don't need to derank the entire site.


This doesn’t work in general because people can copy content faster than google can check who was first. Also sometimes you actually want the derivative.


Let’s start with big offenders. If a website copied tens of thousands of comments from SO it gets penalized. We don’t always need to find a perfect system to improve our experience.


Using that rule it would be very easy to derank any website.


Not if the list of A sites was manually curated. Like stackoverflow and wikipedia. This isn't hard for shooting down a ton of sites.
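The "big offenders plus curated list" rule can be sketched as follows. It assumes some upstream step has already produced (copying site, source site) pairs, one per copied page; that step, the function name, and the default threshold are illustrative assumptions, not anything a search engine is known to do.

```python
from collections import Counter
from typing import Iterable

# Manually curated list of authoritative, heavily copied sources
# (the two sites named in the comment above).
CURATED_SOURCES = {"stackoverflow.com", "wikipedia.org"}


def flag_clone_sites(matches: Iterable[tuple[str, str]],
                     min_copied_pages: int = 10_000) -> set[str]:
    """Return sites that copied at least min_copied_pages pages from curated sources.

    matches holds (copy_site, source_site) pairs, one per copied page.
    Copies of non-curated sources are ignored, and curated sites are never flagged.
    """
    counts = Counter(
        copy_site
        for copy_site, source_site in matches
        if source_site in CURATED_SOURCES and copy_site not in CURATED_SOURCES
    )
    return {site for site, n in counts.items() if n >= min_copied_pages}
```

With a high threshold, a blog that quotes a handful of answers stays untouched, while a site mirroring tens of thousands of pages is flagged.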


Then people could derank sites by copying them to stack overflow.


And quickly they'll get deleted again: spam postings on SO are generally very short-lived.


Can't say I'm ever surprised to see another "WTF Google" post. The search has degraded a lot over the years.

With that said, I've tried to switch on more than one occasion and Bing might be the only alternative that worked for me. DuckDuckGo is always way off the mark, no matter how many chances I've given it over the years. Google still wins, even with the ads and spammy stuff.


This is entirely anecdotal but recently many top results in DDG read like they’re written by GPT-3 or the like. Half coherent sprawls of text that contain lots of random factoids but nothing novel or that actually answers the question.


No, this is just how most bloggers talk. /s


Some people here a few days ago recommended kagi, a paid service with a limited free option. I have been trying it and so far the results seem great!


Kagi has been amazing: verbatim mode that actually works, blocking sites you don't want to see, boosting sites you always want to prioritize, and lists of sites you can customize for specific searches.


Based off your recommendation and others, and looking at the site and its features, I am going to sign up for a month and give it a real try. I’ve been using google since the beginning and it feels like time for a change.


I've switched. It's great at least for devs.


Brave Search has done it for me


I am surprised by how well it works. Makes me wonder if they do use Google or Bing indexes after all.


I've been using it, and it's not bad, but I do fall back on Google fairly often.


I thought DDG was mostly bing results at this point?


Kagi has been great for me


If you have a page with a clean-cut answer and a page full of repeated synonyms around the question prompt, the latter wins on Google.

If you have a site full of Google ads and a site without Google ads, the former wins Google search traffic.

If you have a site with Google Analytics and one without, the former wins.

If you have a search engine, should you be allowed to prioritize entries that use your advertising network, analytics software, payment methods, etc.?


Holds envelope to head and does Carnac impression

Name 3 things about SEO that aren’t true.


A huge issue is when one is googling for some obscure issue or error message at work and ends up on a fake website with NSFW content or banners.


God, those scrapers... You Google something & the results you get are c/p-ed from GitHub issues, with mangled formatting, and no link back to the original content.

We need domain-banners back in Google!


Anyone have any methods for avoiding this SEO nonsense?

I generally prefix the search query with `site:reddit.com` or `site:news.ycombinator.com`

This returns much better discussions about a question/topic, and those discussions can contain genuinely great ideas and links.
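As a small illustration of the `site:` prefix trick, this sketch produces one site-restricted query per discussion site. The function name and the default site list are assumptions for the example; the two domains are the ones mentioned above.

```python
# Discussion sites to search; taken from the comment above.
DISCUSSION_SITES = ["reddit.com", "news.ycombinator.com"]


def site_restricted_queries(terms: str,
                            sites: list[str] = DISCUSSION_SITES) -> list[str]:
    """Return one query per site, each prefixed with a site: operator."""
    return [f"site:{site} {terms}" for site in sites]


for query in site_restricted_queries("remote work best practices"):
    print(query)
```

Each query is then pasted (or sent) to the search engine separately, one per site.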


For dev searches, Kagi. I switched during closed beta, and I couldn't be happier. The SEO'd GitHub or SO copypasta is gone by default, and it gives you the ability to ban or prioritize other sites in your searches. I honestly haven't had to do that much because their filters are great by default, though I did give MDN a bump so it's always at the top for web stuff.


Yeah, and for product reviews I only trust SlickDeals commenters.

This funny tweet mostly covers it:

> u can google ur questions about the world and get the CIA FBI answer or u can add "reddit" to the end of it and learn the truth

https://twitter.com/literallysofie/status/137189962449199513...


Be wary of astroturfing. Reddit has had this problem for over a decade, and increasing popularity means more shills using the site to sway opinions. Aged accounts, karma boosting and fake comments/engagements are all common techniques to propagate a desired narrative.


Unfortunately the game of cat and mouse between marketing teams, and potential consumers simply trying to research, continues.


Unfortunately I think the more this knowledge is shared the more those sites will be spammed.

Look what happened to ad blockers - we told everyone about them and now so many sites have ad blocker blockers.


or use https://hn.algolia.com/

the problem with Reddit is for some stupid reason that has yet to be fixed and likely never will, Google shows reallllllly old reddit results but marks them as new.


I looked at the results for “remote work best practices” and they all seem to be valid articles. Not really SEO spam.

Which article is considered not spam? The HBR one? Really?


I came here to say the same thing.

They're a bunch of lists of exactly what the search term was looking for. And I've found these kinds of lists to be great jumping-off points in terms of getting an overview, whether I'm looking for remote work practices or humidifiers or types of mindfulness.

If you want long, in-depth articles those usually come up when your search terms become more specific. For example, the first link includes a bullet point "One-on-ones and check-ins" for remote work... throw that into Google and it's a bunch of useful articles (not lists), the fourth one is from Harvard Business Review.

As far as I can tell, Google's doing a great job in terms of organic results, and then it's adding ads on top and bottom because obviously it's got to pay the bills. I don't get the complaints at all.


They earn on both: the ads and the ad-crammed SEO stuff. So they probably won't change anything. This has been the case for the last ten years.


The SEO spam is more prevalent in the programming space as recently published articles tend to shadow the older ones as long as they have the right set of keywords.

Just Google “react node.js” and you’d see a low-quality post farming the top spot (with paid-product CTA links).




you.com CEO here. We let people give feedback on which major content blocks they want to see and soon will let developers add their own search-apps also. We believe in an open platform you can control, for the future of search.


Great ending: "Hug me please." Love it.


I'm beginning to think that someone might be able to knock off Google. Remember Myspace? They were #1 once. Then revenue went down and they compensated by putting in more ads. Then they became irrelevant. Broadcast TV did the same thing. Facebook seems headed down that road.

Running a search engine isn't that expensive. Held down to 10% ad content and run with a modest staff, this could work out.


There is a simple way to show your frustration.

1. Buy Google stock

2. Google has to show more ads to keep market hype on stock price.

3. They will fill 2 pages instead of current one page with ads. And then 3 and 4...

4. You buy more stock and protect your down side.

5. They will insert 2-3 times more ads per video on youtube.

At some point the average person will lose Google. You win big on stocks until then.

/sarcasm - pls dont do this.


Not exactly that, but: https://www.gwei.org/index.php


Looks like only 200 million years to reach the goal.


DuckDuckGo


...has the same problem. There have been a few years where DDG/Bing's results were better than Google's for me, but nowadays DDG gives me the same blogspam results as Google used to do. I've mostly given up on general search engines; nowadays I use only site-specific searches. That means I won't find out about new motherboards other than those from MSI and Asus, and I won't know about Python tooling unless it's listed on PyPI, or maybe I will know it because it was once featured on HN.

I've never used browser bookmarks before -- it used to be that I would remember enough about a site that I could always find it in the first few pages of search results. Nowadays, for many things I don't even bother to search; if I forgot to bookmark it, it might as well never have existed.

Frankly, it sucks. The endless quest for monetization has made it so that a large part of the Internet is now no longer discoverable for me. I eagerly await the return of curated directories, either in usenet or activitypub form.


And it's not just that: it's going to the next page... and seeing literally the same results.

It drives me crazy. The next page should be totally different results! I can go 10 pages in, and still see something from the first page!!


Might start using bookmarks again.


It works for me, so I switched and haven't looked back.


Just switch search engines.


To what? I'm unaware of any decent alternatives, unfortunately. I have all my defaults set to DuckDuckGo, but as soon as I search anything remotely uncommon, I end up having to switch to Google (although at least you can block the ad results with Ublock Origin browser extension).


I’ve had very good luck with a paid account at kagi.com. They have many filtering options to remove sites you don’t ever want to see (like Pinterest, allrecipes.com, etc.).


There are search engines that value your privacy. Qwant https://www.qwant.com/ Ecosia https://www.ecosia.org/ and Start Page https://www.startpage.com/

This page may also help... https://www.kylepiira.com/2020/02/07/which-search-engine-has...


valuing my privacy is brilliant, and I’ve tried all these engines, but on all occasions I’ve had to go back to google because they’re just not good enough:

poor results from anything longer than 2 keywords, no (consistent) quote search - which is a huge dealbreaker for me, no maps equivalent, no or little info about local businesses, and more.

google is successful because it’s very, very good, and if it was easy to beat it, it would have happened by now. yes there’s a lot of seo spam, but 99 times out of 100, Google knows what I’m looking for, which I can’t say for the others


How does their value of privacy fix the SEO and spam problem?


I still have to fall back to Google sometimes but DDG covers a good 80% of my search usage just fine. Any amount of switching is better than none at all, even if just to push back against Google's hegemony however we can.


Why would Google fix anything?

Google makes money from keeping you clicking. If it shows you what you want right away, then it loses money.

Keeping you on a treadmill chasing bits of cheese is how it became a billion-dollar company.


it became a billion dollar company by having a better index and search algorithm than anyone else

the carrot on a stick came later


I don’t know if everyone knows that SEO stands for search engine optimization and it is

> the process of maximizing the number of visitors to a particular website by ensuring that the site appears high on the list of results returned by a search engine.


Do you think there's anyone on HN who doesn't know "If you don't know what something is, just type its name into your search engine?"

this was on the first page of DuckDuckGo results for "SEO":

https://moz.com/learn/seo/what-is-seo


I was trying to save that search, but I accept your advice.


Whenever articles like these are published, I wonder who the Author thinks does the work of writing free articles on the Internet just for their pleasure. Content is expensive and is created by writers who must be paid.

To demand quality articles from the Internet, you must be willing to pay for them. Nobody exists just to write free passionate articles on the Internet and earn no money from them.


Plus, Mr. Dubakov knows damn well why people write those types of articles. In fact, he's published quite a few shallow search optimized articles on the same Medium blog he's using to complain about the practice and on the Fibery blog. He does it to publicize his startup, just like most of the others.

e.g.

10 Top Product Marketing Bullshit Things in 2022

Survival guide for an introverted CEO

10 worst interview questions

5 Biggest Problems in Software Development

Overcoming fear of public speaking in 8 easy & honest tips


Those are not shallow blogspam. Those are actually good articles derived from the man's experience.


It was that way for more than a decade. The internet used to be nothing but passion projects.



