That is a great question. Well, we first have to ask what the purpose would be of someone going through the trouble of creating such (LLM-generated) spammy SEO content. The answer (for the majority of the web, at least for now) is to monetize it with ads/affiliate links. If that is the case, then the answer is easy, as we already penalize sites with ads/trackers on them for our general web search, and completely boot them out of our own index.
In parallel, we are developing LLM-content detector technology to be more efficient at detecting such content regardless of how it is monetised (and we will offer this as an API once developed).
This is a naive take. SEO schemes are attractive for companies that sell products themselves (e.g. try searching anything related to ETL tools). The content itself is the ad and you won’t find any ad serving scripts or affiliate links in there.
(Source: have created such schemes, although would generally not recommend them to my customers nowadays)
Underestimate the average Kagi user at your own peril. I do not think many would fall prey to an LLM generated content marketing page and end up buying a product from such site. Much likelier scenario is the page gets instantly blocked/reported.
They want to index companies that sell products. I don't see a big problem here if a company that sells a product I'm searching for, who happens to also have low-quality SEO content, shows up in that search.
In fact, I would rather they not get penalized for it, since low-quality SEO content is a good way to show up in certain other search engines (Google), and every business wants to show up in Google, making that content quite common even from reputable businesses making a quality product.
As someone who in a past life spent loads of time doing SEO, I cannot help but find this argument flawed.
So, we shouldn’t penalize low-quality SEO spam because of people’s wants? I do want them to penalize those sites because they are a disservice and, more often than not, crappy, unsecured WordPress sites that drown out those that are not spam.
Thank you Kagi team! A shame how far Google’s results have fallen.
Edit: also, SEO is one of the seedier parts of the software industry. Tons of unaware small businesses conned into these awful, low-quality sites. I literally quit because it was so morally bankrupt.
Problem is that many websites used to hire writers who wrote tangentially related posts to get their main product ranked higher. Like LogRocket and Partition Minitool do.
Combine that with that guy who boasted about his 'SEO heist', I think it's a very valid concern.
I have solved many problems because of a blog post created by a company that wanted to get their product name out there, and I don't think they should be looked at negatively for doing that. Are you upset whenever a company's tech blog lands on HN? Because it is virtually the same thing. If you use Kagi and come across a site that you find is low quality and spammy, then just block it. That's the cool thing about using Kagi.
I've also found that type of developer marketing valuable many times in the past. It's sometimes obvious it's going to end in a pitch for the product, but often it does a good job summarizing the key problems in the space, mentioning or showing other solutions/offerings, and explaining which tradeoffs they made for their own product and how they solved issues.
Even if you don't go with the ad, you can quickly pivot to other named players or get a better understanding of the terminology or jargon to start searching more.
My general impression of the LogRocket site is that they have decent articles on how to do frontend development. At least that's what I remember from the times I've been directed there by a search engine.
And we…want to discourage writing useful web pages, even though articles on understanding TypeScript's type system aren't all that closely related to their main product…? What am I missing?
Another decent one would be linux sysadmin info from Digital Ocean and the likes.
But for every joelonsoftware there are 99999 sites that have all copy/pasted the same tutorial about something basic and try to push some random product or just ads.
SEO pages pushing some product are SEO pages pushing some product. You should ignore them no matter what the source is, so what does it matter if they're LLM generated or hand written?
The problem is that people keep consuming the samey low quality content instead of skipping it (think superhero movies and Netflix series that are all indistinguishable from each other). As long as they're satisfied with that, they'll fall for fake product reviews too.
Maybe you can't determine that with certainty, but there may be statistical tools you can use to estimate the probability that some content came from one of the LLMs we know about, based on their known writing styles?
Someone did something like that to identify HN authors (as in correlating similar writing styles between pseudonyms) a few years back, for example: https://news.ycombinator.com/item?id=33755016
Of course, LLM output can be tweaked to evade these, just like humans can alter their writing style or handwriting to better evade detection. But it's one approach.
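To make the stylometry idea concrete, here is a toy sketch (the feature set is my own illustrative choice, not what any production detector uses): fingerprint a text by its relative frequencies of common function words, which tend to be stable for a given author or model, then compare fingerprints with cosine similarity.

```python
import math
import re
from collections import Counter

# Hypothetical feature set for illustration: function words are used
# somewhat unconsciously, so their frequencies make a crude style signal.
FUNCTION_WORDS = ["the", "of", "and", "to", "a", "in", "that", "is", "it", "for"]

def style_vector(text: str) -> list[float]:
    """Relative frequency of each function word in the text."""
    words = re.findall(r"[a-z']+", text.lower())
    total = max(len(words), 1)
    counts = Counter(words)
    return [counts[w] / total for w in FUNCTION_WORDS]

def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity between two style vectors (0.0 if either is empty)."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb) if na and nb else 0.0
```

A real estimator would use far richer features and compare against per-model baselines to output a probability, but the principle is the same: style leaks information even when content doesn't.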
That's a digital signature, same as sending an email with GPG to prove you sent it. You wouldn't say that because some people use GPG you can somehow detect who wrote every email on earth, it's a push model vs pull. This is why I wrote "any sentence" vs "some sentences".
Watermarking is not at all like a digital signature and a lot like steganography. I only have a surface level understanding of the process, but it works by biasing token selection to encode information into the resulting text in a way that's resistant to later modifications and rephrasing.
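A minimal sketch of that token-biasing idea, along the lines of "green list" watermarking (toy vocabulary and always-green sampling are simplifications; a real scheme adds a soft bias to the logits of a ~50k-token vocabulary):

```python
import hashlib
import random

# Toy stand-in for a tokenizer's vocabulary (assumption for illustration).
VOCAB = [f"w{i}" for i in range(40)]

def green_list(prev_token: str, fraction: float = 0.5) -> set[str]:
    """Partition the vocabulary into a 'green' subset, seeded by a hash
    of the previous token, so a detector can recompute the same split."""
    seed = int(hashlib.sha256(prev_token.encode()).hexdigest(), 16)
    rng = random.Random(seed)
    shuffled = sorted(VOCAB)
    rng.shuffle(shuffled)
    return set(shuffled[: int(len(shuffled) * fraction)])

def generate_watermarked(start: str, length: int, seed: int = 0) -> list[str]:
    """'Generation': at each step, sample only from the green list.
    A real LLM would instead nudge green tokens' logits upward."""
    rng = random.Random(seed)
    tokens = [start]
    for _ in range(length):
        tokens.append(rng.choice(sorted(green_list(tokens[-1]))))
    return tokens

def green_fraction(tokens: list[str]) -> float:
    """Detector: fraction of tokens in their predecessor's green list.
    Unwatermarked text hovers near the list fraction (~0.5 here);
    watermarked text scores much higher."""
    hits = sum(tok in green_list(prev) for prev, tok in zip(tokens, tokens[1:]))
    return hits / max(len(tokens) - 1, 1)
```

Because the detector only recomputes hash-seeded partitions, it never needs the model itself, which is what makes this resemble steganography rather than a signature over the final text.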
I have my doubts about the effectiveness of this method and realistically, it won't make any difference because the bad actors will just use an LLM that doesn't snitch on them, so you're technically correct.
The only way to make that steganography robust is to have the encoded message be generated with some secret key that can be verified. Otherwise anyone could manually fake the steganography in human-typed messages with the help of some encoder, and you'd have no way of telling whether it was really typed by an LLM. That line of thinking is why it has to work like a signature, as you said, for "any sentence". I also think these methods only work above a certain character count; short messages are impossible to tell apart.
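One way to add the secret key (a sketch of the general idea, not any specific published scheme): derive the green-list partition from an HMAC over the previous token, so that only the key holder can compute, and therefore verify or forge, the watermark.

```python
import hashlib
import hmac
import random

VOCAB = [f"w{i}" for i in range(40)]  # toy stand-in for a tokenizer vocabulary

def keyed_green_list(key: bytes, prev_token: str, fraction: float = 0.5) -> set[str]:
    """Like an unkeyed green list, but the partition is derived from
    HMAC(key, prev_token): without the key, an attacker can neither
    imitate the watermark nor test text for its presence."""
    digest = hmac.new(key, prev_token.encode(), hashlib.sha256).digest()
    rng = random.Random(int.from_bytes(digest, "big"))
    shuffled = sorted(VOCAB)
    rng.shuffle(shuffled)
    return set(shuffled[: int(len(shuffled) * fraction)])
```

This also illustrates the short-message problem: each token contributes roughly one bit of evidence, so a two-token reply like "Yes." simply can't carry a verifiable watermark.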
If you look here: GitHub.com/HNx1/IdentityLM, you can see that it’s relatively easy to sign LLM output with a private key using an adaptation of the watermarking method.
This application is exactly what I was describing. I'll look it over to see how it scales the encryption strength with token length and how it deals with short messages, which is the only part I'd expect to be very hard. If you print two paragraphs, it's easy to change some tokens with a secret key mask, but if you print "Yes", it's not so easy. Thanks for the great share.