Expanding Our Partnership with Google (redditinc.com)
77 points by TimCTRL on Feb 22, 2024 | 65 comments


Too many words to just say "Google has bought Reddit data wholesale to train its LLMs".


Training on data from Reddit will certainly fix Gemini's bias against white men.


Gemini isn't bias because of the training data, it's bias because Google chooses to makes it bias by hijacking your prompt.


The past tense of "bias" is "biased."

"Gemini isn't biased because of the training data, it's biased because Google chooses to make it biased by hijacking your prompt."


Also, “Google has bought insurance against future subreddit blackouts”


I may be naive, but I'm not sure where the hate is coming from in this thread. It's no secret that Google search result quality has rapidly and consistently diminished over the past few years, a common topic of discussion here on HN. Many folks, myself included, have developed a habit of appending "reddit" in Google searches by default. And reddit's native search, on the other hand, is just abysmal. If this partnership results in finding quality and timely content in reddit faster, then I'm all for it. Net positive IMO.


You're naive if you think reddit is safe from SEO spam.

Right now it's only because very very few people append "reddit". The moment that changes, it will be spammed into oblivion with subtle LLM written SEO spam.

Companies are sending people free products in exchange for an Amazon review, you think a free Reddit post will stop them?


People on Reddit upvote and downvote comments all the time. You cannot upvote or downvote a Google search result. Unlike SEO spam surfaced by Google that only needs to satisfy Google's ranking algorithm, SEO spam on Reddit needs to do that and garner enough upvotes by a diversity of users. Even though some of the users might be sock puppets or bots, the value is still there as long as there are more humans than bots.


Suddenly Reddit user count is x5 and we have bots upvoting


Suddenly Reddit becomes a huge user of reCAPTCHA Enterprise and astoundingly, Google earns a bunch of money for improving its search results.


The Reddit sub blackouts really caught me off guard. Made me reflect on how crazy it is that a substantial portion of public knowledge and discourse over the past decade has occurred there. And this information is locked up under the control of Reddit. And we gave it to them (the information and the control).

Edit: It would be nice if there were a Wikipedia model equivalent to Reddit. A non-profit driven site for public discourse and discussion.


The site took a sharp turn downhill after that happened.

It seems like the moderators who enjoyed moderating bigger subreddits left the site and all that's left is power trippers who stuck around for the dust to settle so they could seize control and push their agenda.

Municipal/Provincial subreddits are nearly impossible to participate in because of how locked down the replacement mods have made them. The Toronto subreddit is run like a dystopian alternative reality where normal users can't post most news. Stories of crimes being committed are completely banned unless you're an approved poster or it's the cops committing crimes. The mods actively delete posts and then the same story gets re-uploaded by mods or their friends hours later and is allowed to stay up.


This must depend on specific communities? I use it heavily (as a reader; I rarely post) and have noticed no change at all.


Yeah.. Reddit shouldn't be relying on community moderators for larger subreddits.

They need paid people.


There are many Lemmy instances that are non-profit:

https://join-lemmy.org/instances

I don't know which ones are officially 501c3 in the US but nearly all of them aren't in it for any sort of profit motive. For reference, I use https://programming.dev/ (no affiliation other than I use it regularly and it seems alright).


This seems to have been posted the same time as Reddit released their S-1 on going public: https://www.redditinc.com/blog/reddit-files-registration-sta...


They also announced a program for specific US based mods/users to buy stock at the IPO.

https://old.reddit.com/r/reddit/comments/1axhbye/announcing_...


Wish reddit didn't kill forums. Also feels like reddit is going to become even more bot infested now because of this.


I think discord did more harm since it's harder to find information.


Agreed. Reddit ate other forums, but individual subreddits are close enough to forums in most important ways. Discord is far closer to a chat room, and is horrible wrt browsability, discovery, or threaded topics.


I think forums could make a comeback over the next 3-5 years. Big platforms like Reddit, Discord and Facebook have been testing their userbase's loyalty for a long time now. If you look at Mastodon, HN, and even Tumblr, you can see a viable number of people are fine with smaller communities, as long as they are genuine and quality. Forums -- especially specialist forums -- have always been great for that.


I feel like the Internet where forums thrived is very different from the present Internet. Forums won't make a comeback because they're much harder to make money or clout from, and the owners get to assume all the risk of hosting content, dealing with deplatforming by payment processors, and trying to find people to advertise.

Regardless of how you feel about them, Kiwifarms is what would happen to any forum that got too successful without the approval of a major player: a constant barrage of being removed from hosting providers, DDOS mitigation platforms and domain registrars.

Forums still exist because they're passion projects to very niche communities with very stubborn people. I treasure those communities but that's kind of incompatible with mainstream Internet and consumption habits.


I guess Reddit's anti-scraping policy paid off


So too did their response to raising the API price.


Hopefully this reduces the useless quora results ranked higher in search results and saves me from typing "reddit" in the search text.


Reddit front page and top comments appear to be creepily hand curated by the DNC/State Dept. Country is completely divided on politics but somehow every single top political post and comment is pro-establishment.


Based on upvotes and user behaviour. Established = currently on top probably. Seems plausible


The overnight about face on reddit when the DNC stabbed Bernie Sanders in the back, coupled with the way Ellen Pao was yeeted off the glass cliff, the way IAMA was killed, the way conservative subreddits were purged, the moderator ousting during the blackout, etc.

Reddit's front page is about as organic as a twinkie.


Totally get it, shrewd business decision to monetize their content like this, but as a user this kind of stuff sucks. Used to be you could string together all sorts of cool little bots or automated workflows with reddit/twitter and whatnot, and now APIs are so closed off there isn't much room to experiment with that kind of thing. And it feels like just the beginning, as AI becomes more ubiquitous the roadblocks needed to protect genuine human engagement will become more onerous. Imagine the entire internet being Discord, with its unsearchable silos, forever.


Not a problem for AI that controls the screen (mouse and keyboard), it can participate anywhere and extract information by mimicking humans. Multiple companies are working on it, including apparently OpenAI.


What did your bots and workflows do?


Between them Reddit and Macrumors have almost all the useful public information for macOS users, music producers, video creators, colourists. Handing Google the keys to Reddit will be trouble. I'm not quite sure what kind of trouble, but deep trouble.

The SEO spamslaught which will start now will be part of the issue, Google will also work to make visible on Reddit what they like and make invisible what they don't like. Money talks and Reddit was already big on censorship. A more independent public space is what we need, not a more moderated and beholden prison yard.


It's concerning to see the site that was once the absolute paragon of Internet freedom going down this path. Reddit has gone from the shining star of the Internet to a hollow, slowly rotting husk.


This title increasingly parses as "dystopian".


You may just have depression.


Um.


Just say no to your voice, your content, your time, and your efforts being hoovered up into some company's commercial LLMs and then regurgitated back out as "novel" information.

I left Reddit the moment they screwed over third-party developers and never looked back. Now I'm even happier to avoid fueling their data aspirations.


Why would Google pay for Reddit data when their bot is one of the only ones that has to remain whitelisted anyway?


Because it is chicken feed to them but sets a floor that represents a significant barrier for other players that might seek to get the same type of deal?


Probably to lock out future competition.

Note that Webtext and OpenWebText datasets were basically scraped from Reddit. It's incredibly valuable if you can lock down this dataset vs. letting competitors use it.


Because the bot can't see deleted posts, can't see associated metadata, and whatever else reddit hides from view.


Exactly! Reddit can fuck off.


My thoughts are this has to do with Google shutting down 3rd party cookies. A deal might have been made between the two juggernauts after both putting up their own respective "walls".


This feels like Sauron and Saruman are shaking hands.

I need to de-google my online life.


So Google will just prioritize site:reddit.com content now? Doesn't sound quite "open" or "authentic" as Google mentions in their blog post.


> With this partnership, and via our Data API, we’re ushering in new ways for Reddit content to be displayed across Google products by providing programmatic access to new, constantly evolving, and dynamic public posts, comments, etc., on Reddit. This enhanced collaboration provides Google with an efficient and structured way to access the vast corpus of existing content on Reddit and enables Google to use the Reddit Data API to improve its products and services – including supporting new ways to display Reddit content and providing more efficient ways to train models.

That's what I thought too. Guess SEO train will follow. site:reddit.com was useful while it lasted.


So basically what you have to do is create a social service, make sure everything is open with an easy to use dev API to get the enthusiasts and third party clients going.

And then once you’re big enough you close all the doors and then try to get big players to pay you for access?


You're ready to lead.


Yes, this is called enshittification. It has absolutely been the game plan for the past 15 years or so.


Is this a precursor to a Google acquisition? Can I get Google Assistant to tell me what's the latest post in an NSFW subreddit?


There have been a bunch of reports that a Reddit IPO is imminent.


Are these stats still true? Why would anyone buy Reddit if this is the case?

https://www.cnbc.com/2019/02/11/reddit-users-are-the-least-v...

Twitter ARPU: ~$9.48

Facebook: $7.37

Pinterest: ~$2.80

Snap: $2.09

Reddit: ~$0.30


That's from 2019, curious what the stats are in 2024. I've seen a lot more big names advertising on reddit lately.


Most reddit investment in the last several years has been about buying mindshare. Whoever controls the reddit board has a huge sway on public opinion, at least in the US.


Only 8 months ago, Reddit's CEO said the company wasn't profitable. An IPO would seem like a foolish decision.

https://www.reddit.com/r/reddit/comments/145bram/addressing_...


If you have/had a reddit account, can you ask that your data not be used?


so, this is the AI company they got $60M from, lol.


Is this cheap or expensive? $60M/year - about $1/user/year. As a reference, Apple paid $50 million for access to Condé Nast + NBC News + IAC.

I only worry about erecting barriers to open source AI by setting such prices to access culture data. And it's our own data this time.


Imagine being in Google's AI space and thinking to yourself

"You know what we need more of? Unhinged, depraved, often inane arguments from the internet"

"Yes that will work better"


Not true, there's a lot of quality content on Reddit.

What is more alarming is that the garden walls are getting higher, and only the Big Players are allowed to make direct bridges with it.


You must be referring to the default and large subreddits. I find Reddit to be quite pleasant (despite overused jokes etc) once excluding any sub above like 20k.


AskHistorians is like 2M, and remains the best that site has to offer.


People often say the same about Twitter/X, 4chan, Facebook, and YouTube, and yet they all have great content. And often, only the content you choose to see.


Most of my Google searches take the form of '<search query> reddit' these days.


Gross. Just ewww.



