X users are now officially just training data for Grok.


The comment you just wrote is training data for every major LLM out there.


But the difference is, Twitter users leave way too much to situational context for an LLM to comprehend, and so...


I'm aware, thanks.


As opposed to Reddit? Or your Gmail? Or everything else on the internet?


I take your point, however an AI company owning a social media platform is new.


Sam owns Reddit and OpenAI. How is that new? Google also owns Gmail and all Google services, as well as its AI.


OpenAI does not own a social network. Google doesn't even own a social network anymore.

Neither is a valid comparison to xAI owning X.


Google owns half of the emails in the world which may be more valuable.


Email is not a realtime social network, not even close.


Does Google train AI on emails?


And also Sam Altman doesn't own either OpenAI or Reddit, lol.


My understanding is that Google "owns" Reddit in the sense that they paid to use it as a source of training data. And Google paid Reddit so much that they have exclusive rights to it.

This is probably the reason why all of Reddit's free public APIs are gone: to block scraping.


Can you cite your sources on Altman not owning Reddit?


https://en.wikipedia.org/wiki/Reddit

Owners: Advance Publications (30%), Tencent (11%), Sam Altman (9%)


"large-ish minority shareholder" != owner


Not really, Facebook (Meta, whatever) has been an AI company for a long time.


Their social network is not owned by a private AI company.


xAI is not a private company.

The AI company is public, but the social network was private.


"xAI is a privately held company and is not publicly traded, therefore investing in xAI pre-IPO is only available to accredited investors."

Source: https://forgeglobal.com/xai_ipo/


Oops, thanks for the correction.


By that logic, GitHub, Stack Overflow, and the rest of the internet are also "only" training data.

X just produces extra valuable training data as a byproduct, like power plants create certain byproducts that can be sold. Good to see it going to Grok primarily, as other LLMs are far from being truth seeking with their built-in, documented, extreme bias.


None of the companies you mentioned are owned by a private AI company, except X.

I can't think of any other example of an AI company owning its own social network; it's a fresh precedent.


That is irrelevant to the invalidity of your original statement. LLMs clearly have no problem getting their training data scraped from all of those sites, regardless of who owns them.


My original statement was from the perspective of the users, not the LLMs. Perfectly valid to empathize with them.


No, it isn't OK to patronize X users with a false precedent. X and Grok work very well together: one can ask a question and get relevant and RECENT posts by X users answering that query, something other LLMs can't really do.

Content created by X users is for X users to find, whether through their feed, basic search, or Grok. There's no foul play here, and how Grok uses data on X is not hard to defend even from a basic "better search" angle. Your "empathize" comment sounds like a "will someone think of the African children" kind of detached waste of breath, something the Chinese call "Baizuo".


It's not patronizing, it's a statement of fact: X is the only social network owned by an AI company (xAI), and that company has only one product (Grok), which is trained on data from X, which is user-generated data.

Now, you may not like that, but it's still real.


Aligning with the distribution you happened to be able to sample from is not 'truth seeking'.


I presume they officially were before. And just unofficially for every other model, as all our posts online are.


Indeed!



