Google has been collecting user interactions since 2007 via GOOG-411, which was a precursor to the Google Assistant - I suspect Google has billions of user interactions on hand through the latter. Facebook has posts and comments, Amazon has product pages, reviews and product Q&As, and all of them have billions of dollars to draw upon if they choose to buy high-quality data, or to spin up or expand teams that create and/or categorize training data.
They also have a deep roster of AI researchers[1] who could potentially make LLMs obsolete, or make fine-tuning work without access to ChatGPT records.
1. I suspect Google alone has more AI researchers than OpenAI has employees.