> And by the way, we are out of new training data to give the models.
Only easily accessible text data. We haven't really started using video at scale yet, for example. It also looks like data for specific tasks goes a long way ... agentic coding interactions, for instance, aren't something that has generally been captured on the internet. But capturing interactions with coding agents, on top of the programming knowledge already absorbed during base training, is producing significant performance gains. The amount of specialized data we might need to gather or synthetically generate is perhaps orders of magnitude less than was presumed with pure supervised learning systems. And for other applications like industrial automation or robotics, we've barely started capturing all the sensor data that already lives in those systems.
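To make "capturing interactions" concrete, here is a minimal sketch of what one such record might look like. The schema and field names are hypothetical, not any vendor's actual format; the point is that each example pairs the agent's context with the actions it took and whether they worked, which is exactly the signal that ordinary web text doesn't contain.

```python
# Hypothetical schema for one captured coding-agent interaction,
# serialized as a JSON line for a fine-tuning or reward-modeling corpus.
import json
from dataclasses import dataclass, field, asdict


@dataclass
class ToolCall:
    tool: str          # e.g. "edit_file" or "run_tests" (illustrative names)
    arguments: dict    # tool-specific inputs: path, patch, command, ...
    output: str        # what the environment returned


@dataclass
class CodingInteraction:
    task: str                      # the user's request
    repo_snapshot: str             # identifier for the code state the agent saw
    tool_calls: list[ToolCall] = field(default_factory=list)
    tests_passed: bool = False     # outcome signal for filtering or reward

    def to_training_example(self) -> str:
        """Serialize one interaction as a single JSON line."""
        return json.dumps(asdict(self))


if __name__ == "__main__":
    record = CodingInteraction(
        task="Fix the off-by-one error in pagination",
        repo_snapshot="commit-abc123",
        tool_calls=[ToolCall("run_tests", {"target": "tests/"}, "1 failed")],
        tests_passed=False,
    )
    print(record.to_training_example())
```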