OpenAI wouldn't be here without the work that Yann LeCun did at Facebook (back when it was facebook). Science is built on top of science, that's just how things work.
Yes, but in science you reference prior work and credit those who came before you.
Edit: I am not defending OpenAI, and we are all enjoying the irony here. But it puts into perspective some of the wilder claims circulating that DeepSeek was somehow able to compete with OpenAI for only $5M, as if on a level playing field.
OpenAI has been hiding their datasets, and certainly haven't credited me for the data they stole from my website and github repositories. If OpenAI doesn't think they should give attribution to the data they used, it seems weird to require that of others.
Edit: Responding to your edit, DeepSeek only claimed that the final training run was $5M, not that the whole process cost that (they even call this out). I think it's important to acknowledge that, even if they did get some training data from OpenAI, this is a remarkable achievement.
It is a remarkable achievement. But if “some training data from OpenAI” turns out to essentially be a wholesale distillation of their entire model (along with Llama etc) I do think that somewhat dampens the spirit of it.
We don’t know that of course. OpenAI claim to have some evidence and I guess we’ll just have to wait and see how this plays out.
There’s also a substantial difference between training on the entire internet and training that very specifically targets your competitor's products (or any specific work directly).
That's $5M for the final training run, which is an improvement to be sure, but it doesn't include the other training runs -- prototypes, failed runs, and so forth.
It is OpenAI that discredits itself when it says that each new model is the result of hundreds of millions of dollars in training. They throw this around as if it were a big advantage of their models.
Is that really true? If anything, OpenAI was dependent on the transformer paper from Google by Ashish Vaswani and others. LeCun has been criticizing LLM architectures for a long time and has been wrong about them for a long time.
Personally, I have not seen anything from him that is meaningful. OpenAI and Anthropic (itself started by former OpenAI people) of course have built their models without LeCun’s contributions. And for a few years now, LeCun has been giving the same talk anywhere he makes appearances, saying that large language models are a dead end and that other approaches like his JEPA architecture are the future. Meanwhile current LLM architecture has continued to evolve and become very useful. As for the misuse of the term “open source”, I think that really began once he was at Meta, and is a way to use his fame to market Llama and help Meta not look irrelevant.
By the way, as someone who once did classical image recognition using convolutions, I can't say I was very impressed by the CNN approach, especially since their implementation didn't even use FFTs for efficiency.
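For what it's worth, here is a minimal sketch of the FFT point (using numpy/scipy rather than any particular CNN framework, and with made-up sizes): for large kernels, convolution via the FFT is far cheaper than the direct spatial-domain implementation, since it replaces the O(N²·K²) sliding-window work with O(N² log N) transforms.

```python
# Sketch: direct 2D convolution vs. FFT-based convolution (illustrative sizes only).
import numpy as np
from scipy import signal

rng = np.random.default_rng(0)
image = rng.standard_normal((512, 512))   # stand-in for an input image / feature map
kernel = rng.standard_normal((31, 31))    # a large filter, where the FFT route pays off

# Direct spatial-domain convolution: slides the kernel over every position.
direct = signal.convolve2d(image, kernel, mode="same")

# FFT-based convolution: multiply in the frequency domain, then transform back.
via_fft = signal.fftconvolve(image, kernel, mode="same")

# Both compute the same result, up to floating-point error.
assert np.allclose(direct, via_fft)
```

The crossover depends on kernel size: for the tiny 3x3 filters typical of CNNs the direct method is often competitive, which is part of why many CNN implementations skip the FFT.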