
This "95% of models are statically computable" claim really shows how much he is trivializing the problem. I'd be interested to see his SW stack compile MaskRCNN. His ISA is massively under-defined, and people will not change their model code to run on this accelerator unless his performance beats CUDA significantly, and even then they still won't; usability matters more than performance every time. In the end you need a compiler, and it needs to be compatible with an existing framework, which is not trivial at all, since those frameworks are written in Python.


I agree that writing something compatible with, say, PyTorch is a significant undertaking, but why is that necessary? I also agree that some models like MaskRCNN are not static, and that people will not change their model code, but I don't think it matters.
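To make the static/dynamic distinction concrete, here's a toy sketch (my own illustration, not MaskRCNN's actual code) of the kind of data-dependent control flow that makes a detection model hard to compile statically:

    import torch

    def second_stage(proposals, scores, thresh=0.5):
        keep = scores > thresh            # data-dependent boolean mask
        kept = proposals[keep]            # shape unknown until runtime
        if kept.shape[0] == 0:            # branch on runtime data
            return torch.zeros(0, 4)
        return kept * 1.1                 # stand-in for per-proposal refinement

    boxes, scores = torch.rand(100, 4), torch.rand(100)
    print(second_stage(boxes, scores).shape)  # varies from input to input

The amount of work and the tensor shapes depend on the input image, so the graph can't be fixed ahead of time the way a plain convnet's can.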

Let's say you want to run LLaMA. LLaMA is a tiny amount of code, say, 300 lines. LLaMA is static. It doesn't matter that people will implement LLaMA with PyTorch and not tinygrad; geohot can port LLaMA to tinygrad himself. In fact, he already did, it's in the tinygrad repository.
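For contrast, here's a minimal sketch (my own toy code, not the actual tinygrad LLaMA port) of what "static" means here: for a fixed context length, every op and every shape is known before any data arrives, which is exactly what an ahead-of-time compiler wants.

    import torch

    D, T = 64, 128                        # hidden size, fixed context length
    wq, wk, wv = (torch.randn(D, D) for _ in range(3))

    def attention(x):
        q, k, v = x @ wq, x @ wk, x @ wv  # shapes fixed at (T, D)
        att = (q @ k.T) / D ** 0.5        # (T, T), no data-dependent branches
        return att.softmax(dim=-1) @ v    # (T, D)

    print(attention(torch.randn(T, D)).shape)  # always torch.Size([128, 64])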

What I am saying is that while running all models ever invented is harder than running LLaMA and Stable Diffusion (a Stable Diffusion port is also in the tinygrad repository), that's not necessarily trivializing the problem. It is noticing that you don't need to solve the full problem; there is enough demand for solving the trivial subset.

While developers will choose usability, users will choose the cheaper price. If they can run what they want on cheaper hardware, they will. I have already seen this happen: people don't buy NVIDIA to run Leela Chess Zero, they just run it on whatever hardware they have. It doesn't matter that everyone working on the LC0 model uses NVIDIA; that's irrelevant to users. The LC0 model is fixed and tiny, people already ported it to OpenCL, the OpenCL port is performant, and it runs well on AMD. The same will happen with text and image generation models.


Yeah, for inference this is true; there could be a viable subset of models. You're not going to build a viable business on inference, though. It's super cheap already, and plenty of hardware can do it out of the box with an existing framework, as you're saying. The $$ for selling chips is in training, and researchers trying new architectures are not going to wait for a port of their favorite model to a custom DSL, or learn a new language, to start prototyping now. You can port models forever, but that isn't an ecosystem or a CUDA competitor. OpenCL + AMD != a from-scratch company.



