
This "95% of models are statically computable" claim really shows how much he is trivializing the problem. I'd be interested to see his SW stack compile MaskRCNN. His ISA is massively under-defined, and people will not change their model code to run on this accelerator unless his performance beats CUDA significantly, and even then they still won't; usability matters more than performance every time. In the end you need a compiler, and it needs to be compatible with an existing framework, which is not trivial at all, since those frameworks are written in Python.


I agree that writing something compatible with, say, PyTorch is a significant undertaking, but why is that necessary? I also agree that some models like MaskRCNN are not static, and that people will not change their model code, but I don't think it matters.
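To make the static/dynamic distinction concrete, here's a toy sketch (my own illustration, not MaskRCNN's actual code) of the kind of data-dependent control flow that makes a detection model hard to compile statically:

    import torch

    def second_stage(proposals, scores, thresh=0.5):
        keep = scores > thresh            # data-dependent boolean mask
        kept = proposals[keep]            # shape unknown until runtime
        if kept.shape[0] == 0:            # branch on runtime data
            return torch.zeros(0, 4)
        return kept * 1.1                 # stand-in for per-proposal refinement

    boxes, scores = torch.rand(100, 4), torch.rand(100)
    print(second_stage(boxes, scores).shape)  # varies from input to input

The amount of work and the tensor shapes depend on the input image, so the graph can't be fixed ahead of time the way a plain convnet's can.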

Let's say you want to run LLaMA. LLaMA is a tiny amount of code, say, 300 lines. LLaMA is static. It doesn't matter that people will implement LLaMA with PyTorch and not tinygrad; geohot can port LLaMA to tinygrad himself. In fact, he already did, it's in the tinygrad repository.
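For contrast, here's a minimal sketch (my own toy code, not the actual tinygrad LLaMA port) of what "static" means here: for a fixed context length, every op and every shape is known before any data arrives, which is exactly what an ahead-of-time compiler wants.

    import torch

    D, T = 64, 128                        # hidden size, fixed context length
    wq, wk, wv = (torch.randn(D, D) for _ in range(3))

    def attention(x):
        q, k, v = x @ wq, x @ wk, x @ wv  # shapes fixed at (T, D)
        att = (q @ k.T) / D ** 0.5        # (T, T), no data-dependent branches
        return att.softmax(dim=-1) @ v    # (T, D)

    print(attention(torch.randn(T, D)).shape)  # always torch.Size([128, 64])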

What I am saying is that while running all models ever invented is harder than running LLaMA and Stable Diffusion (a Stable Diffusion port is also in the tinygrad repository), that's not necessarily trivializing the problem. It is noticing that you don't need to solve the full problem; there is enough demand for solving the trivial subset.

While developers will choose usability, users will choose the cheaper price. If they can run what they want on cheaper hardware, they will. I have already seen this happen: people don't buy NVIDIA to run Leela Chess Zero, they just run it on whatever hardware they have. It doesn't matter that everyone working on the LC0 model uses NVIDIA; that's irrelevant to users. The LC0 model is fixed and tiny, people already ported it to OpenCL, the OpenCL port is performant, and it runs well on AMD. The same will happen with text and image generation models.


Yeah, for inference this is true; there could be a viable subset of models. You're not going to build a viable business on inference, though. It's super cheap already, and plenty of hardware can do it out of the box with an existing framework, as you're saying. The $$ for selling chips is in training, and researchers trying new architectures are not going to wait for a port of their favorite model to a custom DSL, or learn a new language, to start prototyping now. You can port models forever, but that isn't an ecosystem or a CUDA competitor. OpenCL + AMD != a from-scratch company.



