
Inference is mostly just matrix multiplications, so there are plenty of competitors.

The problem is that inference costs do not dominate training costs. Models have a very limited lifespan: they are constantly retrained or made obsolete by new generations, so training is always going on.

Training is not just matrix multiplications, and given the hundreds of ongoing experiments in model architecture, it's not even obvious what operations will dominate future training. So a more general-purpose GPU is just a much safer bet.

Also, LLM talent is in extremely short supply, and you don't want to piss them off by telling them they have to spend their time debugging some crappy FPGA because you wanted to save a few bucks on hardware.



The more general the model, the longer the lifetime. And the most impactful models today are incredibly general. For things like Whisper, I wouldn't be surprised if we're already at a 100:1 ratio of compute spent on inference vs. training. BERT and related models are probably an order of magnitude or two above that. Training infra may be a bottleneck now, but it's unclear how long it will be until improvements slow and inference becomes even more dominant.
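
For a rough sense of scale, here's a back-of-envelope sketch in Python. Every number below (parameter count, training tokens, daily usage) is an assumption picked for illustration, and the 6*N*D / 2*N*D compute estimates are the usual coarse transformer approximations, so treat the result as order-of-magnitude only.

    # Back-of-envelope: cumulative inference compute vs. one-off training compute.
    params = 1.5e9               # assume a Whisper-large-sized model (~1.5B parameters)
    train_tokens = 1e12          # assumed effective training-set size, in tokens
    infer_tokens_per_day = 2e11  # assumed aggregate daily usage across all deployments

    train_flops = 6 * params * train_tokens                  # ~6*N*D rule of thumb for training
    infer_flops_per_day = 2 * params * infer_tokens_per_day  # ~2*N per token at inference

    days_to_match = train_flops / infer_flops_per_day
    print(f"inference matches training compute after ~{days_to_match:.0f} days")
    print(f"after three years it's ~{3 * 365 / days_to_match:.0f}x training")

With numbers anywhere in that neighborhood, a widely deployed model blows past its training compute within weeks.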

Capital outlays track the growth (i.e. the derivative) of installed compute capacity, so even if training demand just flatlines, hardware spend will drop significantly: once capacity stops growing, the only new purchases are replacements for depreciated hardware.


Isn't Whisper self-hosted?


That's part of my point. There are hundreds of organizations using it at scale, but it only needed to be trained once.


What set of skills would put you in the category of LLM talent that is in extremely short supply?

Just curious what the current bar is here and which of the LLM-related skills might be worth building.


Being able to train base LLMs. This is currently an alchemical skill, since you can't learn it at school. It can be further split into infrastructure engineering (managing GPU clusters isn't easy), data gathering and cleaning (at terabyte scale), the training itself, and so on.

Being very good at fine-tuning for a particular goal. Fine-tuning is much easier to learn, so the bar to stand out is higher.

Being able to come up with architectural improvements for LLMs, aka the researcher path.

Wages start at $250k for grads at the big AI companies.


Funny, you've sort of described me:

1. For a BERT-scale model, all you need is a good codebase from GitHub (I had some luck with this one [0]) and a few weeks of trial and error. I want to try training T5 or LLaMA, but I don't have the resources. Of course, training models with more than 100B parameters is another level of labyrinth entirely.

2. Fine-tuning is mostly about how well you understand the task and the data you are dealing with. Since the BERT paper focuses on the GLUE benchmark, I've become very proficient at fine-tuning on GLUE and eventually got sick of it (the standard setup is sketched at the end of this comment).

3. Made some architectural improvements to BERT, got decent results, so I wrote a paper, and it got rejected because the reviewers wanted a head-to-head evaluation against some well-funded papers from Google.

4. Not in my country. Damn, I am envious.

[0] https://github.com/IntelLabs/academic-budget-bert
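
For anyone curious what "fine-tuning on GLUE" looks like in practice, here is a minimal sketch using the Hugging Face transformers/datasets libraries on one GLUE task (SST-2). The model name, task choice, and hyperparameters are illustrative assumptions, not the exact setup described above.

    # Minimal BERT fine-tuning on GLUE/SST-2 (illustrative; hyperparameters are assumptions)
    from datasets import load_dataset
    from transformers import (AutoModelForSequenceClassification, AutoTokenizer,
                              Trainer, TrainingArguments)

    tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
    model = AutoModelForSequenceClassification.from_pretrained("bert-base-uncased", num_labels=2)

    dataset = load_dataset("glue", "sst2")
    encoded = dataset.map(
        lambda batch: tokenizer(batch["sentence"], truncation=True, max_length=128),
        batched=True,
    )

    args = TrainingArguments(
        output_dir="bert-sst2",
        per_device_train_batch_size=32,
        learning_rate=2e-5,     # the BERT paper searches roughly 2e-5 to 5e-5
        num_train_epochs=3,
    )

    trainer = Trainer(
        model=model,
        args=args,
        train_dataset=encoded["train"],
        eval_dataset=encoded["validation"],
        tokenizer=tokenizer,    # enables dynamic padding of each batch
    )
    trainer.train()
    print(trainer.evaluate())

Most of the real work is in understanding the data and choosing a sensible evaluation, not in this boilerplate.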



