
Inference is mostly just matrix multiplications, so there are plenty of competitors.

The problem is that inference costs do not dominate training costs. Models have a very limited lifespan: they are constantly retrained or made obsolete by new generations, so training is always going on.

Training is not just matrix multiplications, and given the hundreds of ongoing experiments in model architecture, it's not even obvious what operations will dominate future training. So a more general-purpose GPU is just a much safer bet.

Also, LLM talent is in extremely short supply, and you don't want to piss them off by telling them they have to spend their time debugging some crappy FPGA because you wanted to save a few bucks on hardware.



The more general the model, the longer the lifetime. And the most impactful models today are incredibly general. For things like Whisper, I wouldn't be surprised if we're already at a 100:1 ratio of compute spent on inference vs. training. BERT and related models are probably an order of magnitude or two above that. Training infra may be a bottleneck now, but it's unclear how long it will be until improvements slow and inference becomes even more dominant.
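
For a rough sense of scale, here's a back-of-envelope sketch in Python. Every number below (parameter count, training tokens, daily usage) is an assumption picked for illustration, and the 6*N*D / 2*N*D compute estimates are the usual coarse transformer approximations, so treat the result as order-of-magnitude only.

    # Back-of-envelope: cumulative inference compute vs. one-off training compute.
    params = 1.5e9               # assume a Whisper-large-sized model (~1.5B parameters)
    train_tokens = 1e12          # assumed effective training-set size, in tokens
    infer_tokens_per_day = 2e11  # assumed aggregate daily usage across all deployments

    train_flops = 6 * params * train_tokens                  # ~6*N*D rule of thumb for training
    infer_flops_per_day = 2 * params * infer_tokens_per_day  # ~2*N per token at inference

    days_to_match = train_flops / infer_flops_per_day
    print(f"inference matches training compute after ~{days_to_match:.0f} days")
    print(f"after three years it's ~{3 * 365 / days_to_match:.0f}x training")

With numbers anywhere in that neighborhood, a widely deployed model blows past its training compute within weeks.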

Capital outlays track the growth (i.e. the derivative) of installed compute capacity, so even if training demand just flatlines, hardware spend will drop significantly: once capacity stops growing, the only new purchases are replacements for depreciated hardware.


Isn't Whisper self-hosted?


That's part of my point. There are hundreds of organizations using it at scale, but it only needed to be trained once.


What set of skills would put you in the category of LLM talent that is in extremely short supply?

Just curious what the current bar is here and which of the LLM-related skills might be worth building.


Being able to train base LLMs. This is currently an alchemical skill, since you can't learn it at school. It can be further split into infrastructure engineering (managing GPU clusters isn't easy), data gathering and cleaning (at terabyte scale), the training itself, and so on.

Being very good at fine-tuning for a particular goal. Fine-tuning is much easier to learn, so the bar to stand out is higher.

Being able to come up with architectural improvements for LLMs, aka the researcher path.

Wages start at $250k for grads at the big AI companies.


Funny, you've sort of described me:

1. For a BERT-scale model, all you need is a good codebase from GitHub (I had some luck with this one [0]) and a few weeks of trial and error. I want to try training T5 or LLaMA, but I don't have the resources. Of course, training models with more than 100B parameters is another level of labyrinth entirely.

2. Fine-tuning is mostly about how well you understand the task and the data you are dealing with. Since the BERT paper focuses on the GLUE benchmark, I've become very proficient at fine-tuning on GLUE and eventually got sick of it (the standard setup is sketched at the end of this comment).

3. Made some architectural improvements to BERT, got decent results, so I wrote a paper, and it got rejected because the reviewers wanted a head-to-head evaluation against some well-funded papers from Google.

4. Not in my country. Damn, I am envious.

[0] https://github.com/IntelLabs/academic-budget-bert
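
For anyone curious what "fine-tuning on GLUE" looks like in practice, here is a minimal sketch using the Hugging Face transformers/datasets libraries on one GLUE task (SST-2). The model name, task choice, and hyperparameters are illustrative assumptions, not the exact setup described above.

    # Minimal BERT fine-tuning on GLUE/SST-2 (illustrative; hyperparameters are assumptions)
    from datasets import load_dataset
    from transformers import (AutoModelForSequenceClassification, AutoTokenizer,
                              Trainer, TrainingArguments)

    tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
    model = AutoModelForSequenceClassification.from_pretrained("bert-base-uncased", num_labels=2)

    dataset = load_dataset("glue", "sst2")
    encoded = dataset.map(
        lambda batch: tokenizer(batch["sentence"], truncation=True, max_length=128),
        batched=True,
    )

    args = TrainingArguments(
        output_dir="bert-sst2",
        per_device_train_batch_size=32,
        learning_rate=2e-5,     # the BERT paper searches roughly 2e-5 to 5e-5
        num_train_epochs=3,
    )

    trainer = Trainer(
        model=model,
        args=args,
        train_dataset=encoded["train"],
        eval_dataset=encoded["validation"],
        tokenizer=tokenizer,    # enables dynamic padding of each batch
    )
    trainer.train()
    print(trainer.evaluate())

Most of the real work is in understanding the data and choosing a sensible evaluation, not in this boilerplate.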



