> I get the impression that's the same reason their fine-tuning services never took off either
Also, very few workloads that you'd want to use AI for are prime cases for fine-tuning. We had some cases where we used fine tuning because the work was repetitive enough that FT provided benefits in terms of speed and accuracy, but it was a very limited set of workloads.
Very typical e-commerce use cases processing scraped content: product categorization, review sentiment, etc. where the scope is very limited. We would process tens of thousands of these so faster inference with a cheaper model with FT was advantageous.
Disclaimer: this was in the 3.5 Turbo "era" so models like `nano` now might be cheap enough, good enough, fast enough to do this even without FT.