This is what I'd consider doing if I were a small AI lab. Don't try to build a frontier LLM that beats all benchmarks. Try to make the world's best LLM at one programming language. Build an RL pipeline that puts all your resources into making the LLM the best at that language. Even better if there's a dearth of human-created training data on GitHub, since all your competitors will be bad at it.
Google somewhat did this with JavaScript in their latest Gemini 2.5 Pro release. But what about doing it for a smaller language? Google isn't going to do that, but there is still a lot of demand.
I'm not saying this is a bad idea, but it does sound like a rather risky prospect. You're basically proposing a bet against the ability of LLMs to generalize across programming languages, and to embed concepts at a deeper level than the syntax.
Many people do think this, but I'm not sure many of them are running AI labs.
From my experience around less-used languages (Clojure on one hand and Code_Aster's Python on the other), LLMs may be able to generalize syntax, but the availability of APIs, functions, etc. isn't something you can solve by generalizing. Or more precisely, you can generalize, but that means hallucinating non-existent tools.
Would not generalizing solve this issue for libraries, though? I.e., a lot of models produce reasonable code for me, but I almost always care about usage of libraries. That's where they get the wrong version, hallucinate functions, etc. for me.
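To make that concrete, the kind of check I'd want regardless of how the model was trained is something like the sketch below (purely illustrative, not anything a lab actually ships): parse the generated code and verify that the attributes it references actually exist in the installed version of each library.

```python
# Minimal sketch: lint LLM-generated Python against the *installed*
# libraries, flagging attribute references that don't exist. Catches one
# common failure mode: calls into functions that were hallucinated or
# belong to a different library version than the one you have.
import ast
import importlib

def missing_attributes(generated_source: str) -> list[str]:
    """Return module.attribute references that the installed modules lack."""
    tree = ast.parse(generated_source)
    imported = {}
    for node in ast.walk(tree):
        if isinstance(node, ast.Import):
            for alias in node.names:
                imported[alias.asname or alias.name] = alias.name
    problems = []
    for node in ast.walk(tree):
        # Only looks at direct `lib.something` references on imported modules.
        if isinstance(node, ast.Attribute) and isinstance(node.value, ast.Name):
            module_name = imported.get(node.value.id)
            if module_name and not hasattr(importlib.import_module(module_name), node.attr):
                problems.append(f"{module_name}.{node.attr}")
    return problems

print(missing_attributes("import json\njson.read_file('config.json')"))
# ['json.read_file'] -- json has load/loads, but no read_file
```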
General-purpose LLMs fail really hard at this in domains like Terraform. There may be drastic differences in syntax and semantics across the massive matrix of Terraform version + provider version(s), and they've shown themselves to be absolutely terrible at navigating that, even if you specify versions explicitly. Even worse, and probably what exacerbates it, this version matrix changes at a much faster pace than most programming languages introduce large changes.
> There may be drastic differences in syntax and semantics across the massive matrix of Terraform version + provider version(s), and they've shown themselves to be absolutely terrible at navigating that, even if you specify versions explicitly.
To be fair, humans have trouble with that as well.
That's always the counterargument, but the promise of these tools isn't to be as bad as humans are. Humans can also work their way through it. The context ruts these tools get into with Terraform specifically are ones they can never dig out of; it's pretty much worthless for this, at least as many times as I've tried. You will waste far more time trying to figure out where it messed up on something dead simple than if you had just looked over a colleague's shoulder, and a colleague probably wouldn't make the same kinds of basic mistakes, like making up fields.
Meta synthetically generated lots of PHP from Python for Llama 3 for training purposes. Meta writes a crazy amount of PHP internally.
Translation tends to be way easier than unconstrained generation for LLMs. But if you can translate and filter a large amount of code, you can learn to generate. If you also translate and run the unit tests, you get another layer of error checking.
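In pseudocode, the pipeline is roughly the following. This is only a sketch of the idea; `llm_translate` and `run_tests_in_sandbox` are placeholders for whatever model API and sandboxed test runner you'd actually use, and it doesn't reflect Meta's internal setup.

```python
# Sketch of a translate-and-filter data pipeline: translate code plus its
# tests into the target language, keep only pairs whose translated tests pass.

def llm_translate(source: str, target_language: str) -> str:
    """Placeholder: ask the model to translate code to target_language."""
    raise NotImplementedError

def run_tests_in_sandbox(code: str, tests: str, language: str) -> bool:
    """Placeholder: build and run the translated tests against the code."""
    raise NotImplementedError

def build_synthetic_corpus(items: list[dict], target_language: str) -> list[dict]:
    """items is a list of {"code": ..., "tests": ...} in the source language."""
    kept = []
    for item in items:
        code = llm_translate(item["code"], target_language)
        tests = llm_translate(item["tests"], target_language)
        # The tests are the extra layer of error checking on top of
        # "does it parse/compile at all".
        if run_tests_in_sandbox(code, tests, target_language):
            kept.append({"prompt": item["code"], "completion": code})
    return kept
```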
It feels to me that most of the real usage of AI is in coding right now, so a small lab that went all in on just code-gen would at least have the differentiator of a narrower field in which to beat the bigger incumbents doing it all?
I dunno tho.
Big AI labs also have their own agendas and would rather keep scaling and growing than serve a rather smaller real market?
Once you're into real-usage territory, you can no longer use made-up numbers to justify future growth.
Again though, my point was just that it's not actually clear that you can do better than these big models by taking a narrower focus. I'm saying that the things these big LLMs are learning about other languages probably do have utility when applied even to quite niche languages.
If you take some niche language and build an LLM from scratch that's hyperspecialized on that language, will that LLM actually outperform some big LLM that's trained on all the programming resources out there, and all the blogs, forum conversations, stack overflow posts on all those languages, and then learns to generalize that information and apply it to your niche language?
One of the things that LLMs seem to excel at is taking information from one context, transforming it and applying it to another context.
So the way I envision this is as a dual system: you let the bigger frontier LLM come up with the overall function signature, structure, and the reasoning/planning around the specific code, but then have it ask the hyperspecialized fine-tuned model, which can only output valid code, to actually write it.
You then get the best of both worlds at the expense of a double round trip, or 2x cost, which for something like coding seems fine; people are OK paying $200 for ChatGPT Plus.
This would also solve the problem of context windows filling up and the model starting to generate nonsense: if you have the bigger model use its bigger context window to orchestrate and organize the task, calling smaller specialized sub-modules, that seems like it should yield better final code outputs than just one big-ass LLM.
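Something like the sketch below, where the model names and the `complete` call are made up for illustration; it's just the shape of the split, not a real API.

```python
# Rough sketch of the dual-model split: a large general model plans,
# a small language-specialized model emits the code.

def complete(model: str, prompt: str) -> str:
    """Placeholder for whatever chat/completions API you are using."""
    raise NotImplementedError

def generate_function(task_description: str) -> str:
    # 1. The big model does the reasoning: signature, structure, plan.
    plan = complete(
        "planner-model",
        "Design a function for this task. Return a signature, docstring, "
        "and step-by-step plan, but no implementation:\n" + task_description,
    )
    # 2. The specialist turns the plan into code; it has only ever seen one
    #    language, so its output space is narrow but (ideally) always valid.
    return complete(
        "specialist-model",
        "Implement this plan exactly, returning only code:\n" + plan,
    )
```

The planner only ever sees plans and signatures, so its context stays small, while the specialist only ever sees one plan at a time.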
but we're moving the goalposts from 1 model to a multi-agentic system I guess, so never mind
and I agree, it seems all the big corps are betting on bigger models and more data for now
It makes sense to specialize on one programming language and dedicate all of the LLM's intellectual space to that one domain, but on the flip side I wonder how much the LLM's sharpness and reasoning capabilities are increased by having more data to train on, even if it's the wrong programming language.
As a developer, I certainly think my programming skills in a specific language were improved by knowing other languages I could contrast and compare with.
You could just have specialized fine-tunes for each programming language that are only called when writing code; a more general, bigger model could pass the plan/pseudocode to them.
Using the language itself isn't the challenge for LLMs; they do that with a very high success rate. I haven't seen an LLM make syntax errors for several months. Calling the right functions with correct parameters is the challenge your hypothetical AI lab will have to solve (or half-ass it and show great benchmark results).