I code with multiple LLMs every day and build products that use LLM tech under the hood.
I don't think we're anywhere near LLMs being good at code design.
Existing models make _tons_ of basic mistakes and require supervision even for relatively simple coding tasks in popular languages, and it's worse for languages and frameworks that are less represented in public sources of training data.
I am _frequently_ having to tell Claude/ChatGPT to clean up basic architectural and design defects.
There's no way I would trust this unsupervised.
Can you point to _any_ evidence to support that human software development abilities will be eclipsed by LLMs other than trying to predict which part of the S-curve we're on?
I can't point to any evidence. Also I can't think of what direct evidence I could present that would be convincing, short of an actual demonstration? I would like to try to justify my intuition though:
Seems like the key question is: should we expect AI programming performance to scale well as more compute and specialised training is thrown at it? I don't see why not, it seems an almost ideal problem domain?
* Short and direct feedback loops
* Relatively easy to "ground" the LLM by running code
* Self-play / RL should be possible (it seems likely that you could also optimise for aesthetics of solutions based on common human preferences)
* Obvious economic value (based on the multi-billion dollar valuations of vscode forks)
All these things point to programming being "solved" much sooner than, say, chemistry.
LLMs will still hit a ceiling without human-like reasoning. Even two weeks ago, Claude 3.7 made basic mistakes like trying to convince me the <= and >= operators on Python sets have the same semantics [1]. Any human would quickly reject something like that (why would two different operators evaluate to the same value?), unless there is overwhelming evidence. Mistakes like this show up all the time, which makes me believe LLMs are still very good at matching/reproducing code they have seen. Besides that, I've found that LLMs are really bad at novel problems that were not seen in the training data.
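For the record, the actual semantics are easy to check in a REPL: <= is the subset test and >= is the superset test, so they only agree when the two sets are equal.

    a = {1, 2}
    b = {1, 2, 3}
    print(a <= b)  # True:  a is a subset of b
    print(a >= b)  # False: a is not a superset of b
    print(a <= a, a >= a)  # True True: the operators only coincide for equal sets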
Also, the reward functions that you mention don't necessarily lead to great code, only running code. The "should be possible" in the third bullet point is doing very heavy lifting.
At any rate, I can be convinced that LLMs will lead to substantially-reduced teams. There is a lot of junior-level code that I can let an LLM write and for non-junior level code, you can write/refactor things much faster than by hand, but you need a domain/API/design expert to supervise the LLM. I think in the end it makes programming much more interesting, because you can focus on the interesting problems, and less on the boilerplate, searching API docs, etc.
I asked ChatGPT, Claude, Gemini and DeepSeek what the AE and OE mean in "Harman AE OE 2018 curve". All of them made up complete bullshit, even for the OE (Over Ear) term. AE is Around Ear. The OE term is absurdly easy to find even with the most basic of search skills, and is in fact the fourth result on Google.
The problem with LLMs isn't that they can't do great stuff: it's that you can't trust them to do it consistently. Which means you have to verify what they do, which means you need domain knowledge.
Until the next big evolution in LLMs or a revolution from something else, we'll be alright.
I know what you're saying. I guess it depends on the use case and on the context. It's pretty much like asking someone off the street something random: ask someone about an apple and some may say a computer, others a fruit.
This is my view. We've seen this before in other problems where there's an on-hand automatic verifier. The nature of the problem mirrors previously solved problems.
The LLM skeptics need to point out what differs with code compared to Chess, DoTA, etc from a RL perspective. I don't believe they can. Until they can, I'm going to assume that LLMs will soon be better than any living human at writing good code.
> The LLM skeptics need to point out what differs with code compared to Chess, DoTA, etc from a RL perspective.
An obviously correct automatable objective function? Programming can be generally described as converting a human-defined specification (often very, very rough and loose) into a bunch of precise text files.
Sure, you can use proxies like compilation success / failure and unit tests for RL. But key gaps remain. I'm unaware of any objective function that can grade "do these tests match the intent behind this user request".
Contrast with the automatically verifiable "is a player in checkmate on this board?"
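To make the gap concrete, the part that is automatable looks roughly like this; a minimal sketch, assuming a candidate repo with a pytest suite, and with arbitrary reward values:

    import subprocess

    def proxy_reward(repo_dir: str) -> float:
        """Naive RL reward: does the candidate code compile and do its tests pass?"""
        compiles = subprocess.run(
            ["python", "-m", "compileall", "-q", repo_dir]).returncode == 0
        if not compiles:
            return 0.0
        tests_pass = subprocess.run(
            ["python", "-m", "pytest", "-q"], cwd=repo_dir).returncode == 0
        # Both checks are mechanical. Neither asks "do these tests actually
        # capture the intent behind the user's request?"
        return 1.0 if tests_pass else 0.2

Everything in that function is machine-checkable; the intent question is the part that isn't.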
I'll hand it to you that only part of the problem is easily represented in automatic verification. It's not easy to design a good reward model for softer things like architectural choices, asking for feedback before starting a project, etc. The LLM will be trained to make the tests pass, and make the code take some inputs and produce desired outputs, and it will do that better than any human, but that is going to be slightly misaligned with what we actually want.
So, it doesn't map cleanly onto previously solved problems, even though there's a decent amount of overlap. But I'd like to add a question to this discussion:
- Can we design clever reward models that punish bad architectural choices, executing on unclear intent, etc? I'm sure there's scope beyond the naive "make code that maps input -> output", even if it requires heuristics or the like.
This is in fact not how a chess engine works. It has an evaluation function that assigns a numerical value (score) based on a number of factors (material advantage, king "safety", pawn structure etc).
These heuristics are certainly "good enough" that Stockfish is able to beat the strongest humans, but it's rarely possible for a chess engine to determine if a position results in mate.
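For illustration, a toy version of such an evaluation function, scoring nothing but material from the piece-placement field of a FEN string (real engines add king safety, pawn structure, mobility, and much more):

    # Material-only evaluation in centipawns; positive means White is ahead.
    PIECE_VALUES = {"P": 100, "N": 320, "B": 330, "R": 500, "Q": 900}

    def evaluate(fen: str) -> int:
        placement = fen.split()[0]  # first FEN field: the piece placement
        score = 0
        for ch in placement:
            if ch.upper() in PIECE_VALUES:
                value = PIECE_VALUES[ch.upper()]
                score += value if ch.isupper() else -value  # uppercase pieces are White's
        return score

    evaluate("rnbqkbnr/pppppppp/8/8/8/8/PPPPPPPP/RNBQKBNR w KQkq - 0 1")  # 0: material is even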
I guess the question is whether we can write a good enough objective function that would encapsulate all the relevant attributes of "good code".
An automated objective function is indeed core to how AlphaGo, AlphaZero, and other RL + deep learning approaches work. Though it is obviously much more complex, and integrated into a larger system.
The core of these approaches is "self-play", which is where the "superhuman" qualities arise. The system plays billions of games against itself, and uses the data from those games to further refine itself. It seems that an automated "referee" (objective function) is an inescapable requirement for unsupervised self-play.
I would suggest that Stockfish and other older chess engines are not a good analogy for this discussion. Worth noting though that even Stockfish no longer uses a hand written objective function on extracted features like you describe. It instead uses a highly optimized neural network trained on millions of positions from human games.
Maybe I am misunderstanding what you are saying, but eg stockfish, given time and threads, seems very good at finding forced checkmates within 20 or more moves.
> The LLM skeptics need to point out what differs with code compared to Chess, DoTA, etc from a RL perspective.
I see the burden of proof has been reversed. That’s stage 2 already of the hubris cycle.
On a serious note, these are nothing alike. Games have a clear reward function. In software architecture it is extremely difficult even to agree on basic principles. We regularly invalidate previous "best advice", and we have many conflicting goals. Tradeoffs are a thing.
Secondly, programming has negative requirements that aren't verifiable. Security is the perfect example. You don't make a crypto library with unit tests.
Third, you have the spec problem. What is the correct logic in edge cases? That can be verified but needs to be decided. Also a massive space of subtle decisions.
Isn't this just the pot calling the kettle black? I'm not sure why either side has the rightful position of "my opinion is right until you prove otherwise".
We're talking about predictions for the future; anyone claiming to be "right" is lacking humility. The only thing going on is people justifying their opinions, no one can offer "proof".
But yes, and no. I'd agree in the sense that the null hypothesis is crucial, possibly the main divider between optimists and pessimists. But I'll still hold firm that the baseline should be predicting that transformer-based AI differs from humans in ability, since everything from neural architecture to training and inference works differently. But most importantly, existing AI varies dramatically in ability across domains: it exceeds human ability in some and fails miserably in others.
Another way to interpret the advancement of AI is viewing it as a mirror directed at our neurophysiology. Clearly, lots of things we thought were different, like pattern matching in audio- or visual spaces, are more similar than we thought. Other things, like novel discoveries and reasoning, appear to require different processes altogether (or otherwise, we’d see similar strength in those, given that training data is full of them).
I think the difference is that computers tend to be pretty good at things we can do autonomically (ride a bike, drive a car in non-novel, non-dangerous situations) and at things that are advanced versions of unreasoned speech: regurgitations/reformulations of things they can gather from a large corpus and cast into their neural net.
They fail at things requiring novel reasoning not already extant in their corpus, a sense of self, or an actual ability to continuously learn from experience, though those things can be programmed in manually as secondary, shallow characteristics.
> I code with multiple LLMs every day and build products that use LLM tech under the hood. I don't think we're anywhere near LLMs being good at code design.
I too use multiple LLMs every day to help with my development work. And I agree with this statement. But, I also recognize that just when we think that LLMs are hitting a ceiling, they turn around and surprise us. A lot of progress is being made on the LLMs, but also on tools like code editors. A very large number of very smart people are focused on this front and a lot of resources are being directed here.
If the question is:
Will the LLMs get good at code design in 5 years?
I think the answer is:
Very likely.
I think we will still need software devs, but not as many as we do today.
> I think we will still need software devs, but not as many as we do today.
There is already another reply referencing Jevons Paradox, so I won't belabor that point. Instead, let me give an analogy. Imagine programmers today are like the scribes and monks of 1000 years ago, considering the impact of the printing press. Only 5% of the population knew how to read & write, so the scribes and monks felt like they were going to be replaced. What happened is that the standalone "job" of writing mostly went away, but every job came to require writing as a core skill. I believe the same will happen with programming. A thousand years from now, people will have a hard time imagining jobs that don't involve instructing computers in some form (just like today it's hard for us to imagine jobs that don't involve reading/writing).
> I think we will still need software devs, but not as many as we do today.
I'm more of an optimist in that regard. Yes, if you're looking at a very specific feature set/product that needs to be maintained/developed, you'll need fewer devs for that.
But we're going to see the Jevons Paradox with AI generated code, just as we've seen that in the field of web development where few people are writing raw HTML anymore.
It's going to be fun when non-technical people who maybe know a bit of Excel start vibe coding large amounts of software, some of which will succeed and require maintenance. This maintenance might not involve a lot of direct coding either, but it will require a good understanding of how software actually works.
Nah man, I work with them daily. For me, the ceiling was reached a while ago. At least for my use case, these new models don’t bring any real improvements.
I'm not even talking about large codebases. They struggle to generate a valid ~400 LOC TypeScript file when that requires above-average type system knowledge. Try asking for a new-style decorator (added in 2023), and they mostly just hallucinate or fall back to the old syntax.
You're using them in reverse.
They are perfect for generating code according to your architectural and code design template. Relying on them for architectural design is like picking your nose with a pair of scissors - yeah, technically doable, but one slip and it all goes to hell.
Well, I asked an LLM to fix a piece of Python Django code so it uses pagination for the list of entities. The LLM came up with a working solution, an impressively complicated piece of Django ORM code, which was totally needless, as Django ORM has a Paginator class that does all the work without manually fetching pages, etc.
The LLM sees pagination, it does pagination. After all, an LLM is an algorithm that calculates the probability of the next word in a sequence of words, nothing less and nothing more. An LLM does not think or feel, even though people believe it does, saying thank you and using polite words like "please". An LLM generates text on the basis of what it was presented with. That's why it will happily invent research that does not exist, create a review of a product that does not exist, invent a method that does not exist in a given programming language. And so on.
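For comparison, the idiomatic version it overlooked is only a few lines; a sketch, with Entity standing in for whatever model the view was listing:

    from django.core.paginator import Paginator
    from django.shortcuts import render

    from .models import Entity  # hypothetical model name, purely illustrative

    def entity_list(request):
        entities = Entity.objects.order_by("id")
        paginator = Paginator(entities, 25)                  # 25 entities per page
        page = paginator.get_page(request.GET.get("page"))   # tolerates missing/bad page numbers
        return render(request, "entities/list.html", {"page": page})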
I'm using them fine.
I'm refuting the grandparent's point that they will replace basically all programming activities (including architecture) in 5 years.
The software tool takes a higher-level input to produce the executable.
I'm waiting for LLMs to integrate directly into programming languages.
The discussions sound a bit like the early days of when compilers started coming out, and people had been using direct assembler before. And then decades after, when people complained about compiler bugs and poor optimizers.
Exactly. I also see code generation that outputs current languages as only an intermediary step, like the -S switches (or equivalent) we had to have to convince developers during the first decades of compilers' existence, until optimizing compilers took over.
"Nova: Generative Language Models for Assembly Code with Hierarchical Attention and Contrastive Learning"
Not OP, but probably similar to how tool calling is managed: you write the docstring for the function you want, maybe include some specific constraints, and then that gets compiled down to bytecode rather than human-authored code.
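Not claiming this is anyone's actual design, but as a purely hypothetical sketch in Python terms it could look like a decorator that compiles whatever a model writes for the stub's docstring (generate_code here is a canned stand-in for the imagined LLM call):

    def generate_code(docstring: str) -> str:
        # Imagined LLM backend: takes the docstring/constraints, returns source.
        # Stubbed with a canned implementation so the sketch actually runs.
        return (
            "def slugify(title):\n"
            "    return '-'.join(title.lower().split())\n"
        )

    def ai_implemented(stub):
        # Replace the decorated stub with whatever the "model" generated for it.
        namespace = {}
        exec(compile(generate_code(stub.__doc__), f"<llm:{stub.__name__}>", "exec"), namespace)
        return namespace[stub.__name__]

    @ai_implemented
    def slugify(title: str) -> str:
        """Return a URL-safe slug: lowercase, words joined by hyphens."""

    print(slugify("Hello World"))  # hello-world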
I'm saying, let's see some actual reasoning behind the extrapolation rather than "just trust me bro" or "sama said this in a TED talk".
Many of the comments here and elsewhere have been in the latter categories.
I run a software development company with dozens of staff across multiple countries. Gemini has got us to the point where we can actually stop hiring for certain roles, and staff have been informed they must make use of these tools or they are surplus to requirements. At the current rate of improvement I believe we will be operating with far fewer staff in 2 years' time.
Thanks -- this is what I mean by evidence, someone with actual experience and skin in the game weighing in rather than blustering proclamations based on vibes.
I agree they improve productivity to the point where you need fewer developers than before for a similar quantity of output.
But I don't think LLMs specifically will reduce the need for some engineer to do the higher-level technical design and architecture work, just given what I've seen and my understanding of the underlying tech.
I believe that at the current rate your entire company will become irrelevant in 4 years. Your customers will simply use Gemini to build their own software.
Wrong. Because we don't just write software. We make solutions. In 4 years we will still be making solutions for companies. The difference will be that the software we design for those solutions will likely be created by AI tools, and we get to lower our staff costs whilst increasing our output and revenue.
If the software is created by AI tools we all have access to, that means everyone can now become your competitor, and the people you are planning on letting go can use those same AI tools to create solutions for companies just as easily as you can. So in a way you will have more competition, and the calculation that you will have more revenue might not be that straightforward.
It means what it says. We don't just write software. An LLM cannot do the service that the company provides because it isn't just software and digital services.
Not at all. We don't care whether the software is written by a machine or by a human. If the machine does it cheaper, to a better, more consistent standard, then it's a win for us.
That might be the case if we were an organisation that resisted change and were not actively pursuing a reduction in our staff count via AI, but it isn't. In the AI era our company will thrive because we are no longer constrained by needing to find a specific type of human talent that can build the complicated systems we develop.
So what will happen once most/all your staff is replaced with AI? Your clients will ask the fundamental question: what are we paying you for? You are missing the point that the parent comment raises: LLMs are not only replacing the need for your employees, they are replacing the need for you.
We don't produce software for clients. We provide solutions. That is what they pay us for. Until there is AGI (which could be 4 years away or 400) there is no LLM which can do that.
We have a very successful company that has been running 30 years, with developers across 6 countries. We just make sure we hire developers who know that they're here to do a job, on our terms, for which they will get paid, and it's our way or the highway. If they don't like it, they don't have to stay. However, through doing this we have maintained a standard that our competitors fail at, partly because they spend their time tiptoeing around staff and their comforts and preferences.
I don't hunt 'AI skeptics'. I just provide a viewpoint based on professional experience. Not one that is 'AI is bad at coding because everyone on Twitter says so'.
And you happened to create an account on Hacker News just 3 months ago, after 30 years in business, just to provide a viewpoint based on professional experience?
Yes, you're right, I should have made an account 30 years ago, before this website existed, and gotten involved in all the discussions taking place about the use of ChatGPT and LLMs in the software development workplace.
Have you ever hired anyone for their expertise, so they tell you how to do things, and not the other way around? Or do you only hire people who aren't experts?
I don't doubt you have a functioning business, but I also wouldn't be surprised if you get overtaken some day.
Most of our engineers are hired because of their experience. They don't really tell us how to do things. We already know how to do it. We just want people who can do it. LLMs will hopefully remove this bottleneck.
Wow, you are really projecting the image of a wonderful person to work for.
I don't doubt you are successful, but the mentality and value hierarchy you seem to express here is something I never want to have anything to do with.
I replied to the follow-up comment about following the guidelines in order to avoid hellish flamewars, but you played a role here too with a snarky, sarcastic comment. Please be more careful in future and be sure to keep comments kind and thoughtful.
This subthread turned into a flamewar and you helped to set it off here. We need commenters to read and follow the guidelines in order to avoid this. These guidelines are especially relevant:
_Be kind. Don't be snarky. Converse curiously; don't cross-examine. Edit out swipes._
_Comments should get more thoughtful and substantive, not less, as a topic gets more divisive._
_Please don't fulminate. Please don't sneer, including at the rest of the community._
What if I told you that a dev group with a sensibly-limited social-club flavor is where I arguably did my best work and have my happiest memories? In the midst of SOME of the "socializing" (which, by the way, almost always STILL sticks to technical topics, even if they are merely adjacent to the task at hand), brilliant ideas are often born which sometimes end up contributing directly to bottom lines. Would you like evidence of social work cohesion leading to more productivity and happier employees? Because I can produce that. (I'd argue that remote work has negatively impacted this.)
But yes, I also once worked at a company (Factset) where the CTO had to put a stop to something that got out of hand- A very popular game at the time basically took over the mindshare of most of the devs for a time, and he caught them whiteboarding game strategies during work hours. (It was Starcraft 1 or 2, I forget. But both date me at this point.) So he put out a stern memo. Which did halt it. And yeah, he was right to do that.
Just do me this favor- If a dev comes to you with a wild idea that you think is too risky to spend a normal workday on, tell them they can use their weekend time to try it out. And if it ends up working, give them the equivalent days off (and maybe an extra, because it sucks to burn a weekend on work stuff, even if you care about the product or service). That way, the bet is hedged on both sides. And then maybe clap them on the back. And consider a little raise next review round. (If it doesn't work out, no extra days off, no harm no foul.)
I think your attitude is in line with your position (and likely your success). I get it. Slightly more warmth wouldn't hurt, though.
> What if I told you that a dev group with a sensibly-limited social-club flavor is where I arguably did my best and also had my happiest memories from?
Maybe you did, and as a developer I am sure it is more fun, easier, and more enjoyable to work in those places. That isn't what we offer, though. We offer something very simple: the opportunity for a developer to come in, work hard, probably not enjoy themselves, produce what we ask, to the standard we ask, and in return they get paid.
It's our company, we own it. We are not 'some executives'. If someone develops an AI that can replace what we do and perform at the same level or higher, then I would gladly welcome it.
The reason you feel safe now is because of the marketing tactics of AI companies in pushing their phished goods on the world. LLMs haven't done anything yet other than reduce the barrier of entry into the software field, like what Google search and Stack Overflow did 10 years ago. The same principles apply: if your only skill is using an LLM (or Google searching), then you will be the first replaced when the markets turn. A company's options for making money over the short term vs the long term should be fairly easy to reason about based on the available news. AI companies already know this. The strategy has been played out; they make more money this way. They get to suck up all the info from your corporation, because they will get that data. Once they build these models, they will replace you too. Sure, you're saving time and money today, but that's just the cost of building the model for them.
I am pretty sure the ArthurStacks account is either a troll or an LLM gone rogue. There are so many contradictions among his own comments that it is embarrassing to list them all. But given the reactions and number of replies he gets, the trolling is rather successful.
Looks a bit like your comment was being downvoted, which is also interesting to see. If Arthur Stacks is a bot, then it potentially follows that there is vote-manipulation going on as well, to quell dissenting opinions.
IMO this is completely "based". Delivering customer value and making money off of it is its own thing, and software companies collectively being a social club and a place for R&D is another - technically a complete tangent to it. It doesn't always matter how the sausages came to be on the served plate. It might be the Costco special that the CEO got last week and dumped into the pot. It's none of your business to make sure that doesn't happen. The customer knows. It's consensual. Well, maybe not. But none of your business. Literally.
The field of software engineering might be doomed if everyone worked like this user and replaced programmers with machines, or not, but those questions are sort of above his paygrade. AI destroying the symbiotic relationship between IT companies and their internal social clubs is a societal issue, a more macro-scale problem than the internal regulation mechanisms of free market economies are expected to solve.
I guess my point is, I don't know whether this guy or his company is real, but it passes my BS detector and I know for a fact that real medium-sized company CEOs are like this. This is technically what everyone should aspire to be. If you think that's morally wrong and completely utterly wrong, congratulations on your first job.
Turning this into a moral discussion is beside the point, a point that both of you missed in your efforts to be based, although the moral discussion is also interesting; I'll leave that be for now. It appears as if I stepped on ArthurStacks' toes, but I'll give you the benefit of the doubt and reply.
My point actually has everything to do with making money. Making money is not a viable differentiator in and of itself. You need to put in work on your desired outcomes (or get lucky, or both) and the money might follow. My problem is that a directive such as "software developers need to use tool x" is an _input_ with, at best, a questionable causal relationship to outcome y.
It's not about "social clubs for software developers", but about clueless execs. Now, it's quite possible that he's put in that work and that the outcomes are attributable to that specific input, but judging by his replies here I wouldn't wager on it. Also, as others have said, if that's the case, replicating their business model just got a whole lot easier.
> This is technically what everyone should aspire to be
No, there are other values besides maximizing utility.
No, I think you're mistaking the host for the parasite - he's running a software and solutions company, which means, in a reductive sense, he is making money/scamming cash out of customers by means of software. The software is ultimately smoke and mirrors that can be anything so long as it justifies customer payments. Oh boy, that software sure is additive to the world.
Everything between landing a contract and transferring deliverables, for someone like him, is already only questionably related to revenue. Software engineering has tried everything to tie developer paychecks to the value created, and at best it's still about as reliable as medical advice from an LLM. Adding LLMs into it probably won't look so risky to him.
> No, there are other values besides maximizing utility.
True, but again, above his paygrade as a player in a free market capitalist economy which is mere part of a modern society, albeit not a tiny part.
----
OT and might be weird to say: I think a lot of businesses would appreciate vibe-coding going forward, relative to a team of competent engineers, solely because LLMs are more consistent(ly bad). Code quality doesn't matter but consistency does; McDonald's basically dominates the hamburger market with the worst burger ever that is also by far the most consistent. Nobody loves it, but it's what sells.
> My problem is that directives such as "software developers need to use tool x" is an _input_ with, at best, a questionable causal relationship to outcome y.
Total drivel. It is beyond question that the use of the tools increases the capabilities and output of every single developer in the company in whatever task they are working on, once they understand how to use them. That is why there is the directive.
It becomes a question of how much you believe it's all just training data, and how much you believe the LLM has got pieces that are composable. I've given the question in the link as an interview question and had humans be unable to give as thorough an answer (which I chose to believe is due to specialization elsewhere in the stack). So we're already at a place where some human software development abilities have been eclipsed on some questions. So then, even if the underlying algorithms don't improve and they just ingest more training data, it doesn't seem like a total guess as to what part of the S-curve we're on: the number of software development questions that LLMs are able to successfully answer will continue to increase.
> Can you point to _any_ evidence to support that human software development abilities will be eclipsed by LLMs other than trying to predict which part of the S-curve we're on?