
I feel like people in the comments are misunderstanding the findings in the article. It’s not that people save time with AI and then turn that time to novel tasks; it’s that perceived savings from using AI are nullified by new work which is created by the usage of AI: verification of outputs, prompt crafting, cheat detection, debugging, whatever.

This seems observationally true in the tech industry, where the world’s best programmers and technologists are tied up fiddling with transformers and datasets and evals so that the world’s worst programmers can slap together temperature converters and insecure twitter clones, and meanwhile the quality of the consumer software that people actually use is in a nosedive.




The other night I was too tired to code so I decided to try vibe coding a test framework for the C/C++ API I help maintain. I've tried this a couple times so far with poor results but I wanted to try again. I used Claude 3.5 IIRC.

The AI was surprisingly good at filling in some holes in my specification. It generated a ton of valid C++ code that actually compiled (except that it omitted the necessary #includes). I built and ran it and... the output was completely wrong.

OK, great. Now I have a few hundred lines of C++ I need to read through and completely understand to see why it's incorrect.

I don't think it will be a complete waste of time because the exercise spurred my thinking and showed me some interesting ways to solve the problem, but as far as saving me a bunch of time, no. In fact it may actually cost me more time trying to figure out what it's doing.

With all due respect to folks working on web and phone apps, I keep getting the feeling that AI is great for high level, routine sorts of problems and still mostly useless for systems programming.


> With all due respect to folks working on web and phone apps, I keep getting the feeling that AI is great for high level, routine sorts of problems and still mostly useless for systems programming.

As one of those folks, no it's pretty bad in that world as well. For menial crap it's a great time saver, but I'd never in a million years do the "vibe coding" thing, especially not with user-facing things or especially not for tests. I don't mind it as a rubber duck though.

I think the problem is that there are two groups of users: the technical ones like us, and then the managers and C-levels etc. They see it spit out a hundred lines of code in a second and as far as they know (and care) it looks good, not realizing that someone now has to spend their time reviewing those 100 lines of code, plus carry the burden of maintaining them into the future. But all they see is a way to get the pesky, expensive devs replaced, or at least a chance to squeeze more out of them. The system is so flashy and impressive looking, and you can't even blame them for falling for the marketing and hype; after all, that's what all the AIs are being sold as: omnipotent and omniscient worker replacers.

Watching my non-technical CEO "build" things with AI was enlightening. He prompts it for something fairly simple, like a TODO List application. What it spits out works for the most part, but the only real "testing" he does is clicking on things once or twice and he's done and satisfied, now convinced that AI can solve literally everything you throw at it.

However, if he were testing the solution as a proper dev would, he'd see that the state updates break after a certain number of clicks, that the list glitches out sometimes, that adding things breaks on scroll and overflows the viewport, and so on. These are all real examples from an "app" he made by vibe coding, and after playing around with it myself for all of 3 minutes I noticed all these issues and more.


You have my sympathy. At least in systems programming there's little desire from a manager to, idk, vibe code an adaptation layer for a new architecture or something.


For esoteric config files (such as ntp or chrony) that would take me 10-15 mins to write and tweak, it gets done in seconds.

Over time, that adds up.

For simple utility programs and scripts, it also does a great job.
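
To make the config-file case concrete, here is a minimal chrony.conf sketch of the sort of thing being described (the pool hostname and subnet range are just placeholders, not anything from the comment):

```conf
# Use a public NTP pool; iburst speeds up the initial sync
pool 2.pool.ntp.org iburst

# Record the clock's drift rate between restarts
driftfile /var/lib/chrony/drift

# Step the clock (rather than slew) if it's off by more than 1s
# during the first 3 updates after startup
makestep 1.0 3

# Keep the hardware RTC in sync with the system clock
rtcsync

# Serve time to the local subnet (placeholder range)
allow 192.168.1.0/24
```

The tweaking the commenter mentions is usually in directives like makestep and allow, which depend on the specific deployment.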


> With all due respect to folks working on web and phone apps, I keep getting the feeling that AI is great for high level, routine sorts of problems and still mostly useless for systems programming.

As someone working on routine problems in mainstream languages where training data is abundant, LLMs are not even great for that. Sure, they can output a bunch of code really quickly that on the surface appears correct, but on closer inspection it often uses nonexistent APIs, the logic is subtly wrong or convoluted for no reason, it does things you didn't tell it to do or ignores things you did, it has security issues and other difficult to spot bugs, and so on.

The experience is pretty much what you summed up. I've also used Claude 3.5 the most, though all other SOTA models have the same issues.

From there, you can go into the loop of copy/pasting errors to the LLM or describing the issues you did see in the hopes that subsequent iterations will fix them, but this often results in more and different issues, and it's usually a complete waste of time.

You can also go in and fix the issues yourself, but if you're working with an unfamiliar API in an unfamiliar domain, then you still have to do the traditional task of reading the documentation and web searching, which defeats the purpose of using an LLM to begin with.

To be clear: I don't think LLMs are a useless technology. I've found them helpful at debugging specific issues, and implementing small and specific functionality (i.e. as a glorified autocomplete). But any attempts of implementing large chunks of functionality, having them follow specifications, etc., have resulted in much more time and effort spent on my part than if I had done the work the traditional way.

The idea of "vibe coding" seems completely unrealistic to me. I suspect that developers doing this are not even checking whether the code does what they want it to, let alone reviewing the code for any issues. As long as it compiles they consider it a success. Which is an insane way of working that will lead to a flood of buggy and incomplete applications, increasing the dissatisfaction of end users with our industry, and possibly causing larger effects not unlike the video game crash of 1983 or the dot-com bubble.


> The idea of "vibe coding" seems completely unrealistic to me.

That's what happens with "AI art" too. Any non-artist can create images in seconds, and they will look kind of valid or even good to them, much like those "vibe coded" things look to CEOs.

AI is great at generating crap really fast and efficiently. Not so good at generating stuff that anyone actually needs and which must actually work. But we're also discovering that a lot of what we consume can be crap and be acceptable. An endless stream of generated synthwave in the background while I work is pretty decent. People wanting to decorate their podcasts or tiktoks with something that nobody is going to pay attention to, AI art can do that.

For vibe coding, prototyping and functional mockups seem to be quite a viable use right now.


> You can also go in and fix the issues yourself, but if you're working with an unfamiliar API in an unfamiliar domain, then you still have to do the traditional task of reading the documentation and web searching, which defeats the purpose of using an LLM to begin with.

Oh, see, this is where I disagree. I think it's incredibly helpful to get past the "blank page". Yes, I do usually end up going and reading docs, but I also have a much better sense of what I'm looking for in the docs and can use them more effectively.

I feel like this is the same pattern with every new tool. Google didn't replace reference books, but it helped me discover the right ones to read much more easily. Similarly, LLM based tools are not replacing reference texts, but they're making it easier for me to spin up on new things; by the time I start reading the docs now, I'm usually past the point of needing to read the intro.


> With all due respect to folks working on web and phone apps, I keep getting the feeling that AI is great for high level, routine sorts of problems and still mostly useless for systems programming.

I agree. AI is great for stuff that's hard to figure out but easy to verify.

For example, I wanted to know how to lay out something a certain way in SwiftUI and asked Gemini. I copied what it suggested, ran it and the layout was correct. I would have spent a lot more time searching and reading stuff compared to this.


I wish I had an ongoing counter for the amount of times I've asked chatgpt to "generate me python code that will output x data similar to xxd".

It's a snippet I've written a few times before to debug data streams, but it's always annoying to get the alignment just right.

I feel like that is the sweet spot for AI, to generate actual snippets of routine code that has no bearing on security or functionality, but lets you keep thinking about the problem at hand while it does that 10 minutes of busy work.
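
The snippet described above is small enough to sketch here. This is a minimal xxd-style hex dump in Python; the exact column widths and grouping are a guess at what the commenter wanted, not their actual code:

```python
def hexdump(data: bytes) -> str:
    """Render bytes in an xxd-like layout: offset, hex words, ASCII column."""
    lines = []
    for offset in range(0, len(data), 16):
        chunk = data[offset:offset + 16]
        # xxd groups the hex column into two-byte (four-hex-digit) words
        words = [chunk[i:i + 2].hex() for i in range(0, len(chunk), 2)]
        # 8 words * 4 chars + 7 separating spaces = 39 columns
        hex_col = " ".join(words).ljust(39)
        # Printable ASCII passes through; everything else becomes '.'
        ascii_col = "".join(chr(b) if 32 <= b < 127 else "." for b in chunk)
        lines.append(f"{offset:08x}: {hex_col}  {ascii_col}")
    return "\n".join(lines)

print(hexdump(b"Hello, world!\n"))
```

The fiddly part, as the comment says, is exactly the alignment: padding the hex column so short final lines keep the ASCII column in place.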


Yeah, I similarly have not had great success for creating entire systems / applications for exactly this reason. I have had no success at all in not needing to go in and understand what it wrote, and when I do that, I find it largely needs to be rewritten. But I have a lot more success when I'm integrating it into the work I'm doing.

I do know people who seem to be having more success with the "vibecoding" workflow on the front end though.


> OK, great. Now I have a few hundred lines of C++ I need to read through and completely understand to see why it's incorrect.

For a time, we can justify this kind of extra work by imagining that it is an upfront investment. I think that is what a lot of people are doing right now. It remains to be seen whether AI-assisted labor is still a net positive once we stop giving it that special grace as something that will pay off later if we spend a lot of time on it now.


> OK, great. Now I have a few hundred lines of C++ I need to read through and completely understand to see why it's incorrect.

I think it's often better to just skip this and delete the code. The cool thing about those agents is that the cost of trying this out is extremely cheap, so you don't have to overthink it and if it looks incorrect, just revert it and try something else.

I've been experimenting with Junie for the past few days, and have had a very positive experience. It wrote a bunch of tests for me that I'd been postponing for quite some time, and it was mostly correct from a single-sentence prompt. Sometimes it does something incorrect, but I usually just revert it and move on, and try something else later. There's definitely a sweet spot of tasks it does well, and you have to experiment a bit to find it.


Personally, having worked in professional enterprise software for ~7 years now I've come to a pretty hard conclusion.

Most software should not exist.

That's not even meant in the tasteful "It's a mess" way. From a pure money-making efficiency standpoint, upwards of 90% of the code I've written in this time has not meaningfully contributed back to the enterprise, and I've tried really hard to get that number lower. Mind you, this is professional software. If you consider the vibe coder guys, I'd estimate that number MUCH higher.

It just feels like the whole way we've fit computing into the world is misaligned. We spend days building UIs that don't help the people we serve and that break at the first change to the process, and because of the support burden of that UI we never get to actually automate anything.

I still think computers are very useful to humanity, but we have forgotten how to use them.


> Upwards of 90% ... of software should not exist ... it has not meaningfully contributed back to the enterprise

This is Sturgeon's law [1].

And yes, but it's hard or impossible to identify the useful 10% ahead of time. It emerges after the fact.

[1] https://en.wikipedia.org/wiki/Sturgeon%27s_law


And not only that, but most >>changes<< to software shouldn't happen, especially if they're user facing. Half my dread in visiting support web sites is that they've completely rearranged yet again, and the same thing I've wanted five times requires a fifth 30-minute session figuring out where they put it.


> Personally, having worked in professional enterprise software for ~7 years now I've come to a pretty hard conclusion.

> Most software should not exist.

> That's not even meant in the tasteful "It's a mess" way. From a purely money making efficiency standpoint upwards of 90% of the code I've written in this time has not meaningfully contributed back to the enterprise, and I've tried really hard to get that number lower. Mind you, this is professional software. If you consider the vibe coder guys, I'll estimate that number MUCH higher.

I've worked on countless projects at this point that seemed to serve no purpose, even at the outset, and had no plan to even project cost savings/profit, except, at best some hand-waving approximation.

Even worse, many companies are completely uninterested in even conceptualizing operating costs for a given solution. They get sold on some cloud thing cause "OpEx" or whatever, and then spend 100s of hours a month troubleshooting intricate convoluted architectures that accomplish nothing more than a simple relational database and web server would.

Sure, the cloud bill is a lower number, but if your staff is burning hours every week fighting `npm audit` issues, and digging through CloudWatch for errors between 13 Lambda functions, what did you "save"?

I've even worked on more than one project that existed specifically to remove manual processes (think printing and inspecting documents) to "save time." Sure, now shop floor and assembly workers inspect fewer papers manually, but now you need a whole new layer of technical staff to troubleshoot crap constantly.

Oh, and the companies don't have in-house staff to maintain the thing, and have no interest in actually hiring, so they write huge checks to a consulting company to "maintain" the stuff at a cost often orders of magnitude higher than it'd cost to hire staff who would actually own the projects. And these people have a conflict of interest in maximizing profit, so they want to "fix" things, and so on.

I think a lot of this is the outgrowth of the 2010s where every company was going to be a "tech company" and cargo-culted processes without understanding the purpose or rationale, and lacking competent people to properly scope and deliver solutions that work, are on time and under budget, and tangibly deliver value.


> where the world’s best programmers and technologists are tied up fiddling with transformers and datasets and evals so that the world’s worst programmers can slap together temperature converters and insecure twitter clones

This statement is incredibly accurate


> slap together temperature converters and insecure twitter clones

because those "best programmers" don't want to be making temperature converters or twitter clones (unless they're paid mega bucks). This enables the low-paid "worst" programmers to do those jobs for peanuts.

It's an acceptable outcome imho.


Let's assume that I'm closer to the best programmers than the worst programmers, for a second; I definitely will build a temperature converter, at my usual hourly rate. I don't think we should consider any task "beneath us"; doing so detaches us from reality, makes us entitled, and ultimately stunts our growth.


But do we actually need more temperature converters? Maybe it would be better if they were hard to make such that people didn't waste their time, and the bad programmers went out and did some yard work.


I think it might be more urgent for the star-AI'd (ha) programmers to go out and touch grass. Do we really need more people in that pile right now? There is a lot of mundane but interesting/challenging work out there, humming along beneath the hype cycle of the day. It may not pay 300k or satisfy utopian urges, but then again, you should probably be suspicious if someone hands you fistfuls of money and tells you you're saving the world.


I think the software quality nosedive significantly predates generative AI.

I think it's too early to say whether AI is exacerbating the problem (though I'm sympathetic to the view that it is) or improving it, or just maintaining the status quo.


Imho it’s going to worsen things unless the models and their toolsets significantly improve


>it’s that perceived savings from using AI are nullified by new work which is created by the usage of AI:

I mean, isn't that obvious looking at economic output and growth? The Shopify CEO recently published a memo in which he claimed that high achievers saw "100x growth". Odd that this isn't visible in the Shopify market cap. Did they fire 99% of their engineers instead? Maybe the memo was AI written too.

Are there any 5 man software companies that do the work of 50? I haven't seen them. I wonder how long this can go on with the real world macro data so divorced from what people have talked themselves into.


The state of consumer software is already so bad, and LLMs are trained on a good chunk of it, so their output couldn't possibly produce worse software, right? /s



