Three or four weeks ago I was posting how LLMs were useful for one-off questions but I wouldn't trust them on my codebase. Then I spent my week's holiday messing around with them for some personal projects. I am now a fairly committed Roo user. There are lots of problems, but there is incredible value here. Try it and see if you're still a hold-out.
I spent a good part of yesterday attempting to use ChatGPT to help me choose an appropriate API gateway. Over and over it suggested things that literally do not exist, and the only reason I could tell was that I had spent a good amount of time in the actual documentation. This has been my experience roughly 80% of the time when trying to use an LLM. I would like to know the magical prompt engineering technique that makes it stop confidently hallucinating about literally everything.
I mirror the GP's sentiment. My initial attempts using a chat-like interface were poor. Then some months ago, due to many HN comments, I decided to give Aider a try. I had put my kid to bed and it was 10:45pm. My goal was "Let me just figure out how to install Aider and play with it for a few minutes - I'll do the real coding tomorrow." Fifteen minutes later, not only had I installed it, but my script was done. There was one bug I had to fix myself. It was production quality code, too.
I was hooked. Even though I was done, I decided to add logging, command line arguments, etc. An hour later, it was a production grade script, with a very nice interface and excellent logging.
Oh, and this was a one-off script. I'll run it once and never again. Now all my one-off scripts have excellent logging, because it's almost free.
There was no going back. For small scripts that I've always wanted to write, AI is the way to go. That script had literally been in my head for years. It was not a challenging task - but it had always been low on my priority list. How many ideas do you have in your head that you'll never get around to because of lack of time? Well, now you can do 5x more of those than you would have without AI.
I was at the "script epiphany" stage a few months ago and I got cool Bash scripts (with far more bells and whistles I would normally implement) just by iterating with Claude via its web interface.
Right now I'm at the "Gemini (with Aider) is pretty good for knock-offs of the already existing functionality" stage (in a Go/HTMX codebase).
I'm yet to get to the "wow, that thing can add brand new functionality using code I'm happy with just by clever context management and prompting" stage; but I'm definitely looking forward to it.
I'm having a very good experience with ChatGPT at the moment. I'm mostly using it for little tasks where I don't remember the exact library functions. Examples:
"C++ question: how do I get the unqualified local system time and turn into an ISO time string?"
"Python question: how do I serialize a C struct over a TCP socket with asyncio?"
"JS question: how do I dynamically show/hide an HTML element?" (I obviously don't write a lot of JS :-D)
ChatGPT gave me the correct answers on the first try. I have been a sceptic, but I'm now totally sold on AI assisted coding, at least as a replacement for Google and StackOverflow. For me there is no point anymore in wading through all the blog spam and SEO crap just to find a piece of information. Stack Overflow is still occasionally useful, but the writing is on the wall...
EDIT: Important caveat: stay critical! I have been playing around asking ChatGPT more complex questions where I actually know the correct answer, or where I can immediately spot mistakes. It sometimes gives me answers that would look correct to a non-expert, but are hilariously wrong.
The problem with this approach is that you might lose important context which is present in the documentation but doesn’t surface through the LLM. As an example, I just asked GPT-4o how to access the Nth character in a string in Go. Predictably, it answered str[n]. This is a wildly dangerous suggestion because it works correctly for ASCII but not for other UTF-8 characters. Sure, if you know about this and prompt it further it tells you about this limitation, but that’s not what 99% of people will do.
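To make the pitfall concrete, here's a minimal Go sketch (the string literal is just an illustrative example): indexing the string directly yields a single byte, while converting to a []rune first yields the Nth code point.

    package main

    import "fmt"

    func main() {
        s := "héllo" // 'é' is encoded as two bytes in UTF-8

        // Byte indexing: s[1] is the first byte of 'é' (0xC3), not a character.
        fmt.Println(s[1]) // prints 195

        // Converting to a []rune indexes by code point instead.
        r := []rune(s)
        fmt.Println(string(r[1])) // prints é
    }

(Even []rune indexes by code point rather than by grapheme cluster, so combining characters and emoji can still surprise you.)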
> The problem with this approach is that you might lose important context which is present in the documentation but doesn’t surface through the LLM.
Oh, I'm definitely aware of that! I mostly do this with things I have already done, but can't remember all the details. If the LLM shows me something new, I check the official documentation. I'm not into vibe coding :) I still want to understand every line of code I write.
Did you use search grounding? o3 or o4-mini-high with search grounding (which will usually come on by default for questions like this) is usually the best option.
Sure, this was exactly how I felt three weeks ago, and I could have written that comment myself. The agentic approach, where it works out it made something up by looking at the errors the type checker generates, is what makes the difference.
this is kind of a weird position to take. you're the captain, you're the person reviewing the code the LLM (agent or not) generates, you're the one asking for the code you want, you're in charge of deciding how much effort to put in to things, and especially which things are most worth your effort.
all this agent stuff sounded stupid to me until I tried it out in the last few weeks, and personally, it's been great - I give a not-that-detailed explanation for what I want, point it at the existing code and get back a patch to review once I'm done making my coffee. sometimes it's fine to just apply, sometimes I don't like a variable name or whatever, sometimes it doesn't fit in with the other stuff so I get it to try again, sometimes (<< 10% of the time) it's crap. the experience is pretty much like being a senior dev with a bunch of very eager juniors who read very fast.
anyway, obviously do whatever you want, but deriding something you've not looked into isn't a hugely thoughtful process for adapting to a changing world.
If I have to review all the code it's writing, I'd rather write it myself (maybe with the help of an LLM).
> anyway, obviously do whatever you want, but deriding something you've not looked into isn't a hugely thoughtful process for adapting to a changing world.
I have tried it. Not sure I want to be part of such a world, unfortunately.
> the experience is pretty much like being a senior dev with a bunch of very eager juniors who read very fast.
I... don't want that. Juniors just slow me down because I have to check what they did and fix their mistakes.
(this is in the context of professional software development, not making scripts, tinkering etc)
> I... don't want that. Juniors just slow me down because I have to check what they did and fix their mistakes.
> (this is in the context of professional software development, not making scripts, tinkering etc)
I understand the sentiment. A few months ago they wanted us to move fast and dumped 4 new people with very little real-world coding experience on us (originally 2 developers). Not fun, and very stressful.
However, keep in mind that in many workplaces, handling junior devs poorly means one of two things:
1. If you have some abstruse domain expertise, and it's OK that only 1-2 people work on it, you'll be relegated to doing that. Sadly, most workplaces don't have such tasks.
2. You'll be fantastic in your output. Your managers will like you. But they will not promote you. After some point, they expect you to be a leverage multiplier - if you can get others to code really well, the overall team productivity will exceed that of any superstar (and no, I don't believe 10x programmers exist in most workplaces).
I wonder if we'll start to see artisanal benchmarks. You -- and I -- have preferred models for certain tasks. There's a world in which we start to see how things score on the "simonw chattiness index", and come to rely on smaller, more specific benchmarks, I think.
Yeah, I think personalized evals will definitely be a thing. Between reviewing way too much Arena and WildChat and now having seen lots of live traces firsthand, I can say there's a wide range of LLM usage (and preferences) out there that really doesn't match my own tastes or requirements, lol.
For the past year or two, I've had my own personal 25-question vibe check I've used to kick the tires on new models, but I think the future is something both a little more rigorous and a little more automated (something like an LLM jury with UltraFeedback-style criteria based on your own real-world exchanges, then BTL-ranked)? A future project...
I think it's more likely that we move away from benchmarks and towards more of a traditional reviewer model. People will find LLM influencers whose takes they agree with and follow them to keep up with new models.
I am starting to feel like hallucination is a fundamentally unsolvable problem with the current architecture, and is going to keep squeezing the benchmarks until something changes.
At this point I don't need smarter general models for my work, I need models that don't hallucinate, that are faster/cheaper, and that have better taste in specific domains. I think that's where we're going to see improvements moving forward.
If you could actually teach these models things, not just in the current context but as lasting learning over time, that would alleviate a lot of the issues with hallucination. Imagine being able to say "that method doesn't exist, don't recommend it again", give it the documentation, and have it absorb that information permanently; that would fundamentally change how we interact with these models. But can that work for models hosted for everyone to use at once?
There's an almost infinite number of things that can be hallucinated, though. You can't maintain a list of scientific papers or legal cases that don't exist! Hallucinations (almost certainly) aren't specific falsehoods that need to be erased...
The level of hallucinations with o3 is no different from the level of hallucinations from most (all?) human sources in my experience. Yes, you definitely need to cross check, but yes, you need to do that for literally everything else, so it feels a bit redundant to keep preaching that as if it’s a failing of the model and not just an inherent property of all free sharing of information between two parties.
and sometimes before starting ... at two weeks out, they'd had to change the salary offer (down) because they had screwed up the salary calculation, expressed surprise when I said I planned to use the unlimited vacation policy to take a fixed four weeks a year (they felt it was a lot), changed the offer from employee to contractor, referred me to their accountant for what was really the simplest of accounting queries, sent me an equity calculator with an assumption of a $10bn sale price, and some other weird stuff. I really should have known better, and I only lasted a few months -- my old company reached out to check I was happy in the new role, and had me back within a fortnight of checking in.
Providers are exceptionally easy to switch. There's no moat for enterprise-level usage. There's no "market share" to gobble up because I can change a line in my config, run the eval suite, and switch immediately to another provider.
This is marginally less true for embedding models and things you've fine-tuned, but only marginally.
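As a rough illustration of the "change a line in my config" point (all names here are hypothetical, not any vendor's actual SDK), the provider can sit behind a small interface selected from config, so switching means editing one field and rerunning the eval suite:

    package llm

    import "fmt"

    // Config is a hypothetical app config; Provider is the one line you change.
    type Config struct {
        Provider string // e.g. "vendor-a" or "vendor-b"
        Model    string
        APIKey   string
    }

    // ChatClient is the only surface the rest of the codebase sees.
    type ChatClient interface {
        Complete(prompt string) (string, error)
    }

    // NewClient picks an implementation based on config alone.
    func NewClient(cfg Config) (ChatClient, error) {
        switch cfg.Provider {
        case "vendor-a":
            return &vendorA{model: cfg.Model, key: cfg.APIKey}, nil
        case "vendor-b":
            return &vendorB{model: cfg.Model, key: cfg.APIKey}, nil
        default:
            return nil, fmt.Errorf("unknown provider %q", cfg.Provider)
        }
    }

    // The concrete types would wrap each vendor's HTTP API; stubbed here.
    type vendorA struct{ model, key string }
    type vendorB struct{ model, key string }

    func (a *vendorA) Complete(prompt string) (string, error) { return "", nil }
    func (b *vendorB) Complete(prompt string) (string, error) { return "", nil }

Embeddings are the stickier case mentioned above, since switching embedding models usually means re-embedding your corpus.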
I mean sure, but also, I feel like the ability to query an LLM for something is an invaluable resource I never had before and has made knowledge acquisition immeasurably easier for me. I definitely search the web much, much less when I'm trying to learn something.
Both statements are false, but even if they were true: if AI can do your job, costs 1% of what you cost, and works 100x faster, there's no reason to pay you.
At what point do you think this becomes ridiculous? Like, are we angry they're not still supporting PowerPC? Would three more years have made a difference to you? 5 more? 10 more? What's the magical number that would have made you happy here?
Maybe a given number of years isn't what the yardstick should be, but rather whether the hardware can still be reasonably used.
For example, I have a 3rd gen Intel Xeon that runs circles around regular newish processors in brute processing force (think compiling and such). Yet MS no longer officially supports it for Win11. I know you can circumvent the TPM requirement, which I do, so I'm still using it, but this just shows how arbitrary this limit is.
In Apple's case, at least they can say it's a different architecture and whatnot.
Intel Macs can run Windows (not that it would help) or Linux; a distro like Mint should have good support for most of the hardware, and it actually runs better on older models. There is nothing Apple needs to do.
The moment they stop support, they should release all documentation for the hardware and let enthusiasts reuse it. Planned obsolescence and electronic waste could be avoided.
That’s not how it works. The cost to maintaining support for old hardware isn’t merely money for more engineers. It’s the opportunity cost of slowing down forward progress for new things.
Intel laptops are sooooo slow. So extremely painfully slow. They’re quite bad. I’m largely a Windows user, but my god, old Intel laptops are bloody awful. Leaving behind old and bad things isn’t bad.
Besides, an older Intel MacBook will continue to work in its current form. It doesn’t need another 10 years of updates.
They really aren't slow, but their performance and battery life are greatly eclipsed by Apple ARM. I could live pretty comfortably on a Lenovo P51 (something like a 2017 MBP) if I had to, under Linux or FreeBSD. Also, a non-negligible amount of performance was lost to the security gaffes and the microcode and OS mitigations for them.
If that's the case, Apple could easily offer a fantastic trade-in deal on existing Intel machines (say, bought in the last 5 years), to get people moved to Apple Silicon. Do you think they can't afford it? I feel bad for the person buying an Intel Mac Mini in 2022.
Apple could afford to randomly select ten thousand people via lottery and give them a million dollars. Most American HN commenters could afford to donate 80% of their salary and still live a comfortable life. And a lot, I bet, could donate 90%.
A person or company being able to afford something is not a compelling argument.
I know, but Apple offers trade-in credit anyway when you "sell" your old laptop back to them. Offering an additional credit as good will to recent Intel owners would do nothing but help their reputation and get old systems off the street. It doesn't affect me... I migrated to Apple Silicon years ago.
It’s just a simple value prop. Is the cost of increasing the payout worth the brand reputation?
Well we can say with confidence what Apple determined the answer to be. Only a few dorks on HN will care that a 5 year old laptop won’t get the new macOS update.
5 years is admittedly a bit short. But the M1 was a quite frankly revolutionary upgrade. So it’s a one off.