
> And for me, past a certain point, even if you continually report back problems it doesn't get any better in its new suggestions. It will just spin its wheels. So for that reason I'm a little skeptical about the value of automating this process.

It sucks, but the trick is to always restart the conversation/chat with a new message. I never go beyond one reply, and I copy-paste a lot. I got tired of copy-pasting, so I wrote something like a prompting manager (https://github.com/victorb/prompta) to make it easier and to avoid having to neatly format code blocks and so on.

Basically, make one message; if the model gets the reply wrong, iterate on the prompt itself and start fresh, always. Don't try to correct it by adding another message; update the initial prompt to make it clearer/steer it more.
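
Roughly, the loop looks like this (a minimal sketch assuming the OpenAI Python SDK; the model name and prompt file are placeholders, nothing specific to prompta):

    # One fresh message per attempt: the model never sees its own earlier mistakes.
    # Assumes the OpenAI Python SDK; model name and prompt path are placeholders.
    from openai import OpenAI

    client = OpenAI()  # reads OPENAI_API_KEY from the environment

    def one_shot(prompt: str, model: str = "gpt-4o") -> str:
        # Send ONLY the (revised) prompt, with no prior replies attached.
        resp = client.chat.completions.create(
            model=model,
            messages=[{"role": "user", "content": prompt}],
        )
        return resp.choices[0].message.content

    # Edit prompt.md between attempts instead of replying in-thread.
    print(one_shot(open("prompt.md").read()))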

But I've noticed that every model degrades really quickly past the initial reply, no matter the length of each individual message. The companies keep increasing the theoretical and practical context limits, but quality degrades well before those limits are reached, and they don't seem to be trying to address that (nor do they have a way of measuring it).



This is my experience as well, and it has been for over a year now.

LLMs are so incredibly transformative when they're incredibly transformative. And when they aren't, it's much better to fall back on the years of hard-won experience I have - the sooner the better. For example, I'll switch between projects and languages, and even with explicit instructions to move to a strongly typed language they'll stick to answers for the dynamic one. It's an odd experience to re-find my skills every once in a while. "Oh yeah, I'm pretty good at reading docs myself".

With all the incredible leaps in LLMs being reported (especially here on HN) I really haven't seen much of a difference in quite a while.


Interesting. This is another problem aider does not experience. It works on a git repo. If you switch repos, it changes context.

I’m not affiliated with aider. I just use it.

My bet is that many of the pitfalls people are experiencing at the moment are due to mismatched or immature tools.


In other words, don't use the context window. Treat it like a command line with input/output, where the purpose of each command is to extract information, manipulate knowledge, mine data, and so on.

Also, special care has to be given to the number of tokens. Even with one question/one answer, our artificial overlords can only really focus on about 500 to 1,000 tokens at a time. After that they start losing their marbles. The reasoning models are exceptions to that rule, but in essence they are not that different.
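
A rough way to keep an eye on that, assuming OpenAI-style tokenization via the tiktoken library (counts differ per model, so treat the threshold as a ballpark):

    # Sketch: check that a one-shot prompt stays in the ~500-1,000 token band.
    # Assumes OpenAI's tiktoken; "gpt-4" and prompt.md are placeholders.
    import tiktoken

    def token_count(text: str, model: str = "gpt-4") -> int:
        enc = tiktoken.encoding_for_model(model)
        return len(enc.encode(text))

    prompt = open("prompt.md").read()
    n = token_count(prompt)
    if n > 1000:
        print(f"Prompt is {n} tokens -- consider trimming before sending.")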

The difference between using the tool correctly and not might be that instead of 99.9% accuracy, the user gets just 98%. That probably doesn't sound like a big difference to some people, but the error rate is 20 times lower in the first case (0.1% vs 2%).


People keep throwing these 95%+ accuracy rates for LLMs around in these discussions, but that is nonsense. It's closer to 70%, which is quite terrible. I use LLMs, but I never trust them beyond doing some initial searching when I'm stumped, and as soon as they unblock me I put them down again. It's not transformative; it's merely replacing Google, because search there has sucked for a while.


95% accuracy vs 70% accuracy: both numbers are pulled out of someone's ass and serve little purpose in the discussion at hand. How did you measure that? Or rather, since you didn't, what's the point of sharing this hypothetical 25-point difference?

And how funny that your comment seems to land right alongside this one about people having very different experiences with LLMs:

> I am still trying to sort out why experiences are so divergent. I've had much more positive LLM experiences while coding than many other people seem to, even as someone who's deeply skeptical of what's being promised about them. I don't know how to reconcile the two.

https://news.ycombinator.com/item?id=43898532


It works very well (99.9%) when the problem resides in territory familiar to the user. When I know enough about a problem, I know how to decompose it into smaller pieces, and all (most?) of those smaller pieces have already been solved countless times.

When a problem is far outside my understanding, AI leads me down the wrong path more often than not. Accuracy is terrible, because I don't know how to decompose the problem.

Jargon plays a crucial role there. LLMs need to be guided using as much of the problem's correct jargon as possible.

I have done this with people for decades. I read in a book at some point that the surest way to get people to like you is to speak to them in the words they usually use themselves. No matter what concepts they are hearing, if the words belong to their familiar vocabulary they are more than happy to discuss anything.

So when I meet someone, I always try to absorb as much of their vocabulary as possible, as quickly as possible, and then I use it to describe the ideas I am interested in. People understand much better that way.

Anyway, the same holds true for LLMs: they need to hear the words of the problem expressed in its particular jargon. So when programmers want to use a library, they need to absorb the jargon used in that particular library. It is only then that accuracy rates hit many nines.


I will step around the gratuitous rudeness and state the obvious:

No, the pretend accuracy of above 95% does not square with the hallucination rates of up to 50% reported by OpenAI itself, for example.

The difference in experiences is easily explainable, in my opinion. Much like some people swear by mediums and psychics while others easily see through them: it's easy to see what you want to see when a nearly random experience lands you a good outcome.

I don't appreciate your insinuation that I am making up numbers, and I thought it shouldn't go unanswered, but do not mistake this for a conversation. I am not in the habit of engaging with such demeaning language.


> gratuitous rudeness

It is "gratuitous rudeness" to say that numbers presented without any sort of sourcing/backing are pulled from someone's ass? Then I guess so be it, but I'm also not a fan of people presenting absolute numbers as some sort of truth when there isn't any clear way of coming up with those numbers in the first place.

Just like there are "extremists" claiming LLMs will save us all, clearly others fall on the other extreme and it's impossible to have a somewhat balanced conversation with either of these two groups.


This has largely been my experience as well, at least with GH Copilot. I mostly use it as a better Google now, because even with the context of my existing code, it can't adhere to the style at all. Hell, it can't even get Docker Compose files right, mixing schema versions and using incorrect parameters all the time.

I've also noticed that the language matters a lot. It's pretty good with Python, pandas, matplotlib, etc., but ask it to write some PowerShell and it regularly hallucinates modules that don't exist, more than with any other language I've tried.

And good luck if you're working with a stack that's not the flavor of the month with plenty of information available online. ERP systems with documentation that lives behind a paywall, so it's not in the training data - you know, the real-world enterprise CRUD use cases where I'd want to use it the most are exactly where it's the least helpful.


To be fair, I find ChatGPT useful for Elixir, which is pretty niche. The great error messages (if a bit verbose) and the atomic nature of functions in a functional language go with the grain of LLMs, I think.

Still, at most I get it to help me with snippets. I wouldn't want it to just generate lots of code; for one thing, Elixir is pretty easy to write anyway...


I think “don’t use the context window” might be too simple. It can be incredibly useful. But avoid getting into a context-window loop. When iterations stop showing useful progress toward the goal, it’s time to abandon the context. LLMs tend to circle back to the same dead-end solution path at some point. It also helps to jump between LLMs to get a feel for how they perform on different problem spaces.


Depends on the use case. For programming, where every small detail might have huge implications, 98% accuracy vs 99.99% is a ginormous difference.

Other tasks can be more forgiving, like writing, which I do all the time; there I load 3,000 tokens into the context window pretty frequently. Small details in accuracy don't matter so much for most people on everyday casual tasks like rewriting text, summarizing, etc.

In general, be wary of how much context you load into the chat; performance degrades faster than you can imagine. OK, the aphorism I started with was a little simplistic.


Oh sure. Context windows have been less useful for me on programming tasks than for other things. When working iteratively against a CSV file, for instance, they can be very useful. I’ve used something very similar to the following before (rough sketch of the resulting transformation below):

“Okay, now add another column which is the price of the VM based on current Azure costs and the CPU and Memory requirements listed.”

“This seems to only use a few Azure VM SKU. Use the full list.”

“Can you remove the burstable SKU?”
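
(Roughly the kind of transformation those prompts ask for, expressed as a pandas sketch; the SKU table, prices, and column names are made up for illustration, not real Azure data.)

    # Sketch only: hypothetical SKUs/prices/columns, not real Azure pricing.
    import pandas as pd

    vms = pd.read_csv("vms.csv")  # assumed columns: name, cpus, memory_gb

    # Hypothetical non-burstable SKU list: (sku, vcpus, memory_gb, hourly_usd)
    skus = pd.DataFrame(
        [("D2s_v5", 2, 8, 0.10), ("D4s_v5", 4, 16, 0.19), ("D8s_v5", 8, 32, 0.38)],
        columns=["sku", "vcpus", "memory_gb", "hourly_usd"],
    )

    def cheapest_price(row):
        # Cheapest SKU that satisfies the row's CPU and memory requirements.
        fits = skus[(skus.vcpus >= row.cpus) & (skus.memory_gb >= row.memory_gb)]
        return fits["hourly_usd"].min() if not fits.empty else None

    vms["price_per_hour"] = vms.apply(cheapest_price, axis=1)
    vms.to_csv("vms_priced.csv", index=False)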

Though I will say that simple error fixes within a context window are handled fine for programming issues. On more than one occasion, when I’ve copied and pasted an incorrect solution, providing the LLM with the error has been enough to fix the problem. But if it goes beyond that, it’s best to abandon the context.


Aider, the tool, does exactly the opposite, in my experience.

It really works, for me. It iterates by itself and fixes the problem.


I'm in the middle of test-driving Aider and I'm seeing exactly the same problem: the longer a conversation goes on, the worse the quality of the replies... Currently, I'm doing something like this to prevent it from loading previous context:

    rm -r .aider.chat.history.md .aider.tags.cache.v4 || true && aider --architect --model deepseek/deepseek-reasoner --editor-model deepseek/deepseek-chat --no-restore-chat-history
That clears the history; then I basically re-run it whenever the model gets something wrong (so I can update/improve the prompt and try again).


Why not use the /clear and /reset commands?



