
This is just semantics. What's the difference between a "human interpretation of a good program" and a "good program" when we (humans) are the ones using it? If the model can write code that passes tests and meets my requirements, then it's a good programmer. I would expect nothing more or less from a human programmer.


> What's the difference between a "human interpretation of a good program" and a "good program" when we (humans) are the ones using it?

Correctness.

> and meets my requirements

It can't do that. "My requirements" wasn't part of the training set.


"Correctness" in what sense? It sounds like it's being expanded to an abstract academic definition here. For practical purposes, correct means whatever the person using it deems to be correct.

> It can't do that. "My requirements" wasn't part of the training set.

Neither are mine. The art of building these models is making them generalisable enough to tackle tasks that aren't in their training data. They have proven, at least for some classes of tasks, that they can do exactly that.


  > to an abstract academic definition here
Besides the fact that your statement is self-contradictory, there is actually a solid definition [0]. You should click the link on specification too. Or better yet, go talk to one of those guys who did their PhD in programming languages.

  > They have proven
Have they?

Or did you just assume?

Yeah, I know they got good scores on those benchmarks, but did you look at the benchmarks? Look at the questions and at what is required to pass them. Then take a moment and think. For the love of God, take a moment and think about how you could pass those tests. Don't just take a pass at face value and move on. If you do, well, I've got a bridge to sell you.

[0] https://en.wikipedia.org/wiki/Correctness_(computer_science)


Sure,

> In theoretical computer science, an algorithm is correct with respect to a specification if it behaves as specified.

"As specified" here being the key phrase. This is defined however you want, and ranges from a person saying "yep, behaves as specified", to a formal proof. Modern language language models are trained under RL for both sides of this spectrum, from "Hey man looks good", to formal theorem proving. See https://arxiv.org/html/2502.08908v1.

So I'll return to my original point: LLMs are not just generating outputs that look plausible, they are generating outputs that satisfy (or at least attempt to satisfy) lots of different objectives across a wide range of requirements. They are explicitly trained to do this.

So while you argue over the semantics of "correctness", the rest of us will be building stuff with LLMs that is actually useful and fun.


You have to actually read more than the first line of a Wikipedia article to understand it.

  > formal theorem proving
You're using Coq and Lean?

I'm actually not convinced you read the paper. It doesn't have anything to do with your argument. Someone using LLMs alongside formal verification systems is wildly different from LLMs being formal verification systems.

This really can't work if you don't read your own sources.
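For reference, this is roughly what a machine-checked specification looks like (a toy Lean 4 sketch invented for illustration, not taken from the paper): the spec is the theorem statement, and the kernel, not a test suite, checks the proof.

  -- Toy example: the specification is the theorem; the Lean kernel
  -- checks the proof. No amount of "looks good to me" substitutes.
  def double (n : Nat) : Nat := n + n

  theorem double_spec (n : Nat) : double n = 2 * n := by
    unfold double
    omega  -- decision procedure for linear arithmetic over Nat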


> they are generating outputs that satisfy (or at least attempt to satisfy) lots of different objectives across a wide range of requirements

No they aren't. You were lied to by the hype machine industry. Sorry.

The good news is that there are a lot of formerly intractable problems that can now be solved by generating plausible output. Programming is just not one of them.


> No they aren't. You were lied to by the hype machine industry. Sorry.

Ok. My own empirical evidence is in favour of these things being useful, and useful enough to sell their output (partly), but I'll keep in mind that I'm being lied to.


Quite a huge leap from "these things are useful" to "these things can code".

(And yes, this leap is the lie you're being sold. "LLMs are kinda useful" is not what led to the LLM trillion dollar hype bubble.)


The thing I'm using them for is coding though...


Is your grandma qualified to determine what is good code?

  > If the model can write code that passes tests
You think tests make code good? Oh my sweet summer child. TDD has been tried many times and each time it failed worse than the last.


Good to know that something I've been doing consistently for 10 years could never work.


It's okay, lots of people's code is always buggy. I know people who suck at coding and have been doing it for 50 years. It's not uncommon.

I'm not saying don't write tests. But I am saying you're not omniscient. Until you are, your tests are going to be incomplete. They are helpful guides, but they should not drive development. If you really think you can test for every bug, then I suggest you apply to be Secretary of Health.
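To make that concrete, a toy Python example (everything here invented for illustration) where the suite passes and the bug lives in exactly the edge case nobody tested:

  def mean(xs):
      return sum(xs) / len(xs)  # crashes on the empty list

  def test_mean_basic():
      assert mean([1, 2, 3]) == 2

  def test_mean_single():
      assert mean([5]) == 5

  # Both tests pass, yet mean([]) raises ZeroDivisionError. The edge
  # case nobody thought to test is exactly where the bug lives.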

https://hackernoon.com/test-driven-development-is-fundamenta...

https://geometrian.com/projects/blog/test_driven_development...


> It's okay, lots of people's code is always buggy. I know people who suck at coding and have been doing it for 50 years. It's not uncommon.

Are you saying you're better than that? If you think you're next to perfect, then I understand why you're so against the idea that an imperfect LLM could still generate pretty good code. But you're also wrong if you think you're next to perfect.

If you're not being super haughty, then I don't understand your complaints against LLMs. You seem to be arguing they're not useful because they make mistakes. But humans make mistakes while being useful. If the rate is below some line, isn't the output still good?


I've worked with people who write tests afterwards on production code, and it's pretty inevitable that they:

* End up missing tests for edge cases they built and forgot about. Those edge cases often have bugs.

* Forget what's already covered and test the same edge cases twice if they're being thorough with test-after. This is a waste.

* Usually end up spending almost as much time manually testing at the end to verify that the change they just made works, whereas I would typically just deploy straight to prod.

It doesn't prevent all bugs; it just prevents enough of them to make the teams around us who don't do it look bad by comparison, even though they do manual checks too.

I've heard loads of good reasons not to write tests at all; I've yet to hear a good reason not to write one beforehand if you're going to write one anyway.
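For anyone who hasn't seen it, the order being argued for, in miniature (a toy Python sketch, names invented):

  # 1. Red: write the test first and watch it fail, proving it can fail.
  def test_slugify_collapses_whitespace():
      assert slugify("Hello  World") == "hello-world"

  # 2. Green: write the smallest implementation that makes it pass.
  def slugify(title: str) -> str:
      return "-".join(title.lower().split())

  # 3. Refactor, with the test as a safety net. The claim isn't that the
  # test proves correctness, only that it was written before the code,
  # so it can't have been shaped to fit the code's quirks.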

Both of your articles raise pretty typical straw men. One is "what if I'm not sure what the customer wants?" (that's fine, but I hope you aren't writing production code at that point), and the other is the peculiar but common notion that TDD can only be done with low-level unit tests, which is dangerous bullshit.


Sure, you work with some bad programmers. Don't we all?

The average driver thinks they're above average. The same is true about programmers.

I do disagree a bit with the post and think you should write tests while developing. Honestly, I don't think they'd disagree either. I believe they're talking about a single task rather than the whole program. Frankly, no program is ever finished, so in that case you'd never write tests lol.

I believe this because they start off saying it wasn't much code.

But you are missing the point. From the first link:

  > | when the tests all pass, you’re done
  > Every TDD advocate I have ever met has repeated this verbatim, with the same hollow-eyed conviction.
These aren't straw men. These are claims you need to constantly question. The only way to write good code is to doubt yourself. To second-guess. Because that's what drives writing better tests.

I actually don't think you disagree. You seem to perfectly understand that tests (just like any other measure) are guides, not answers. That there's much more to this than passing tests.

But the second D in TDD is the problem. Tests shouldn't drive development; they are just part of development. The engineer who writes tests at the end is inefficient, but the engineer who writes tests at the beginning is arrogant. To think you can figure it all out before writing the code is laughable. Maybe some high-level broad tests are feasible, but those will only ever be a very small portion.

You can do hypothesis-driven development, but people will call you a perfectionist and say you're going too slow. By HDD I mean you ask "what needs to happen, and how would I know it's happening?", which may very well involve writing tests. Any scientist is familiar with this, but also with its limits.


TDD is not a panacea; it's an effective, pragmatic practice with several benefits and little to no downside compared to test-after.

I'm not sure what you're saying, really, but I don't think it disagrees with that central point in any specific way.



