> On two occasions I have been asked, 'Pray, Mr. Babbage, if you put into the machine wrong figures, will the right answers come out?' I am not able rightly to apprehend the kind of confusion of ideas that could provoke such a question.
This has been an obviously absurd question for two centuries now. Turns out the people asking that question were just visionaries ahead of their time.
It is kind of impressive how I'll ask for some code in the dumbest, vaguest, sometimes even wrong way, but so long as I have the proper context built up, I can get something pretty close to what I actually wanted. Though I still have problems where I can ask as precisely as possible and get things not even close to what I'm looking for.
> This has been an obviously absurd question for two centuries now. Turns out the people asking that question were just visionaries ahead of their time.
This is not the point of that Babbage quote, and no, LLMs have not solved it, because it cannot be solved, because "garbage in, garbage out" is a fundamental observation of the limits of logic itself, having more to do with the laws of thermodynamics than with programming. The output of a logical process cannot be more accurate than the inputs to that process; you cannot conjure information out of the ether. The LLM isn't the logical process in this analogy, it's one of the inputs.
At a fundamental level, yes, and even in human-to-human interaction this kind of thing happens all the time. The difference is that humans are generally quite good at resolving most ambiguities and contradictions in a request correctly and implicitly (and sometimes surprisingly bad at doing so explicitly!). That's why human language tends to be more flexible and expressive than programming languages (but bad at precision). LLMs can basically do some of the same thing, so you don't need to specify all the 'obvious' implicit details.
The Babbage anecdote isn't about ambiguous inputs, it's about wrong inputs. Imagine wanting to know the answer to 2+2, so you go up to the machine and ask "What is 3+3?", expecting that it will tell you what 2+2 is.
Adding an LLM as input to this process (along with an implicit acknowledgement that you're uncertain about your inputs) might produce a response "Are you sure you didn't mean to ask what 2+2 is?", but that's because the LLM is a big ball of likelihoods and it's more common to ask for 2+2 than for 3+3. But it's not magic; the LLM cannot operate on information that it was not given, rather it's that a lot of the information that it has was given to it during training. It's no more a breakthrough of fundamental logic than Google showing you results for "air fryer" when you type in "air frier".
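To make that last comparison concrete, here's a hedged sketch of the same "closest common query wins" idea: nothing but string similarity against a pool of already-known queries, no information conjured from nowhere. The candidate list is made up for illustration.

```python
import difflib

# Hypothetical pool of common queries the system already knows about.
known_queries = ["air fryer", "hair dryer", "deep fryer", "air filter"]

# "Correcting" the misspelling is just picking the nearest known string;
# the answer comes from data the system was already given, not from the ether.
print(difflib.get_close_matches("air frier", known_queries, n=1))  # ['air fryer']
```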
I think the point they’re making is that computers have traditionally operated with an extremely low tolerance for errors in the input, where even minor ambiguities that are trivially resolved by humans by inferring from context can cause vastly wrong results.
We’ve added context, and that feels a bit like magic coming from the old ways. But the point isn’t that there is suddenly something magical, but rather that the capacity for deciphering complicated context clues is suddenly there.
> computers have traditionally operated with an extremely low tolerance for errors in the input
That's because someone has gone out of their way to mark those inputs as errors, because they make no sense. The CPU itself has no qualms doing 'A' + 10, because what it actually sees is 01000001 (65) and 00001010 (10) as the inputs to its 8-bit adder circuit. That outputs 01001011 (75), which will be displayed as 75 or 'K' or whatever, depending on the code that runs afterwards. But generally the operation is nonsense, so someone will mark it as an error somewhere.
So errors are a way to let you know that what you're asking is nonsense according to the rules of the software. Like removing a file you do not own, or accessing a web page that does not exist. But as you've said, we can now rely on more accurate heuristics to propose alternative solutions. The issue is when the machine goes off and actually computes the wrong information.
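As a rough illustration of the adder point above (a sketch, not what any particular CPU or language literally does), you can watch the same bit pattern get reinterpreted by hand:

```python
# 'A' is just the bit pattern 01000001; the adder doesn't know or care
# that adding 10 to a letter is "nonsense" at the human level.
a = ord('A')        # 0b01000001 == 65
b = 10              # 0b00001010
total = a + b       # 0b01001011 == 75
print(total)        # 75
print(chr(total))   # 'K' -- the same bits read back as a character
```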
Handing an LLM a file and asking it to extract data out of it with no further context or explanation of what I'm looking for with good results does feel a bit like the future. I still do add context just to get more consistent results, but it's neat that LLMs handle fuzzy queries as well as they do.
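For what it's worth, the whole interaction can be as small as the sketch below (using the OpenAI Python SDK as one example; the file name, model, and prompt are placeholders, and any vendor's chat API would do the same job):

```python
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

# Hypothetical input file; the point is how deliberately fuzzy the ask can be.
with open("orders.csv") as f:
    raw = f.read()

resp = client.chat.completions.create(
    model="gpt-4o-mini",
    messages=[{"role": "user",
               "content": "Pull out whatever per-customer totals you can find:\n\n" + raw}],
)
print(resp.choices[0].message.content)
```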
We wanted to check the clock at the wrong time but read the correct time. Since a broken clock is right twice a day, we broke the clock, which solves our problem some of the time!
A clock that's 5 seconds, 5 minutes, or 5 hours ahead, or counts an hour as 61 minutes, is still more useful than a clock that does not move its hands at all.
It's very impressive that I can type misheard song lyrics into Google, and yet still have the right song pop up.
But, having taken a chance to look at the raw queries people type into apps, I'm afraid neither machine nor human is going to make sense of a lot of it.
Well, you can enter 4-5 relatively vague keywords into Google and the first or second Stack Overflow link will probably provide plenty of relevant code. Given that, it's much less impressive, since >95% of the problems and queries just keep repeating.
This is a really hard problem when I write every line and have the whole call graph in my head. I have no clue how you think this gets easier by knowing less about the code.
No one is saying you shouldn't write tests. But we are saying TDD is dumb.
Actually, for exactly the reasons you mention: I'm not dumb enough to believe I'm a genius. I'll always miss something. So I can't rely on my tests to ensure correctness. It takes deeper thought and careful design.
By using the program? Mind you this works only for _personal_ tools where it’s intuitively obvious when something is wrong.
For example
”Please create a viewer for geojson where i can select individual feature polygons and then have button ’export’ that exports the selected features to a new geojson”
1. You run it
2. It shows the json and visualizes selections
3. The exported subset looks good
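The core of step 3 is tiny; something like this sketch is easy to eyeball for correctness (file names and the selection mechanism are placeholders, not the generated tool itself):

```python
import json

def export_selection(geojson, selected_indices, out_path):
    # Keep only the features the user selected; everything else passes through untouched.
    subset = {
        "type": "FeatureCollection",
        "features": [geojson["features"][i] for i in selected_indices],
    }
    with open(out_path, "w") as f:
        json.dump(subset, f, indent=2)

with open("input.geojson") as f:
    data = json.load(f)

export_selection(data, [0, 2], "selected.geojson")  # e.g. features 0 and 2 were selected
```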
I have no idea how anyone could keep the callgraph of even a minimal gui application in their head. If you can then congratulations, not all of us can!
Great, I used my program and everything seems to be working as expected.
Not great, somebody else used my program and they got root on my server...
> I have no idea how anyone could keep the callgraph of even a minimal gui application in their head
Practice.
Lots and lots of practice.
Write it down. Do things the hard way. Build the diagrams by hand and make sure you know what's going on. Trace programs. Pull out the debugger! Pull out the profiler!
If you do those things, you too will gain that skill. Obviously you can't do this for a giant program but it is all about the resolution of your call graph anyways.
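For concreteness, one low-tech way to do the tracing part in Python is a few lines of sys.settrace that print a "caller -> callee" edge for everything a run actually executes (the demo functions here are placeholders; point it at your own entry point):

```python
import sys

def tracer(frame, event, arg):
    # Print an edge of the call graph every time a Python function is entered.
    if event == "call":
        caller = frame.f_back.f_code.co_name if frame.f_back else "<top>"
        print(f"{caller} -> {frame.f_code.co_name}")
    return tracer

def parse(raw):
    return raw.split(",")

def run(raw):
    return [item.strip() for item in parse(raw)]

sys.settrace(tracer)
run("a, b, c")       # prints: <module> -> run, run -> parse, ...
sys.settrace(None)
```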
If you are junior, this is the most important time to put in that work. You will get far more from it than you lose. If you're further along, well the second best time to plant a tree is today.
”not great, somebody else used my program and they got root on my server...”
In general, security-sensitive software is the worst possible place to use LLMs, based on public case studies and anecdata, exactly for this reason.
”Do it the hard way”
Yes that’s generally the way I do it as well when I need to reliably understand something but it takes hours.
The cadence with LLM-driven experiments is usually under an hour. That’s the biggest boon for me - I get a new tool and can focus on the actual work I’m delivering, with some step now taking slightly less time.
For example, I’m happy using vim without ever having read the code or debugged it, much less having observed its callgraph. I’m similarly content using LLM-generated utilities without much oversight. I would never push code like that to production, of course.
how do you know what you want if you didn't write a test for it?
I'm afraid what you want is often totally unclear until you start to use a program and realize either that what the program is doing is what you want, or that it isn't and you change the program.
MANY programs are made this way, I would argue all of them actually. Some of the behaviour of the program wasn't imagined by the person making it, yet it is inside the code... it is discovered, as bugs, as hidden features, etc.
Why are programmers so obsessed that not knowing every part of the way a program runs means we can't use the program? I would argue you already don't, or you are writing programs that are so fundamentally trivial as to be useless anyway.
LLM written code is just a new abstraction layer, like Python, C, Assembly and Machine Code before it... the prompts are now the code. Get over it.
> how do you know what you want if you didn't write a test for it?
You have that backwards.
How do you know what to test if you don't know what you want?
I agree with you though, you don't always know what you want when you set out. You can't just factorize your larger goal into unit tests. That's my entire point.
You factorize by exploration. By play. By "fuck around and find out". You have to discover the factorization.
And that is a very different paradigm from TDD. Both will end with tests, and frankly, the non-TDD paradigm will likely end up with more tests with better coverage.
> Why are programmers so obsessed that not knowing every part of the way a program runs means we can't use the program?
I think you misunderstand. I want to compare it to something else. There's a common saying "don't let perfection be the enemy of good (enough)". I think it captures what you're getting at, or is close enough.
The problem with that saying is that hardly anyone actually believes in perfection[0], because perfection doesn't exist. So the saying ends up being a lazy thought-terminator instead of addressing the real problem: determining what is good enough.
In fact, no one knows every part of even a trivial program. We can always introduce more depth and complexity until we reach the limits of our physics models, and at that point no one knows. So you can reason that this is not about perfection.
I think you are forgetting why we program in the first place. Why we don't just use natural language. It's the same reason we use math in science. Not because math is the language of the universe but rather that math provides enough specificity to be very useful in describing the universe.
This isn't about abstraction. This is about specification.
It's the same problem with where you started. The customer can't tell my boss their exact requirements and my boss can't perfectly communicate to me. Someone somewhere needs to know a fair amount of details and that someone needs to be very trustworthy.
I'll get over it when the alignment problem is solved to a satisfactory degree. Perfection isn't needed; we will have to discuss what is good enough and what is not.
[0] Those who do are likely juniors. And it should be beaten out of them. Kindly.
You don't hear the complaints. That's different than no complaints. Trust me, they got them.
I've got plenty of complaints for Apple, Google, Netflix, and everyone else. Shit that could be fixed with just a fucking regex. Here's an example: my gf is duplicated in my Apple contacts. It can't find the duplicate, despite the same name, nickname, phone number, email, and birthday. And there are three entries on my calendar for her birthday. Guess what happened when I manually merged? She now has 4(!!!!!) entries!! How the fuck does that number increase!
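To be clear about the "just a regex" bar I mean: something on the order of this sketch would have caught it. The contact fields and sample data are made up, and a real dedupe is fuzzier, but this is the level of effort in question.

```python
import re

def normalize_phone(p):
    return re.sub(r"\D", "", p)  # keep digits only

def key(contact):
    # Treat contacts with the same normalized name, phone, and email as duplicates.
    return (contact["name"].strip().lower(),
            normalize_phone(contact["phone"]),
            contact["email"].strip().lower())

contacts = [
    {"name": "Jane Doe", "phone": "+1 555-0100", "email": "jane@example.com"},
    {"name": "jane doe", "phone": "1 5550100",   "email": "Jane@example.com"},
]

buckets = {}
for c in contacts:
    buckets.setdefault(key(c), []).append(c)

duplicates = [group for group in buckets.values() if len(group) > 1]
print(duplicates)  # both entries land in the same bucket
```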