n=1 I’ve used Wikipedia for many years with no immediately noticeable false information. And of course all the “citation needed” marks are there. I trust Wikipedia to be correct, I expect it to be correct, and Wikipedia has earned my trust. Maybe I don’t read it enough to see any vandalism.
Compared to LLMs, it’s extremely striking to see the relative trust / faith people have in it. It’s pretty sad to see how little the average person values truth and correctness in these systems, how untrusted Wikipedia is to some, and how overly-trusted LLMs are in producing factually correct information to others.
No false information doesn't mean there isn't any bias. The same facts can be used to come to wildly different conclusions and can also just be omitted when inconvenient.
Humans writing the test first and LLM writing the code is much better than the reverse. And that is because tests are simply the “truth” and “intention” of the code as a contract.
When you give up the work of deciding what the expected inputs and outputs of the code/program are, you are no longer in the driver's seat.
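As a rough sketch of what that contract can look like (parse_price and its behavior are made up for illustration): the test is the human-authored part, and the function is what the LLM would be asked to produce.

    import pytest

    # pricing.py -- in this workflow, the part the LLM is asked to write.
    def parse_price(text: str) -> int:
        """Parse a price like '$19.99' into integer cents."""
        if not text.startswith("$"):
            raise ValueError(f"not a price: {text!r}")
        dollars = float(text[1:])
        if dollars < 0:
            raise ValueError("price must be non-negative")
        return round(dollars * 100)

    # test_pricing.py -- the human-authored contract, written first.
    def test_parses_plain_dollar_amount():
        assert parse_price("$19.99") == 1999  # cents

    def test_rejects_negative_amounts():
        with pytest.raises(ValueError):
            parse_price("-$5.00")

The tests state the intention; the implementation can change freely as long as they keep passing.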
Acceptance criteria are human-readable text that the person specifying the software has to write to fill up a field in Scrum tools, and not at all to guide the work of the developers.
It's usually derived from the description by an algorithm (that the person writing it has to run in their head), and any deviation from that algorithm should make the person edit the description instead, to make the deviation go away.
As in, a developer would write something in e.g. gherkin, and AI would automatically create the matching unit tests and the production code?
That would be interesting. Of course, gherkin tends to just be transpiled into generated code that is customized for the particular test, so I'm not sure how AI can really abstract it away too much.
All of this reduces to a simple fact at the end of the discussion.
You need some way of precisely telling AI what to do. As it turns out, there is only so much you can do with text. Come to think of it, you can write a whole book about a scene, and yet 100 people will imagine it quite differently. And the actual photograph would still be totally different from what all those 100 people imagined.
As it turns out if you wish to describe something accurately enough, you have to write mathematical statements, in other words statements that reduce to true/false answers. We could skip to the end of the discussion here, and say you are better off either writing code directly or writing test cases.
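One concrete reading of "statements that reduce to true/false answers" is property-based testing. A sketch with Hypothesis, where normalize_spaces is just an example function I made up:

    from hypothesis import given, strategies as st

    def normalize_spaces(text: str) -> str:
        """Collapse runs of whitespace into single spaces and strip the ends."""
        return " ".join(text.split())

    # Each property is a statement that must hold (be True) for every generated input.
    @given(st.text())
    def test_idempotent(text):
        once = normalize_spaces(text)
        assert normalize_spaces(once) == once

    @given(st.text())
    def test_no_double_spaces(text):
        assert "  " not in normalize_spaces(text)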
This is just people revisiting logic programming all over again.
> You need some way of precisely telling AI what to do.
I think this is the detail you are not getting quite right. The truth of the matter is that you don't need precision to get acceptable results, at least not in 100% of the cases. As with everything in software engineering, there is indeed "good enough".
Also worth noting, LLMs allow anyone to improve upon "good enough".
> As it turns out if you wish to describe something accurately enough, you have to write mathematical statements, in other words statements that reduce to true/false answers.
Not really. Nothing prevents you from referring to high-level sets of requirements. For example, if you tell an LLM "enforce Google's style guide", you don't have to concern yourself with how many spaces are in a tab. LLMs have been moving towards instruction files and prompt files for a while, too.
Yes, you are right, but in the sense that a human decides whether the AI-generated code is right.
But if you want near-100% automation, you need a precise way to specify what you want, else there is no reliable way of interpreting what you mean. And by that token, a lot of regression/breakage has to be endured every time a release is made.
I’m talking higher level than that. Think about the acceptance criteria you would put in a user story. I’m specifically responding to this:
> When you give up the work of deciding what the expected inputs and outputs of the code/program are, you are no longer in the driver's seat.
You don’t need to personally write code that mechanically iterates over every possible state to remain in the driver’s seat. You need to describe the acceptance criteria.
The line you wrote does not describe a feature. Typically you have many of those cases and they collectively describe one feature. I’m talking about describing the feature. Do you seriously think there is no higher level than given/when/thens?
I'm curious what it could possibly be too. I guess he's trying to say the comments you might make at the top of a feature file to describe a feature would be his goal, but I'm not aware of a structured way to do that.
The problem is that tests are for the unhappy path just as much as the happy path, and unhappy paths tend to get particular and detailed, which means even in gherkin it can get cumbersome.
If AI is to handle production code, the unhappy paths need to at least be certain, even if repetitive.
I think your perspective is heavily influenced by the imperative paradigm where you actually write the state transition. Compare that to functional programming where you only describe the relation between the initial and final state. Or logic programming where you describe the properties of the final state and where it would find the elements with those properties in the initial state.
Those do not involve writing state transitions. You are merely describing the acceptance criteria. Imperative is the norm because that's how computers work, but there are other abstractions that map more closely to how people think. Or to how the problem is already solved.
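A toy contrast in Python, just to make the flavor of the distinction concrete (function names are illustrative): the imperative version spells out the state transitions, the declarative one only states what the result is in terms of the input. Logic programming goes further still, but the difference already shows up here.

    # Imperative: describe how the state changes, step by step.
    def evens_squared_imperative(numbers):
        result = []
        for n in numbers:
            if n % 2 == 0:
                result.append(n * n)
        return result

    # Declarative: describe the result as a relation to the input.
    def evens_squared_declarative(numbers):
        return [n * n for n in numbers if n % 2 == 0]

    assert evens_squared_imperative([1, 2, 3, 4]) == evens_squared_declarative([1, 2, 3, 4]) == [4, 16]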
I didn’t mention state transitions. When I said “mechanically iterate over every possible state”, I was referring to writing tests that cover every type of input and output.
Acceptance criteria might be something like “the user can enter their email address”.
Tests might cover what happens when the user enters an email address, what happens when the user tries to enter the empty string, what happens when the user tries to enter a non-email address, what happens when the user tries to enter more than one email address…
In order to be in the driver’s seat, you only need to define the acceptance criteria. You don’t need to write all the tests.
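To make the gap between the two levels concrete: the acceptance criterion stays one sentence, while the tests it implies fan out into a list of cases. A sketch with pytest, where is_valid_email is a hypothetical stand-in for the real validation:

    import pytest

    # Hypothetical validator standing in for "the user can enter their email address".
    def is_valid_email(value: str) -> bool:
        return value.count("@") == 1 and all(part for part in value.split("@"))

    # One acceptance criterion fans out into many concrete test cases.
    @pytest.mark.parametrize(
        "value, expected",
        [
            ("alice@example.com", True),   # happy path
            ("", False),                   # empty string
            ("not-an-email", False),       # not an email address
            ("a@b.com, c@d.com", False),   # more than one address
        ],
    )
    def test_email_entry(value, expected):
        assert is_valid_email(value) is expected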
That only defines one of the things the user can enter. Should they be allowed to enter their postal address? Maybe. Should they be allowed to enter their friend's email address? Maybe.
> That would be interesting. Of course, gherkin tends to just be transpiled into generated code that is customized for the particular test, so I'm not sure how AI can really abstract it away too much.
I don't think that's how gherkin is used. Take for example Cucumber. Cucumber only uses its feature files to specify which steps a test should execute, whereas the steps themselves are pretty vanilla JavaScript code.
In theory, nowadays all you need is a skeleton of your test project, including feature files specifying the scenarios you want to run, and then you prompt an LLM to fill in the steps required by your test scenarios.
You can also use an LLM to generate feature files, but if the goal is to specify requirements and have a test suite enforce them, implicitly the scenarios are the starting point.
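To keep the examples in one language, here's roughly what that skeleton could look like with behave, a Python counterpart to Cucumber; the feature and step names are invented:

    # features/login.feature (Gherkin, reproduced here as a comment):
    #
    #   Feature: Login
    #     Scenario: Wrong password is rejected
    #       Given a registered user "alice"
    #       When she logs in with the password "wrong"
    #       Then she sees an authentication error

    # features/steps/login_steps.py -- the part an LLM could be prompted to fill in.
    from behave import given, when, then

    @given('a registered user "{username}"')
    def step_registered_user(context, username):
        context.user = username  # in a real suite: create the user through the app

    @when('she logs in with the password "{password}"')
    def step_login(context, password):
        context.result = "error" if password == "wrong" else "ok"  # placeholder logic

    @then('she sees an authentication error')
    def step_sees_error(context):
        assert context.result == "error"

The feature file stays the human-owned specification; only the step glue is generated.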
Yes this is fundamental to actually designing software. Still, it would be perfectly reasonable to ask "please write a test which gives y output for x input".
So you actually thought you needed, in Python, classes with static methods instead of just plain old modules? What was your first Python "hello world" like?
An angry potential customer who demands one work for free is probably not the kind of business arrangement that most folks would find agreeable. I don’t know where these people get off, but they’re free riders on the information superhighway. If wishes were horses, beggars would ride.
That same person might have actually paid money if they weren’t (somewhat legitimately) lied to about it being free. Or just not gone.
Instead it's the worst outcome for everyone, and everyone is angry and thinks everyone else is an asshole. I guess that does sum up America the last few years, eh?
The anger is misdirected, as it is a reaction to being confronted with one's own ignorance and then shooting the messenger. In the hypothetical, that is. I don't look at it as a lie exactly on the part of AI, but a failure of the user to actually check first-party authoritative sources for that kind of info before showing up and acting entitled to a bill of goods you were never sold for any price. Even if it were free, you would still have to show up and claim a badge or check in, at which point they are free to present you with terms and conditions while you attend the event.

I think the story says more about users and how they are marketed to than it does about AI and its capabilities. I think AI will probably get better faster than people will get used to the new post-AI normal, and maybe even after that. A lot of market participants seem to want that to happen, so some growing pains are normal with these kinds of disruptive technologies.
If somebody demands to get something for nothing from complete strangers and then gets mad when they don't, well, that person has very low value as a human until they can find enlightenment. These people are definitely not the majority.
There are reasonable reactions in this situation: either be grateful that you got something for free, or accept that you were misinformed and pay what is asked, or alternatively leave.
But let's be honest about this particular situation: The visitor had checked the event online, maybe first with ChatGPT and then on the official website. They noticed that the AI had made a mistake and thought they could abuse that to try to get in for free.
Everybody who works with the general public in restaurants, hospitality, events or retail recognizes this kind of "customer", a small minority you have to deal with sometimes. There are some base people who live their lives trying to find loopholes and ways to take advantage of others, while at the same time constantly being on the verge of massive outrage over how unfairly they are being treated.
It is at fault for lying, but only a base person would go out in the real world and try to make other people responsible for the lies they were told by a robot.
ChatGPT pointed them at an authority figure who informed them of the situation from their point of view. Some folks don’t handle being corrected or being told that they are wrong or mistaken very well. I’m willing to let ChatGPT share some of the blame, but the human in the loop is determined to shirk all responsibility rightfully borne by them, so I’m less willing to give them any benefit of the doubt. I don’t doubt that they are being entirely unreasonable, so I don’t think their interpretation of events is relevant to how ChatGPT operates generally.
Unreasonable people are wrong to be unreasonable. This is not new. Technological solutions don’t map to problems of interpersonal relations neatly, as this example shows.
I switched from Python to JS for backend stuff a while back, thoroughly enjoying it. I agree that "Python installation and package management is broken," but the async stuff was the biggest improvement to productivity. Yes I know Python got asyncio, but there's a big difference between having one well-accepted way of doing things vs multiple competing, incompatible ways, where the good one has the least momentum.
The rest is small stuff that adds up, like Python's whitespace scoping, or Python imports somehow not taking relative paths, or JS's nicer object syntax: https://news.ycombinator.com/item?id=44544029
uv + ruff (for formatting AND linting) is a killer combo, though.
And the more you use uv, the more you discover incredible stuff you can do with it that kills so many Python gotchas, like using it in the shebang with inline deps or the wonders of "--with":
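For example, the shebang-plus-inline-deps trick looks roughly like this (PEP 723 inline metadata; requests is just a stand-in dependency):

    #!/usr/bin/env -S uv run --script
    # /// script
    # dependencies = ["requests"]
    # ///
    # uv reads the inline metadata above, builds an ephemeral environment, and runs the script.
    import requests

    print(requests.get("https://example.com").status_code)

And --with lets you pull an extra package into a one-off run without touching the project, something like: uv run --with rich python my_script.py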
I coded happily in python for many years and fell out of love with it.
It doesn't ship with a first-party package manager, so you get the community trying to fill the gap. Use any other language with good tooling, like golang or rust, and it is a breath of fresh air.
Python used as an actual PL is a footgun because it's a dynamic scripting language. (Don't tell me about using tools X, Y, Z, mypy, …) You essentially become the compiler: checking types and catching basic syntax errors only when uncommon paths are executed.
A programming language that’s only good for < 100 line scripts is not a good choice.
I honestly wish python were in a better state. I switched to rust.
> Don’t tell me about using tools X, Y, Z, mypy, …
What's wrong with using tools that improve on common issues? I don't think I'd use Python without them, but ruff and pyright make Python a very productive and reliable language if you're willing to fully buy into the static analysis.
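A small example of the kind of bug that buy-in catches before runtime (load_config is a made-up function):

    import json
    from typing import Optional

    def load_config(path: str) -> Optional[dict]:
        """Return the parsed config, or None if the file is missing."""
        try:
            with open(path) as f:
                return json.load(f)
        except FileNotFoundError:
            return None

    config = load_config("settings.json")

    # Without a None check, pyright reports something like:
    #   '"get" is not a known attribute of "None"'
    # timeout = config.get("timeout", 30)

    # The narrowed version type-checks cleanly:
    timeout = config.get("timeout", 30) if config is not None else 30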
> A programming language that’s only good for < 100 line scripts is not a good choice.
What a bunch of crap. It's so trivial to show very popular and useful programs written in Python that far exceed this number that I'm not even going to do the work.
Hi rsyring, I made this comment based on experience.
As python projects grow and grow, you need to do lots of support work for testing and even syntactic correctness. This is automatic in compiled languages where a class of issues is caught early as compile errors, not runtime errors.
Personally I prefer to move more errors to compile time as much as possible. Dynamic languages are really powerful in what you can do at runtime, but that runtime flexibility trades off with compile time verification.
Of course, every project can be written in any language, with enough effort. The existence of large and successful python projects says nothing about the developer experience, developer efficiency, or fragility of the code.
All perfectly valid perspectives and I agree with most of what you wrote. But the comment above is pretty different from the tone/effort behind the comment I took issue with. :)
In hindsight, I should have just left it alone and not replied, which is what I usually do. But Python's popularity isn't an aberration. Its tradeoffs make sense for a lot of people and projects. The low-effort bad-faith swipes at it from subsections of the HN community got me a bit riled today and I felt I had to say something. My apologies for a less than constructive critique of your comment.
> Be kind. Don't be snarky. Converse curiously; don't cross-examine. Edit out swipes.
> Comments should get more thoughtful and substantive, not less, as a topic gets more divisive.
> When disagreeing, please reply to the argument instead of calling names. "That is idiotic; 1 + 1 is 2, not 3" can be shortened to "1 + 1 is 2, not 3."
> Please don't fulminate. Please don't sneer, including at the rest of the community.
> Please respond to the strongest plausible interpretation of what someone says, not a weaker one that's easier to criticize. Assume good faith.
If you refused to learn Excel during the PC revolution because you preferred doing the calculations by hand, you would have quickly found yourself unemployable.