Vanilla OS is crap. I wanted to try it in a VM and it wouldn't even install. First, it complained that the disk size wasn't enough. Then it complained that it absolutely required UEFI. Then the installer just failed with a cryptic error showing a line of Golang code.
I had a related episode at work when my coworker asked me why his seemingly trivial 10-line piece of code was misbehaving inexplicably. It turned out he had two variables, `file_name` and `filename`, and used one in place of the other. When I asked him how he ended up with such code, he said he had used Copilot to create it. Using code from a generative AI without understanding what it does is never a good idea.
We hired a new guy at work. In one of his first tasks he had chosen to write some bash, and it was pure nonsense. I mean it contained things like:
if [ -z "${Var}+x" ]
I can see what the author was trying to do, but the code is just wrong.
I don't mind people not knowing stuff, especially when it's essentially Bash trivia. But what broke my heart was when I pointed out the problem and linked to the documentation, only to receive the response "I don't know what it means, I just used Copilot", followed by him just removing the code.
I agree that it's a waste of a learning opportunity, but from my experience it is still often rational.
There were many times in my career when I had what I expected to be a one-off issue that I needed a quick solution for, and I would look for a quick and simple fix with a tool I'm unfamiliar with. I'd say that 70% of the time the thing "just works" well enough after testing, 10% of the time it doesn't quite work but it feels like a promising approach and I'm motivated to learn more in order to get it to work, and the remaining 20% of the time I discover that it's just significantly more complex than I thought it would be and prefer to abandon the approach in favor of something else; I've never regretted the latter.
I obviously lose a lot of learning opportunities this way, but I'm also sure I saved myself from going down many very deep rabbit holes. For example, I accepted that I'm not going to try and master sed & awk - if I see it doesn't work with a simple invocation, I drop into Python.
I feel similarly: some such learning opportunities are just going to be larger rabbit holes than the thing is worth, and in those cases I'll prefer to do it a different way, one that I already know or that is worth learning.
E.g. maybe it would be very 'elegant' or at least concise awk if I pushed through the learning opportunity, but like you I would probably decide not to. I'll do it with the sed I do know, even if it means some additional piping and cutting or grepping for what awk could've done in one command, because I already know it and it's going to be clearer to me, and probably to anyone else I'm working with.
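To illustrate with a made-up example (the log file name and the single-space field layout here are assumptions, not anything from the thread): both of the commands below print the first field of lines containing ERROR, and which one is clearer mostly depends on which tool you already know.

  # awk does the matching and the field extraction in one command
  awk '/ERROR/ { print $1 }' app.log

  # the same job with grep plus cut, assuming fields are separated by single spaces
  grep 'ERROR' app.log | cut -d' ' -f1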
I think we're saying quite similar things, but my point is I wouldn't be deleting it, dismissing the idea, and disappointing the colleague ready to teach me about it - because I never would've been willing to blindly try broken AI generated (or however sourced) code that I didn't understand in the first place.
You don't have to master it. But some things are just well worth learning at least the basics of: Your programming language of choice, your shell, your editor, your version control system.
An afternoon learning your way around bash or vim will save you countless hours of work, just because you will know what the building blocks are and you will be able to ask the right questions directly instead of chasing down blind alleys.
It's not the same thing as learning yet another language. It's a separate type of tool. Developing software without knowing an editor or a shell is like refusing to learn what a file is or what an IP address is. Sure, you can probably get work done in roundabout ways, but it's certainly not rational.
I like learning a little about a lot of things as I’m the entrepreneurial type, but I’m really good at very few things. I appreciate having workers who specialize in one or two things really deeply though.
Wait until a manager who's evaluating a technical decision you're making copies and pastes ChatGPT's "analysis" of your proposal and asks you to respond to it.
I don't have hiring privileges. Either way, I like the guy, and I'd rather work to build him up. That doesn't mean it's not frustrating, but I have a process that seems to build a pretty good culture.
It is not nonsense. You use that expression if you want to check whether a variable exists or not (as opposed to being set to an empty string), which is an extremely common problem.
That's what I meant by "I can see what they were trying to do". It would have been correct if the "+x" were inside the braces, even in context. He did in fact want to check if the variable was unset and error out, and that's what hurts so much.
There was a real and correct analysis that, hey, I want to make sure the variable is set here, only for him to drop it because he got told the syntax is wrong. The response I'm looking for when I say "This syntax won't do what you're looking to do" is something like "What am I trying to do, and why won't this do it?", not "Well, it's just some AI code, I'll just remove it".
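To make the fix concrete, here is a minimal sketch of the check he was presumably aiming for; the variable name Var comes from the snippet above, and the error message and exit are just illustrative:

  # Broken: "${Var}+x" expands to the value of Var followed by the literal
  # text "+x", so the string is never empty and -z never triggers.
  if [ -z "${Var}+x" ]; then
      echo "Var is unset" >&2
      exit 1
  fi

  # Intended: "${Var+x}" expands to "x" whenever Var is set (even to the
  # empty string) and to nothing when it is unset, so -z catches exactly
  # the unset case.
  if [ -z "${Var+x}" ]; then
      echo "Var is unset" >&2
      exit 1
  fi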
And any decent IDE will highlight a variable that is declared but unused. We already have "artificial intelligence" in the form of IDEs, linters, compilers, etc. but some people apparently think we should just throw it all away now that we have LLMs.
I knew a guy that made a good living as a freelance web developer decades ago. He would pretty much just copy and paste code from tutorials or stack overflow and had no real idea how anything worked. Using code without understanding it is never a good idea, it doesn’t need to be from AI for that to be true.
Or maybe you’re just exaggerating. I’ve done my fair share of copy pasting and it never worked to just do it without understanding what’s going on.
I think the problem with "AI" code is that many people have an almost religious belief in it. There are weirdos on the internet who say that AGI is a couple of years away. And by extension, current AI models are seen as something incapable of making a mistake when writing code.
The other downside to AI code vs stackoverflow is that a stackoverflow post can be updated, or a helpful reply might point out the error. With the advent of LLMs we may be losing this communal element of learning and knowledge-sharing.
We aren't. LLMs may have been useful for a moment in time, before the trick "it's now MY OWN creation, no IP strings attached, once it comes through the plagiarism machine" became apparent, and before the models started eating their own tails. Now they're just spiralling down, and it will IMNSHO take something other than an iterative "a future version will surely fix this, One Day, have faith."
- Which might be a different matter: specifically, SE declining. (A very different, and long-running, tragedy, but one that began long before the current AI boom and was prompted by very different, non-technical issues.)
- That said, surely traffic will decline for Q&A sites. "How do I connect tab A into slot B" is something that people are likely to query LLMs for; the response will surely sound authoritative, and could even be correct. That's definitely a task where LLMs could help: common questions that have been asked many times (and as such are likely to be well answered in the human-made training data). The 20,001st "how do I right-align a paragraph in HTML" question doesn't get posted? Good. Rote tasks are well suited to automation. (Which, again, brings us back to the issue of "how do we distinguish the response quality?")
But what happens with the next generation of questions? The reason LLMs can answer how to right-align a paragraph in HTML is at least in part because it has been asked and answered publicly so many times.
Now imagine that HTMZ comes along and people just go straight to asking how to full justify text in HTMZ for their smart bucket. What happens? I doubt we’ll get good answers.
It feels like the test of whether LLMs can stay useful is actually whether we can stop them from hallucinating API endpoints. If we could feed the rules of a language or API into the LLM and have it actually reason from that to code, then my posed problem would be solved. But I don’t think that’s how they fundamentally work.
>Now imagine that HTMZ comes along and people just go straight to asking how to full justify text in HTMZ for their smart bucket. What happens? I doubt we’ll get good answers.
So, I think the answer is that since all useful data is already in an LLM somewhere, all new data will be stolen/scraped and inserted in real time. So if real people are answering the question, it will work as normal. The real question is what happens when people are trying to mine karma by answering questions using an LLM that is hallucinating. We have already seen this with the Bug Bounty silliness going on.
I upvoted your comment because I'm afraid you may be correct. I say "afraid" because I can remember the day when a member of my team was fired for copy pasta from SO, with little, if any, understanding, into "production" code.
The problem, of course, is that this might work once in a while for low-hanging fruit. But the web has since inherited things like DICOM, and we now have medical imaging in the web browser (I've heard, in Apple Vision Pro), where robotics means the price of unforeseen bugs is not the accidental death or dismemberment of one patient, but potentially many.
I knew someone similar. They would just get free templates and sell them as a website to customers, with almost no changes aside from logos and text. Most had no JavaScript or CSS and looked terrible, even by 2005 standards.
His clients were usually older small business owners that just wanted a web presence. His rate was $5000/site.
Within a few years, business dried up and he had to do something completely different.
He also hosted his own SMTP server for clients. It was an old server on his cable modem in a dusty garage. I helped him prevent spoofing/relaying a few times, but he kept tinkering with the settings and it would happen all over again.
It certainly puts a ceiling on a career. And I'd argue it probably gave him a pretty rough shelf life. At some point he has to understand what he's doing.
Unless he's so good at selling his services he can consistently find new clients. And if that's the case, he'd probably kill it in sales.
Sales engineers have to be good enough to bluff their way through the layers of hyperbole/minor exaggeration/utter bullshit (delete as applicable) the sales team have spun. Whether their conscience gets involved before the deal closes, different question.
Not at my work. Around here sales engineers just say "this is a proof of concept, X will be different in the final version". Then, after they close the deal, they give us their half-implemented feature that none of us had heard about before, and tell us that we need to finish it and include it in the next release.
He may have made a good living, but his customer / employer bought low quality code with lots of tech debt.
That business model only works until customers are sophisticated enough to understand tech debt. In the future, more customers will be less willing to pay the same good wages for low quality code.
Yeah, and the business people could not care less. I am on a team taking in millions of dollars from a Delphi Windows app from 1997. Zero tests, horribly mangled business logic embedded in UI handlers. Maintaining that app is not feasible. I'm rebuilding a modern version of it only because it is embarrassing to demo and is such a UX nightmare that our distributor made us commit to a new app.
There are plumbers who make a living but whose work results in leaks in people's homes. They're making a living, but I don't consider the way they work "a good idea".
That's fair. From a personal perspective it was a good idea. He regularly had sites get compromised though, so for his customers it wasn't always a good product. He generally kept his customers happy though.
Claude gave me something similar, except both variables were used, and were somehow global, and it got confused about when to use which one.
Asking it to refactor or fix the code made things worse, because it would get confused and merge them into a single variable; the problem was they had slightly different uses, which broke everything.
I had to step through the code line by line to fix it.
Using Claude is still faster for me, as it'd probably take a week for me to write the code in the first place.
But there are probably a lot of traps like this hidden everywhere, and they will rear their ugly heads at some point. I wish there were a good test-generation tool to go with the code-generation tool...
One thing I've found in doing a lot of coding with LLMs is that you're often better off updating the initial prompt and starting fresh rather than asking for fixes.
Having mistakes in context seems to 'contaminate' the results and you keep getting more problems even when you're specifically asking for a fix.
It does make some sense as LLMs are generally known to respond much better to positive examples than negative examples. If an LLM sees the wrong way, it can't help being influenced by it, even if your prompt says very sternly not to do it that way. So you're usually better off re-framing what you want in positive terms.
As someone who uses LLMs on my hobby projects to write code, I’ve found the opposite. I usually fix the code, then send it in saying it is a refactor to clarify things. It seems to work well enough. If it is rather complex, I will paste the broken code into another conversation and ask it to refactor/explain what is going on.
Fixing the mistake yourself and then sending the code back is a positive example, since you're demonstrating the correct way rather than asking for a fix.
But in my experience, if you continue iterating from that point, there's still a risk that parts of the original broken code can leak back into the output again later on since the broken code is still in context.
Ymmv of course and it definitely depends a lot on the complexity of what you're doing.
I’m attempting to keep the context ball rolling by reiterating key points of a request throughout the conversation.
The challenge is writing in a tone that will gently move the conversation rather than refocus it. I can’t just inject “remember point n+1” and hope that’s not all it’ll talk about in the next frame.
If nothing else, LLMs have helped me understand exactly why GIGO is a fundamental law.
Long story short: metaprogramming tests (including generating test input) is not trivial, but it is straightforward, given that you can see the code, the AST, and the associated metadata. I've done it myself, more than a decade ago.
I've referred to this as a mix of literate programming (noting the traps you referred to, and their anachronistic quality relative to both the generated tests and the generated code under test) wrapped up in human-computer sensemaking: what the AI sees is often, at best, a gap in its symbolic representation that is imaginary rather than real, so it requires iterative correction to hit its user's target, just like a real test team interacting with a dev team.
In my estimation, it's actually harder to explain than it is to do.
At least for me, stupid bugs like this turn out to be some of the most time-wasting to debug, no AI involved. Like accidentally having something quoted somewhere, or adding an 's' to a variable by accident; I may not even correctly process what the error message is reporting at first. I always feel a bit silly afterwards.
These kinds of problems are often avoidable with linters or by asking ChatGPT what is wrong, though I was just tearing my hair out wondering why TSC_COMPILE_ERROR wasn't skipping TypeScript, because I had spelled it TSX_COMPILE_ERROR in my environment variable.
Not only asking ChatGPT what is wrong, but also using an agent which does self-reflection by default. I'm sad every time I see people using the bare chat interface to generate code. We've got API tools which are so much better at it today. Use Aider at the very least.
Does aider have an executable installer yet? I tried installing it, but the Python experience is terrible. The last time I messed with Python installs on my Mac, everything worked like shit until I reinstalled the OS.
(Following https://aider.chat/docs/install/pipx.html will install it globally on your system within its own Python environment. This way you can use aider to work on any Python project, even if that project has conflicting dependencies.)
I run it with these settings:
`aider --sonnet --no-auto-commits --cache-prompts`
if you want to run the most bleeding edge version at the same time without breaking anything,
This type of bug is trivial for GPT to fix though. It was born for this. Sometimes it does generate real footguns but this sounds like an example from an earlier generation of generative AI.
> It's easy to miss that kind of difference even if you wrote the code yourself.
The developer in the story had no idea what the code did, hence they would not have written it themselves, making it impossible for them to “miss” anything.
Interestingly, a great application for GenAI is to copy and paste code and ask it, "Why is this not working?". It works even better if you give it the specific error you are getting (and it is a well understood system).
> Using code from a generative AI without understanding what it does is never a good idea.
Hear hear!
I feel like genAI is turning devs from authors to editors. Anyone who thinks the latter is lesser than the former has not performed both functions. Editing properly, to elevate the meaning of the author, is a worthy and difficult endeavor.
It's not the language, it's the IDE and laziness. They're doing this in Notepad or maybe VSCode and don't have anything configured for highlighting or checking. Heck, they probably don't even know how to interpret the error message saying "file_name is not declared".
I'm the first to bash JS, but this is not a JS issue. It's 100% a "bad and lazy" human actor throwing spaghetti at the wall to see what sticks. In this case, they have a minigun called ChatGPT/Copilot that is letting them do more of it than they used to.
Many statically typed languages do have escape hatches to do some dynamic typing at runtime, but this is not the default (hence the classification), and it requires some additional effort to use.
There's more than one mistake in the code given, and I tried just once.
"Create Python code for iterating a file with a cache."
class CachedFileIterator:
    def __init__(self, file_path):
        self.file_path = file_path
        self.cache = []  # List to store cashed lines
        self.current_index = 0  # Index for the current line to read

    def _read_lines_from_file(self):
        """Read lines from the file and cache them."""
        with open(self.file_path, 'r') as openFile:
            for line in f:
                self.cache.append(line.strip())  # Strip newline characters and save to cache

    def __iter__(self):
        """Return the iterator object itself."""
        return self

    def __next__(self):
        """Return the next line in the iteration."""
        if self.current_index < len(self.cache):
            result = self.cache[self.current_index]
            self.current_index += 1
            return result
        else:
            # If we reach the end of the cache, we read more lines from the file if necessary
            if len(self.cache) == 0:  # Cache is empty, read file
                self._read_lines_from_file()
            if self.current_index < len(self.cache):
                result = self.cache[self.current_index]
                self.current_index += 1
                return result
            else:
                # If still no lines, we stop iteration
                raise StopIteration

# Example usage:
if __name__ == "__main__":
    file_path = 'example.txt'  # Replace with your file path
    iterator = CacheingFileIterator(file_path)
    for line in iterator:
        print(line)
Garbage code is bad enough, but it's not like people have never had to walk juniors through mistakes before LLMs.
But this is actually so much worse for that same reason - the type of developer who'd submit Copilot output (I can call it that, as it's definitely not code) for a PR is unable to respond to any comment beyond asking Copilot again and wasting everyone's time with 6 more rounds of reviews. I've literally had to write test cases for someone else and told them "You can't ask for another code review until your code passes these."
Bit of a tangent, though related. It looks like you accidentally stumbled into a version of test driven development ;)
With the big difference obviously being that typically the developer who writes the test also will write the code.
In some situations this actually makes sense to do with junior developers as part of their training: a senior developer sits down with them and they write out the tests together, then, with the tests as a guide, they are thrown into the water to develop the functionality.
Of course, I suspect that in this case, you were not dealing with a junior. Rather the sort of person who looks at your tests, still is confused and asks for a "quick call" to talk about the tests.
What do you see as mistakes? I see some weirdness, but the spec is just not complete - there was no requirement for rewinding, multiple users, etc. in the request so it's not implemented.
The only thing I'd call an actual mistake is using an empty list to mean both an empty file and an uninitialised value.
The file object is named "openFile", but used as "f". The class is defined as "CachedFileIterator", but used as "CacheingFileIterator". That's two typos, before discussing the actual code.
Well, there's also the fact that the entire thing could be replaced with...
def cached_file_iterator(file_path):
    with open(file_path, 'r') as f:
        lines = [line.strip() for line in f.readlines()]
    yield from iter(lines)

# Example usage:
if __name__ == "__main__":
    file_path = 'example.txt'  # Replace with your file path
    iterator = cached_file_iterator(file_path)
    for line in iterator:
        print(line)
Which is functionally identical and FAR less code to maintain.
Iterating over the file object at all, instead of just calling self.cache = openFile.readlines(), means that the strip() call on the line below removes data beyond just the trailing newlines.
One is that the variable is called openFile but then used as f. I don't know enough Python to see anything else wrong with that line, but would love to know, since I've written such a line just last week.
Oh, for crying out loud, I obviously mean these specific kinds of mistakes. If you have worked in any capacity with LLMs like this, you will have seen them misspell variables or suddenly switch up the convention of how they're written.
Certainly if you are in a conversation mode after a few back and forths this happens from time to time.
I am just not going to spend my time digging through previous prompts for code I might not want to share, just to satisfy a random internet person.
The models I've used don't make typos on variable names that already exist in the context. Typos are not the failure mode, this is literally the easiest text prediction task they can do.
What you guys probably want to do instead is get to a common definition of what a typo is. Personally, I understand it as a typographic error, which is a fancy way of saying a spelling mistake (a mistake on a letter), not a mistake where one uses a word for another.
Not the OP. I have certainly seen LLM coding tools generate blocks of code with misspelled variables and typos. Trying to shove someone into a box of being a cynic because they have had bad personal experiences with tools is a good way to ensure people filter out your opinions.
Would someone invent that and bother the author with it? I mean, I suppose it's possible, but that seems like such a waste of time to me that I find it more unlikely. And while it's a typo, it's not "fleinaem" or something that's totally wrong, just a choice in breaking up the word "filename". Having written file-handling code, the various permutations of filename and path and dirname get to be a bit much sometimes.
You are getting downvoted but you are right, a typo in a variable that already exists in a file like this is not the failure mode for LLMs. The failure mode is logic bugs, making up methods / functions.
I've been using copilot for as long as it has existed and what you are describing has not happened to me once. Literally on in the background 8 hours a day. Excuse me for not trusting the internet hivemind that hates everything that is hyped just a little bit.
My go-to check of AI assistants is asking them to write a function calculating the first N digits of Pi in Common Lisp. On at least two attempts, when prompted to fix its code, the model changed one of the variable names to T, which is a reserved symbol. So yeah, pretty sure it does happen.
Software products these days are being written quick and dirty, to be released as fast as possible to capture the market. Nobody has time to create quality software, because by the time you finish your product, your users will already be using your competitor's product, which may be horribly slow, buggy, and a memory hog, but was released a year earlier than yours. So there you go: "release fast, improve later" is the motto of software companies today.