Does your work not depend on existing code bases, product architectures and nontrivial domain contexts the LLM knows nothing about?
Every thread like this over the past year or so has had comments similar to yours, and it always remains quite vague, or when examples are given, it’s about self-contained tasks that require little contextual knowledge and are confined to widely publicly-documented technologies.
Context is the most challenging bit. FWIW, the codebases I'm working on are still small enough that I rarely need to include more than 12 files in context. And I find that as I grow the context beyond that, results degrade significantly.
So I don't know how this would go in a much larger codebase.
What floored him was simply how much of my programming I was doing with an LLM / how little I write line-by-line (vs edit line-by-line).
If you're really curious, I recorded some work for a friend. The first video has terrible audio, unfortunately. This second one I think gives a very realistic demonstration – you'll see the model struggle a bit at the beginning:
I know that you are getting some push-back because of your exuberance regarding your use of LLMs in development, but let me just say I respect that when someone told you to "put up or shut up" you did. Good on you!
So you spend 10 minutes writing a free-text description of the test you want, telling it exactly how you want the test written; then 4-5 minutes trying to understand whether it did the right thing; restart because it did something crazy; then a few minutes manually fixing the diff it generated?
MMmm.
I mean, don't get me wrong; this is impressive stuff; but it needs to be an order of magnitude less 'screwing around trying to fix the random crap' for this to be 'wow, amazing!' rather than a technical demonstration.
You could have done this more quickly without using AI.
I have no doubt this is transformative technology, but people using it are choosing to use it; it's not actually better than not using it at this point, as far as I can tell.
Stoked you watched, thanks. (Sorry the example isn't the greatest/lacks context. The first video was better, but the mic gain was too high.)
You summed up the workflow accurately. Except, I read your first paragraph in a positive light, while I imagine you meant it to be negative.
Note the feedback loop you described is the same one as me delegating requirements to someone else (i.e. s/LLM/jr eng). And then reading/editing their PR. Except the feedback loop is, obviously, much tighter.
I've written a lot of tests, I think this would have taken 3-4x longer to do by hand. Surely an hour?
But even if all things were roughly equal, I like being in the navigator seat vs the driver seat. Editor vs writer. It helps me keep the big picture in mind, focused on requirements and architecture, not line-wise implementation details.
> I've written a lot of tests, I think this would have taken 3-4x longer to do by hand. Surely an hour?
It seems to me that the first few tests are almost complete copy-pastes of older tests. In the update case, a simple copy-paste would have gotten you code closer to the final test than what the model produced.
The real value is only in the filtered test that chooses randomly (btw, I have no idea why that's beneficial here), and the one which checks that both consumers got the same info. Those can be done in a few minutes with the help of the already-written insert test and the original version of the filtered test.
I’m happy that more people can code with this, and it’s great that it makes your coding faster. It makes coding more accessible. However, there are a lot of people who can do this faster without AI, so it’s definitely not for everybody yet.
> I've written a lot of tests, I think this would have taken 3-4x longer to do by hand. Surely an hour?
I guess my point is I'm skeptical.
I don't believe what you had at the end would have taken you that long to do by hand. I don't believe it would have taken an hour. It certainly would not have taken me or anyone on my team that long.
I feel like you're projecting that if you scale this process up to, say, 5 LLMs running in parallel, then you'd spend maybe 20% more time reviewing 5x PRs instead of 1x PR, but get 5x as much stuff done in the end.
Which may be true.
...but, and this is really my point: It's not true, in this example. It's not true in any examples I've seen.
It feels like it might be true in the near-to-moderate future, but that rests on a lot of underlying assumptions:
- LLMs get faster (probably)
- LLMs get more accurate and less prone to errors (???)
- LLMs get more context size without going crazy (???)
- The marginal cost of doing N x code reviews is < the cost of just writing code N times (???)
These are assumptions that... well, who knows? Maybe? ...but right now? Like, today?
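To make that last assumption concrete, here is a toy calculation with entirely made-up numbers (the 1.0h/0.75h/0.25h figures are hypothetical, not from the thread). Running N agents in parallel only pays off if reviewing an LLM-written PR is meaningfully cheaper than writing the change yourself:

```python
def hours_saved(n_agents, write_hrs, review_hrs):
    """Hours saved by reviewing n LLM-written PRs instead of
    writing the n changes by hand (hypothetical numbers)."""
    return n_agents * (write_hrs - review_hrs)

# If writing a change takes 1.0h and reviewing the LLM's version
# takes nearly as long (0.75h), 5 agents save only 1.25h total...
print(hours_saved(5, 1.0, 0.75))  # 1.25
# ...but if review drops to 0.25h per PR, the same 5 agents save 3.75h.
print(hours_saved(5, 1.0, 0.25))  # 3.75
```

The whole "5x throughput for 20% more time" picture lives or dies on that review-vs-write gap, which is exactly the assumption being questioned.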
The problem is: If it was actually making people more productive then we would see evidence of it. Like, actual concrete examples of people having 10 LLMs building systems for them.
...but what we do see is people doing things like this, which seem (to me at least) either worse than or on par with just doing the same work by hand.
A different workflow, certainly; but not obviously better.
LLMs appear to have an immediate, right-now disruptive impact on particular domains. Learning is one: it's extremely clear that having a wise coding assistant to help you gain simple cross-domain knowledge is highly impactful (look at Stack Overflow). But despite all the hand-waving and all the people talking about it, the actual concrete evidence of a 'Devin' that actually builds software, or even meaningfully improves programmer productivity (not 'is a tool that gives some marginal benefit to existing autocomplete'; actually improves productivity), is ...
...simply absent.
I find that problematic, and it makes me skeptical of grand claims.
Grand claims require concrete tangible evidence.
I've no doubt that you've got a workflow that works for you, and thanks for sharing it. :) ...I just don't think it's really compelling, currently, for most people to work that way; I don't think you can reasonably argue it's more productive, or more effective, based on what I've actually seen.
I used it for some troubleshooting in my job as a Linux sysadmin. I like how I can just ask it a question and explain the situation, and it goes through everything with me, like a person.
How worried are you that it's giving you bad advice?
There are plenty of, "Just disable certificate checking" type answers on Stack Overflow, but there are also a lot of comments calling them out. How do you fact check the AI? Is it just a shortcut to finding better documentation?
In my opinion it’s better at filtering down my convoluted explanation into some troubleshooting steps I can take, to investigate. It’s kind of like an evolved Google algorithm, boiling down the internet’s knowledge. And I’ve had it give me step by step instructions on “esoteric” things like dwm config file examples, plugins for displaying pictures in terminal, what files to edit and where…it’s kind of efficient I think. Better than browsing ads. Lol.
I think that Greptile is on the right track. I made a repo containing the C# source code for the Godot game engine, and its answers to "how do I do X" questions, where X is some obscure technical feature (like how to create a collision query using Godot's internal physics API), are much better than those of the other AI solutions, which rely on general training data.
However, there are some very frustrating limitations to Greptile, so severe that I basically only use it to ask implementation questions about existing codebases, not for anything like general R&D:
1) answers are limited to about 150 lines.
2) it doesn't re-analyze a repo after you link it in a conversation (you need to start a new conversation, and re-link the repo, then wait 20+ min for it to parse your code)
3) it is very slow (maybe 30 seconds to answer a question)
4) there's no prompt engineering
I think it's a bit strange that no other ai solution lets you ask questions about existing codebases. I hope that will be more widespread soon.
I work at Greptile and agree with the first three criticisms. 1) is a bug we haven't been able to fix; 2) has to do with the high cost of re-indexing, and we will likely start auto-updating the index when LLM costs come down a little; 3) has to do with LLM speed. We pushed some changes that cut time-to-first-token by about half, but there's a long way to go.
Re: prompt engineering, we have a prompt guide if that helps; is that what you were getting at?
No idea about the product, but I would like to congratulate you guys on what is maybe the greatest name ever. Something about it seems to combine "fierce" with "cute", so I think you should consider changing your logo to something that looks like Reptar.
Comments should get more thoughtful and substantive, not less, as a topic gets more divisive.
Eschew flamebait. Avoid generic tangents. Omit internet tropes.
Be kind. Don't be snarky. Converse curiously; don't cross-examine. Edit out swipes.
Please don't comment about the voting on comments. It never does any good, and it makes boring reading.
Not-so-subtly mocking the top-level for not replying "yet", when they replied almost immediately after with a video of the relevant workflow, was not a move that made you look smart or nice.
>when they replied almost immediately after with a video of the relevant workflow
Wow. Such wrong claims.
I had already replied to you in a sibling comment, refuting your points, but will give one more proof (not that I really need to):
_acco, the top level commenter relevant to this discussion, commented at some time, say x.
layer8 commented, replying to _acco, 7 hours ago (as can be seen on the page at the time of my writing this comment, i.e. right now).
I then replied to layer8, 6 hours ago.
_acco replied back to layer8 5 hours ago.
All this is visible right now on the page; and if people check it a few hours later, the relative time deltas will remain the same, obviously. (But not if they check after 24 hours, in which case all comments will show as one day ago.)
So there was a 1 hour gap between layer8's comment and mine, and a 2 hour gap between layer8's comment and _acco's reply.
If you think 2 hours is the same as "almost immediately", as you said above, I have nothing more to say to you, except that our perceptions of time are highly different.
I meant immediately after your reply. At the time I posted, your and acco_'s replies to layer8 both showed as "3 hours" ago. Now they both show as "13 hours ago". Really, I'm being generous in assuming they didn't reply before you.
Ed: ah, since the time I wrote this comment, your respective comments are now at 14 and 13 hours. Congrats on your <1hr lead.
I was careful to check the spelling before mentioning their name, unlike you, even when I referred to them earlier. The fact that you cannot even get the position of an underscore in a name correct seems to indicate that you are sloppy, which leads me to my next point.
2. pompous:
You said:
>Really, I'm being generous in assuming they didn't reply before you.
This is the pompous bit. Generous? Laughable.
I neither need nor want your generosity. If anything, I prefer objectivity, and that people give others the benefit of the doubt, instead of assuming bad intentions on their part: I had actually checked for a second comment by _acco (in reply to layer8) just before I wrote my comment to layer8, the one that got all of you in a tizzy. But you not only got the times wrong (see your edit, and point 3 below), but also assumed bad faith on my part.
3. fake.
You first said above that both those replies to layer8 showed as 13 hours ago, then edited your comment to say 14 and 13 hours. It shows that you don't use your brains. The feature of software showing time deltas in the form of "hours ago" or "days ago", versus an exact timestamp, is pretty old by now. It dates back to Web 2.0 or earlier; maybe it was started by some Web 2.0 startups or by Google.
If you think you are so clever as to criticize me without proof, or say that you are generous in your assumptions about me, you should have been equally clever or generous about the time delta point above, and so realized that I could have replied to layer8 before _acco, which was indeed the case. Obviously I cannot prove it, but the fact that I got _acco's name correct, while you did not, lends credence to my statement. It shows that I took care while writing my comment.
4. So you are fake because you don't bother to think before bad-mouthing others, and even more fake because you did not apply (to yourself) your own made-up "rule" in this other comment of yours, where you criticized my comment as being neither smart nor nice, so not of value:
I should not have had to write such a long comment to refute your silly and false allegations, and I will not always do that, but decided to do it this time, to make the point.
And, wow: you managed to pack 3 blunders (being inaccurate, pompous and fake) into a comment of just a few lines. That's neither smart nor nice. Instead, it's toxic.
Actually, your inaccurateness (inaccuracy? GIYF) is even worse than I said above. My comment a few levels above literally uses the name _acco at least four times - I checked just now. And your comment was in reply to that. So even after reading that person's name four times in my comment, you still got its spelling wrong. Congrats. (Yeah, I can snark too, like you did to me upthread.)
You seem to have gotten so worked up over my misplaced underscore that you yourself forgot how those ubiquitous rounded timestamps work. When I first wrote my comments, they were indeed the same: 3, and later 13, hours. In the few minutes between my looks at the page after I wrote my later comment, the timestamp on yours just happened to cross the threshold where it rounded up to 14 instead of down to 13. (And if I were "sloppy", do you really think I would have looked again and corrected my comment?) Presumably if I had looked a bit later, they would both have said 14 hours. Hence the <1 hour lead.
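For what it's worth, the rounding behavior at issue here is easy to demonstrate. A minimal Python sketch (all timestamps made up for illustration) of how a site that floors time deltas to whole hours can make two comments posted almost an hour apart display the same "N hours ago" label:

```python
from datetime import datetime

def relative_label(posted, now):
    """Round a time delta down to whole hours (or days past 24h),
    the way forum-style relative timestamps typically do."""
    hours = int((now - posted).total_seconds() // 3600)
    if hours >= 24:
        return f"{hours // 24} day(s) ago"
    return f"{hours} hour(s) ago"

now = datetime(2024, 5, 1, 20, 0)
a = datetime(2024, 5, 1, 6, 10)  # earlier comment
b = datetime(2024, 5, 1, 7, 0)   # posted 50 minutes later

print(relative_label(a, now))  # 13 hour(s) ago
print(relative_label(b, now))  # 13 hour(s) ago -- the 50-minute gap is invisible
```

A few minutes later, the earlier comment crosses the 14-hour threshold first, so the two labels briefly disagree before matching again, which is exactly the "14 and 13 hours" situation described above.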
Anyway, yeah, I worry less about being nice to people who've already shown themselves to be clowns, in a sub thread that's flagged to death. You got me there. FWIW I was originally hoping to enlighten you a bit as to why you were being downvoted, as a small help to you.
These are the four simple lines that I wrote above:
>Solid questions and comments, layer8.
>I notice that the person you replied to has not replied to you yet.
>It may be that they will, of course.
>But your points are good.
(Italics mine, and they were not in my original comment.)
You, above:
>when they replied almost immediately after with a video of the relevant workflow,
I did check the time intervals between the top level comment and layer8's comment, before my first reply. It is over an hour now, so I cannot see the exact times in minutes any more, but IIRC, there was a fairly long gap (in minutes). And I also think I noticed that the top level person did reply to someone else, but not to layer8, by the time I wrote my comment.
So I don't see anything wrong in what I said. I even said that they may reply later.
You consider that to be:
>"Not-so-subtly mocking"?
Jeez. I think you are wrong.
Then I have nothing further to say to you, except this:
>was not a move that made you look smart or nice.
Trying to look smart or nice is not the goal in online discussions. At least, I don't think so. You appear to think that. The goal (to me) is to say what you think, otherwise, why write at all?
I could just get an LLM to write all my comments, and not care about its hallucinations.
I don't try to be smart or nice, nor the reverse. I just put my considered thoughts out there, just like anyone else. Obviously I can be right or wrong, just like anyone else can be. And some points can be subjective, so cannot be said to be definitely either right or wrong.
If a comment is not at least one of smart or nice, it's a waste of space and attention. That may not be your purpose, but don't act shocked when people respond with negativity.
> Every thread like this over the past year or so has had comments similar to yours, and it always remains quite vague, or when examples are given, it’s about self-contained tasks that require little contextual knowledge and are confined to widely publicly-documented technologies.
What exactly floored your colleague at Microsoft?