yodsanklai's comments

I wonder if it means Meta will move away from their OSS commitment. Wasn't it largely pushed by LeCun?

Yes, it was

> he failed pretty bad at the LLM race

Was he even involved in this?


Did they even fail? Llama2 was groundbreaking for open-source LLMs; it defined the entire space. Llama3 was a major improvement over Llama2. Just because Llama4 was underwhelming, it's silly to say they failed.

Any exponential growth is failing in a market which demands superexponential growth

No, he said that he was not involved. He had his own research model to develop; his startup will probably continue his work there, but I wonder if he thinks it's viable in the short term since he's launching a startup. I thought it was a moonshot.

> is it about doing, or is it about getting things done?

It's both. When you climb a mountain, the joy is reaching the summit after the hard hike. The hike is hard but also enjoyable in itself, and makes you appreciate reaching the top even more.

If there's a cable car or a road leading to the summit, the view may still be nice, but I'll go hiking somewhere else.


Pretty much my experience, LLMs have taken the fun out of programming for me. My coding sessions are:

1. write prompt

2. slack a few minutes

3. go to 1

4. send code for review

I know what the code is doing, how I want it to look eventually, and my commits are small and self-contained, but I don't understand my code as well because I didn't spend as much time manipulating it. Often I spend more time in my loops than if I were writing the code myself.

I'm sure that with the right discipline, it's possible to tame the LLM, but I've not been able to reach that stage yet.


I’ve stopped getting the LLM to write code and instead use it to spitball ideas, solutions, etc. for the issue.

This lets you get a solution plan done, with all the files, and then you get to write the code yourself.

Where I do let it code is in tests.

I write a first “good” passing test, then ask it to create all the others: bad input, etc. It saves a bunch of time, and it can copy and paste faster than I can.
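A minimal sketch of what that workflow can look like, using OCaml only because it's the language discussed elsewhere in the thread and plain assert-based tests (parse_port and all of the cases are invented for illustration):

    (* Hypothetical function under test. *)
    let parse_port s =
      match int_of_string_opt s with
      | Some n when n >= 1 && n <= 65535 -> Some n
      | _ -> None

    (* The single hand-written happy-path test. *)
    let () = assert (parse_port "8080" = Some 8080)

    (* The edge cases the LLM is asked to fill in: bad input, boundaries, etc. *)
    let () = assert (parse_port "" = None)
    let () = assert (parse_port "abc" = None)
    let () = assert (parse_port "0" = None)
    let () = assert (parse_port "65536" = None)
    let () = assert (parse_port "65535" = Some 65535)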


I'm experimenting with how to code w/ LLMs. I used an AI assistant for about a month w/ a React app, prompting it to do this & that, and I learned almost nothing in that month about React itself. Then I prompted it to tell me what to do, but I did the typing, and I learned quite a bit in a short period of time.

Why are you doing it? Direction from management? You think it's better code even though it's as you say less fun, and you're not sure if faster or not? Other?

At a minimum I write my own automated tests for LLM code (including browser automation) and think them through carefully. That always exposes some limitations in Claude's solutions, uncovers errors, and lets you revisit things so you fully understand what you're generating.

Mostly LLMs do the first pass and I rewrite a lot of it with a much better higher level systems approach and "will the other devs on the team understand / reuse this".

I'd still prefer deciphering a lot of default overly-verbose LLM code to some of the crazy stuff that past devs have created by trying to be clever.


Have you tried Composer 1 from Cursor? It enables a totally different way of AI coding - instead of giving the LLM a long prompt and waiting minutes for it to finish, you give it a shorter prompt to just write one small thing and it finishes in seconds. There’s no interruption, you stay in the flow, and in control of what you’re building.

I just don't get these comments about syntax...

Just taking the first example I can find of some auto-formatted OCaml code

https://github.com/janestreet/core/blob/master/command/src/c...

It doesn't look any more like a soup of words than any other language. Not sure what's hard for humans to parse.
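For anyone who doesn't want to click through, here's a small hand-written fragment in a similar auto-formatted style (not taken from that file, just a sketch):

    (* A variant type and a function that pattern-matches over it. *)
    type shape =
      | Circle of float
      | Rectangle of float * float

    let area shape =
      match shape with
      | Circle r -> Float.pi *. r *. r
      | Rectangle (w, h) -> w *. h

    let total_area shapes =
      List.fold_left (fun acc s -> acc +. area s) 0. shapes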


I personally find OCaml more pragmatic than Haskell.

Haskell has a steeper learning curve IMHO: monads are pervasive and hard to understand, laziness isn't a common programming pattern, and it adds complexity. I find type classes confusing as well; it's not always clear where things are defined.

I like that OCaml is close to the hardware; there are no complex abstractions. The module system makes it easy to program in the large (I love mli). If you avoid the more advanced features, it's a super simple language.
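For readers who haven't used OCaml, a rough sketch of what an .mli buys you (the counter module is made up for illustration):

    (* counter.mli: the interface lives in its own file, so callers only
       see what the module chooses to expose. *)
    type t
    val create : unit -> t
    val incr : t -> unit
    val value : t -> int

    (* counter.ml: the implementation; the mutable record stays hidden
       behind the abstract type t declared in the .mli. *)
    type t = { mutable count : int }
    let create () = { count = 0 }
    let incr t = t.count <- t.count + 1
    let value t = t.count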


I should have specified. I wasn't asking about OCaml vs Haskell in general[0], but what advantage does OCaml have with respect to concurrency?

[0] I think most people just end up post-rationalizing whatever choice they have already invested in, I know I do :) With that in mind, maybe I as a mainly-Haskell dev should instead list some things I miss from OCaml: faster compile times, non-recursive `let` by default, polymorphic variants, labeled arguments
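(For the curious, a quick toy illustration of the last three of those, with made-up names:)

    (* Labeled arguments: call sites are self-documenting and the labels
       can be passed in any order. *)
    let substring ~pos ~len s = String.sub s pos len
    let hello = substring ~len:5 ~pos:0 "hello world"

    (* Polymorphic variants: constructors usable without a prior type
       declaration. *)
    let to_int = function `Red -> 0 | `Green -> 1 | `Blue -> 2

    (* Non-recursive let by default: the second x shadows the first
       instead of referring to itself; `let rec` is needed for recursion. *)
    let x = 1
    let x = x + 1 (* x = 2 *)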


Locating: for my last job, I got contacted out of the blue by a couple of big tech companies (I was recommended by an acquaintance for one of them).

Interviewing: lots and lots of practice. Took me 3 attempts to pass, 3 consecutive years.

Surviving: one half at a time. It's hard at times but not that bad that I want to resign.


> I got fed up of the overall pro BigTech sentiment.

Not saying this isn't true, but maybe this is your own bias that makes you think that way.


> if you're building a startup and you pick OCaml, you've just cut your hiring pool by 95%. that's way more painful than learning a different way to write functions.

You can hire anyone who already understands pattern matching, closures, map/fold (these are more and more common constructs nowadays) and train them to learn OCaml. It's a simple language overall, especially if your codebase doesn't use any complicated features.
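As a rough illustration of the point (a sketch, not from any real codebase): the constructs below should already read naturally to anyone who writes in this style in another language, even if the keywords are new.

    (* A closure, map, fold, and a pattern match: building blocks most
       candidates will already know, just with OCaml syntax. *)
    let scale factor xs = List.map (fun x -> x *. factor) xs

    let sum xs = List.fold_left ( +. ) 0. xs

    let describe = function
      | [] -> "empty"
      | [ _ ] -> "one element"
      | _ :: _ -> "many elements"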


You review it like it wasn't AI generated. That is: ask the author to split it into reviewable blocks. Or, if you don't have an obligation to review it, you leave it there.

This is it. The fact that the PR was vibe coded isn't the problem, and doesn't need to influence the way you handle it.

It would be willfully ignorant to pretend that there's not an explosion of a novel and specific kind of stupidity, and to not handle it with due specificity.

> It would be willfully ignorant to pretend that there's not an explosion of a novel and specific kind of stupidity

I 100% know what you mean, and largely agree, but you should check out the guidelines, specifically:

> Don't be curmudgeonly. Thoughtful criticism is fine, but please don't be rigidly or generically negative.

And like, the problem _is_ *bad*. A fun, ongoing issue at work is trying to coordinate with a QA team who believe ChatGPT can write CSS selectors for HTML elements that have not yet been written.

That same QA team deeply cares about the spirit of their work, and is motivated by the _very_ relatable sentiment of: you DON'T FUCKING BREAK USER SPACE.

Yeah, in the unbridled, chaotic, raging plasma that is our zeitgeist at the moment, I'm lucky enough to have people dedicating a significant portion of their life to trying to do quality assurance in the idiomatic, industry best-standard way. Blame the FUD, not my team.

I would put it to you that they do not (yet) grok what, for lack of a more specific universally understood term, we are calling "AI" (or LLMs if you are Fancy, but of course none of these labels are quite right). People need time to observe and learn. And people are busy with /* gestures around vaguely at everything */.

So yes, we should acknowledge that long-winded trash PRs from AI are a new emergent problem, and yes, if we study the specific problem more closely we will almost certainly find ever more optimal approaches.

Writing off the issue as "stupidity" is mean. In both senses.


I do not think that is being curmudgeonly. Instead, OP is absolutely right.

We collectively used the strategy of "we pretend we are naively stupid and don't talk directly about issues" in multiple areas ... and it failed every single time, in all of them. It never solves the problem; it just invites bad/lazy/whatever actors to play semantic manipulative games.


I contend that, by far and away the biggest difference between entirely human-generated slop and AI-assisted stupidity is the irrational reaction that some people have to AI-assisted stuff.

Many of the people who submit 9000-line AI-generated PRs today would, for the most part, not have submitted PRs at all before, or would not have made something that passes CI, or would not have built something that looks sufficiently plausible to make people spend time reviewing it.

Most of those people should still keep their ignorance to themselves, without bothering actual programmers, like they did before LLM hype convinced them that "sufficiently plausible" is good enough.

A similar trend: the popularity of electric scooters among youngsters who would otherwise walk, use public transport, or use decent vehicles increases accidents in cities.


I think my comment may have been misparsed. I was observing that one of the problems with LLMs is making it possible for people to produce 9000-line PRs they don't understand where previously they might have been gated by making something even remotely plausible that compiles or passes CI.

9000-line PRs were never a good idea; they have only been sufficiently plausible because we were forced to accept bad PR review practices. Coding was expensive, and management beat us into LGTMing them into the codebase to keep the features churning.

Those days are gone. Coding is cheap. The same LLMs that enable people to submit 9000 line PRs of chaos can be used to quickly turn them into more sensible work. If they genuinely can't do a better job, rejecting the PR is still the right response. Just push back.


Calling things "slop" is just begging the question. The real differentiating factor is that, in the past, "human-generated slop" at least took effort to produce. Perhaps, in the process of producing it, the human notices what's happening and reconsiders (or even better, improves it such that it's no longer "slop".) Claude has no such inhibitions. So, when you look at a big bunch of code that you haven't read yet, are you more or less confident when you find out an LLM wrote it?

If you try to one-shot it, sure. But you can question Claude, point out the error of its ways, tell it to refactor and ultrathink, or point out that two things have similar functionality and could be merged. It can write unhinged code with duplicate, unused variable definitions that don't work, but it'll fix them up if you call it out, or you can just do it yourself (cue questions of whether, in that case, it would just be faster to do it yourself).

I have a Claude Max subscription. When I think of bad Claude code, I'm not thinking about unused variable definitions. I'm thinking about the times you turn on ultrathink, allow it to access tools and negotiate its solution, and it still churns out an over-complicated yet only partially correct solution that breaks. I totally trust Claude to fix linting errors.

It's hard to really discuss in the abstract though. Why was the generated code overly complicated? (I mean, I believe you when you say it was, but it doesn't leave much room for discussion.) Similarly, what's partially correct about it? How many additional prompts does it take before you a) use it as a starting point, b) use it because it works, c) don't use any of it and just throw it away, or d) post about why it was lousy to all of the Internet reachable from your local ASN?

I've read your questions a few times and I'm a bit perplexed. What kind of answers are you expecting me to give you here? Surely if you use Claude Code or other tools you'd know that the answers are so varying and situation specific it's not really possible for me to give you solid answers.

However much you're comfortable sharing! Obviously the ideal would be the full source for the "overly complicated" solution, but naturally that's a no-go, so even just more words than the two-word phrase "overly complicated". Was it complicated because it used 17 classes with no inheritance when 5 would have done it? Was it overly complicated because it didn't use functions and so has the same logic implemented in 5 different places?

I'm not asking you, generically, about what bad code do LLMs produce. It sounds like you used Claude Code in a specific situation and found the generated code lacking. I'm not questioning that it happened to you, I'm curious in what ways it was bad for your specific situation more specifically than "overly complicated". How was it overly complicated?

Even if you can't answer that, maybe you could help me reword the phrasing of my original comment so it's less perplexing?


If you are getting garbage out, you are asking it for too much at once. Don't ask for solutions - ask for implementations.

Distinction without a difference. I'm talking about its output being insufficient, whatever word you want to use for output.

And I'm arguing that if the output wasn't sufficient, neither was your input.

You could also be asking for too much in one go, though that's becoming less and less of a problem as LLMs improve.


You're proposing a truism: if you don't get a good result, it's either because your query is bad or because the LLM isn't good enough to provide a good result.

Yes, that is how this works. I'm talking about the case where you're providing a good query and getting poor results. Claiming that this can be solved by more LLM conversations and ultrathink is cope.


I've claimed neither. I actually prefer restarting or rolling back quickly rather than trying to re-work suboptimal outputs - less chance of being rabbit holed. Just add what I've learned to the original ticket/prompt.

'Git gud' isn't much of a truism.


I have pretty much the same amount of confidence when I receive AI-generated or non-AI-generated code to review: my confidence is based on the person guiding the LLM, and their ability to do that.

Much more so than before, I'll comfortably reject a PR that is hard to follow, for any reason, including size. IMHO, the biggest change that LLMs have brought to the table is that clean code and refactoring are no longer expensive, and should no longer be bargained for, neglected or given the lip service that they have received throughout most of my career. Test suites and documentation, too.

(Given the nature of working with LLMs, I also suspect that clean, idiomatic code is more important than ever, since LLMs have presumably been trained on that, but this is just a personal superstition, that is probably increasingly false, but also feels harmless)

The only time I think it is appropriate to land a large amount of code at once is if it is a single act of entirely brain dead refactoring, doing nothing new, such as renaming a single variable across an entire codebase, or moving/breaking/consolidating a single module or file. And there better be tests. Otherwise, get an LLM to break things up and make things easier for me to understand, for crying out loud: there are precious few reasons left not to make reviewing PRs as easy as possible.

So, I posit that the emotional reaction from certain audiences is still the largest, most exhausting difference.


clean code and refactoring are no longer expensive

Are you contending that LLMs produce clean code?


They do, for many people. Perhaps you need to change your approach.

The code I've seen generated by others has been pretty terrible in aggregate, particularly over time as the lack of understanding and coherent thought starts to show. Quite happy without it thanks, haven't seen it adding value yet.

Or is the bad code you've seen generated by others pretty terrible, but the good code you've seen generated by others blends in as human-written?

My last major PR included a bunch of tests written completely by AI with some minor tweaking by hand, and my MR was praised with, "love this approach to testing."


If you can produce a clean design, the LLM can write the code.

I think maybe there's another step too: breaking the design up into small enough pieces that the LLM can follow it, and you can understand the output.

So do all the hard work yourself and let the AI do some of the typing, which you’ll then have to spend extra time reviewing closely in case its RNG factor changed an important detail. And with all the extra up-front design, planning, instructions, and context you need to provide to the LLM, I’m not sure I’m saving on typing. A lot of people recommend going meta and having LLMs generate a good prompt and sequence of steps to follow, but I’ve only seen that kinda sorta work for the most trivial tasks.

Unless you're doing something fabulously unique (at which point I'm jealous you get to work on such a thing), they're pretty good at cribbing the design of things if it's something that's been well documented online (canonically, a CRUD SaaS app, with minor UI modification to support your chosen niche).

And if you are doing something fabulously unique, the LLM can still write all the code around it, likely help with many of the components, give you at least a first pass at tests, and enable rapid, meaningful refactors after each feature PR.

I don't really understand your point. It reads like you're saying "I like good code, it doesn't matter if it comes from a person or an LLM. If a person is good at using an LLM, it's fine." Sure, but the problem people have with LLMs is their _propensity_ to create slop in comparison to humans. Dismissing other people's observations as purely an emotional reaction just makes it seem like you haven't carefully thought about other people's experiences.

My point is that, if I can do it right, others can too. If someone's LLM is outputting slop, they are obviously doing something different: I'm using the same LLMs.

All the LLM hate here isn't observation, it's sour grapes. Complaining about slop and poor code-quality outputs is confessing that you haven't taken the time to understand what is reasonable to ask for, and aren't educating your junior engineers on how to interact with LLMs.


"My point is that, if I can do it right, others can too."

Can it also be that different people work in different areas, and LLMs are not equally good in all of them?


That was my first assumption, quite a while ago now.

???

People complaining about receiving bad code is, by definition, observation.


> Perhaps, in the process of producing it, the human notices what's happening and reconsiders (or even better, improves it such that it's no longer "slop".)

Given the same ridiculously large and complex change: if it is handwritten, only a seriously insensitive and arrogant crackpot could, knowing what's inside, submit it with any expectation that you would accept it without a long and painful process, rather than improving it to the best of their ability. On the other hand, with LLM assistance, even a mildly incompetent but valuable colleague or contributor, someone you care about, might underestimate the complexity and cost of what they didn't actually write and believe there is nothing to improve.


Are you quite sure that's the only difference you can think of? Let me give you a hint: is there any difference in the volume for the same cost at all?

It's the problem. I often have to guide LLMs 2-4 times to properly write 150-300 LOC changes because I see how the code can be simplified or improved.

There is no way that 9000 lines of code are decent. It's also very hard to review them and find the bad spots. Why spend your time on it in the first place? It probably took one hour for a person to generate it, but it will take ten to review it and point out the (probably) hundreds of problems.

Without AI, no one would submit 9000 lines, because that's tens of hours of work which you usually split into logical parts.


It is 1995. You get an unsolicited email with a dubious business offer. Upon reflection, you decide it's not worth consideration and delete it. No need to wonder how it was sent to you; that doesn't need to influence the way you handle it.

No. We need spam filters for this stuff. If it isn't obvious to you yet, it will be soon. (Or else you're one of the spammers.)


The original ask was about one PR.

Didn’t even hit the barn, sorry. Codegen tools were obvious, review assistance tools are very lagging, but will come.

We already have some of them. And if you have a wide enough definition, we had them for a while.

It 100% is.

Why would I bother reviewing code you didn't write and most likely didn't read?


It is a huge problem. PR reviews are a big deal, not just for code reasons, but they are one of the best teaching tools for new hires. Good ones take time and mental energy.

Asking me to review a shitty PR that you don't understand is just disrespectful. Not only is it a huge waste of everyone's time, you're forcing me to do your work for you (understanding and validating the AI solution), and you aren't learning anything because it isn't your work.


Eh, ask the author to split it in reviewable blocks if you think there's a chance you actually want a version of the code. More likely if it's introducing tons of complexity to a conceptually simple service you just outright reject it on that basis.

Possibly you reject it with "this seems more suitable for a fork than a contribution to the existing project". After all there's probably at least some reason they want all that complexity and you don't.


If you try to inspect and question such code, you will usually quickly run into that realisation that the "author" has basically no idea what the code even does.

"review it like it wasn't AI generated" only applies if you can't tell, which wouldn't be relevant to the original question that assumes it was instantly recognisable as AI slop.

If you use AI and I can't tell you did, then you're using it effectively.


If it's objectively bad code, it should be easy enough to point out specifics.

After pointing out 2-3 things, you can just say that the quality seems too low and ask them to come back once it meets standards. Which can include PR size, for good measure.

If the author can't explain what the code does, make an explicit standard that PR authors must be able to explain their code.


You are optimistic to think the author even cared about the code. Most of the time you get another LLM response about why the code “works”.

I’m curious how people would suggest dealing with large self-contained features that can’t be merged to main until they are production-ready, and therefore might become quite large prior to a PR.

While it would be nice to ship this kind of thing in smaller iterative units, that doesn’t always make sense from a product perspective. Sometimes version 0 has bunch of requirements that are non-negotiable and simply need a lot of code to implement. Do you just ask for periodic reviews of the branch along the way?


The way we do it where I work (large company in the cloud/cybersecurity/cdn space):

- Chains of manageable, self-contained PRs each implementing a limited scope of functionality. “Manageable” in this context means at most a handful of commits, and probably no more than a few hundred lines of code (probably less than a hundred tbh).

- The main branch holds the latest version of the code, but that doesn’t mean it’s deployed to production as-is. Releases are regularly cut from stable points of this branch.

- The full “product” or feature is disabled by a false-by-default flag until it’s ready for production (a minimal sketch of this pattern follows after this list).

- Enablement in production is performed in small batches, rolling back to disabled if anything breaks.
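A minimal sketch of that false-by-default flag, again in OCaml only for consistency with the rest of the thread (names are invented; a real system would typically read flags from config or a flag service rather than a constant):

    (* The half-built feature merges to main but stays dark until the flag
       is flipped for small batches in production. *)
    type flags = { new_checkout_flow : bool }

    let default_flags = { new_checkout_flow = false }

    let legacy_checkout items = List.fold_left ( + ) 0 items
    let new_checkout items = List.fold_left ( + ) 0 items (* work in progress *)

    let checkout ~flags items =
      if flags.new_checkout_flow then new_checkout items
      else legacy_checkout items

    let () = print_int (checkout ~flags:default_flags [ 3; 4 ])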


In our case, if such a thing happens (a few times per year across hundreds of people), a separate branch is created and the team working on that feature is completely autonomous for a while, while everyone else keeps doing normal work in trunk. The team tests their feature and adjacent code to an acceptable beta state but doesn't do extensive or full coverage, because that is impossible. Their code may be reviewed at that point if they request it, but it is done as an extra activity, with meetings and stuff. Then they optionally give this build to the general QA to run the full suite on it. This may be done in several cycles if fatal issues are found. Then they announce that they will merge into trunk on days A to B and ask everyone to please hold off on committing to trunk during that time. Around that time they send a mail outlining the changes, the new functionality, and any potential or actual unfixed issues. The QA team runs as full a suite of tests as possible. The merge may be reverted at this point if it is truly bad. Or, if it is good, the team announces success and proceeds in normal work mode.

> I’m curious how people would suggest dealing with large self-contained features that can’t be merged to main until they are production-ready

Are you hiding them from CIA or Al-Qaeda?

Feature toggles, or just a plain Boolean flag, are not rocket science.


Not rocket science, but I think there are also some tradeoffs with feature flags?

People could build on top of half-baked stuff because it’s in main. Or you might interact with main in ways that aren’t ready for production and aren’t trivial to toggle… or you just forget a flag check somewhere important.

I could also see schema/type decisions getting locked in too early while the feature is still in flux, and then people don’t want to change after it’s already reviewed since it seems like thrashing.

But yeah, definitely it’s one option. How do you consider those tradeoffs?


They come from people who have established that their work is worth the time to review and that they'll have put it together competently.

If it's a newcomer to the project, a large self contained review is more likely to contain malware than benefits. View with suspicion.


The partial implementation could be turned off with a feature flag until it's complete.

You line up 10-20 PRs and merge them into a temporary integration branch that gets tested/demoed. The PRs still have to be reviewed/accepted and merged into main separately. You can say 'the purpose of this PR is to do x for blah, see the top-level ticket'. Often there will be more than one ticket, depending on how self-contained the PRs are.

I will schedule review time with coworkers I trust to go over it with them.

It is about ownership to me. I own my PRs. If I throw garbage out and expect you to fix it I am making you own my PRs. No one wants to be forced to own other peoples work.


If you ask them to break it into blocks, are they not going to submit 10 more AI-generated PRs (each with its own paragraphs of description and comment spam), which you then have to wade through? Why sink even more time into it?

Being AI-generated is not the problem. Being AI-generated and not understandable is the problem. If they find a way to make the AI-generated code understandable, mission accomplished.

How much of their time should open source maintainers sink into this didactic exercise? Maybe someone should vibe-code a bot to manage the process automatically.

I think breaking a big PR up like this is usually fair

Sometimes I get really into a problem and just build. It results in very large PRs.

Marking the PR as a draft epic, then breaking it down into a sequence of smaller PRs, makes it much easier to review. And you can still solicit big-picture critique on the draft.

I’m also a huge fan of documentation, so each PR needs to be clear, describe the bigger picture, and link back to your epic.


There's probably also a decent chance that the author can't actually do it.

Let's say it's the 9000 lines of code. I'm also not reviewing 900 lines, so it would need to be more than 10 PRs. The code needs to be broken down into useful components, that requires the author to think about design. In this case you'd probably have the DSL parser as a few PRs. If you do it like that it's easier for the reviewer to ask "Why are you doing a DSL?" I feel like in this case the author would struggle to justify the choice and be forced to reconsider their design.

It's not just chopping the existing 9000 lines into X number of bits. It's submitting PRs that makes sense as standalone patches. Submitting 9000 lines in one go tells me that you're a very junior developer and that you need guidance in terms of design and processes.

For open source I think it's fine to simply close the PR without any review and say: break this down if you want me to look at it. Then if a smaller PR comes in, it's easier to assess whether you even want the code. But if you're the kind of person who doesn't think twice about submitting 9000 lines of code, I don't think you're capable of breaking down your patch into sensible sub-components.


Some of the current AI coding tools can follow instruction like “break this PR up into smaller chunks”, so even a completely clueless user may be able to follow those instructions. But that doesn’t mean it’s worth a maintainer’s time to read the output of that.

> Or if you don't have an obligation to review it, you leave it there.

Don’t just leave it there, that reflects badly on you and your project and pushes away good contributors. If the PR is inadequate, close it.


My record is 45 comments on a single review. Merge conditions were configured so that every comment must be resolved.

If PR author can satisfy it - I'm fine with it.


They will let the AI somewhat satisfy it and then ask you for further review.

Reminds me of curl problems with vulnerability report: https://news.ycombinator.com/item?id=43907376

At that point it is just malicious.


Some people genuinely believe agentic coding works great and that they have mastered it. Someone who PRs a simple feature with its own DSL is probably on that team and won't see the issue with their approach. They may think you are too old and are resisting AI. They would probably tell you that if it's too much for your old-fashioned coding skills, then just use an agent on the PR.

If you think that way, who cares about the code and additional DSL? If there is an issue or evolution required, we'll let AI work on it. If it works, just let it merge. Much cheaper than human reviewing everything.

I hate it, maybe I'm too old.


