
This needs more discussion:

Claude using Claude on a computer for coding https://youtu.be/vH2f7cjXjKI?si=Tw7rBPGsavzb-LNo (3 mins)

True end-user programming and product manager programming are coming, probably pretty soon. Not the same thing, but Midjourney went from v.1 to v.6 in less than 2 years.

If something similar happens, most jobs that could be done remotely will be automatable in a few years.



Every time I see this argument made, there seems to be a level of complexity and/or operational cost above which people throw up their hands and say "well of course we can't do that".

I feel like we will see that again here as well. It really is similar to the self-driving problem.


Self-driving is a beyond-six-sigma problem. An error rate of over 1-2 crashes per million miles, i.e., the human rate, is unacceptable.

Most jobs are not like that.

A good argument can be made, however, that software engineering, especially in important domains, will be among the last to be fully automated because software errors often cascade.

There’s a countervailing effect though. It’s easy to generate and validate synthetic data for lower-level code. Junior coding jobs will likely become less available soon.


> software errors often cascade

Whereas software defects in design and architecture subtly accumulate, until they leave the codebase in a state where it becomes utterly unworkable. It is one of the chief reasons good devs get paid what they do. Software discussions very often underrate extensibility, or in other words, structural and architectural scalability. Even software correctness is trivial in comparison - you can't keep writing correct code if you've made an unworkable tire-fire. This could be a massive mountain for AI to climb.


Current LLMs lack the ability to perform abstraction at the level a problem requires. When this gets solved, we'd be quite a bit closer to AGI, which has implications far beyond job displacement.

ARC-AGI Benchmark might serve as a canary in the coal mine.

https://github.com/fchollet/ARC-AGI


I hear you. But I have wondered whether there won't be a need to maintain certain kinds of software when you can just have them rewritten for each iteration. Like some kind of schema evolution, yes, but with throwaway software at each iteration.


Well in terms of processing speed the AI could iterate on different designs until it finds an extensible one, with some kind of reinforcement learning loop. Produce a certain design, get stuck, throw it away, try a new one. Just like humans learn to write good code really - except at an unfathomable speed of iteration. But it still all sounds ridiculously challenging. There is something there that isn't about predicting next tokens like LLMs do. It's about inferring very complex, highly abstract metastructures in the text.
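
A minimal sketch of what that loop might look like (propose_design and build_and_evaluate are hypothetical stand-ins for LLM-backed components, not real APIs; this is only meant to illustrate the shape of the idea):

    # Illustrative only: a propose / evaluate / discard / retry loop for software design.
    # The callables passed in are hypothetical stand-ins for an LLM and a review harness.
    def search_for_extensible_design(requirements, propose_design, build_and_evaluate,
                                     good_enough=0.9, max_iterations=100):
        best_design, best_score = None, float("-inf")
        for _ in range(max_iterations):
            design = propose_design(requirements)   # generate a candidate architecture
            score = build_and_evaluate(design)      # e.g. implement it, then score extensibility
            if score > best_score:
                best_design, best_score = design, score
            if score >= good_enough:                # stop once a workable design is found
                break
            # otherwise throw the candidate away and try again, optionally feeding
            # the failure back into the next proposal as a learning signal
        return best_design

The hard part, of course, is build_and_evaluate: scoring "extensibility" automatically is exactly the kind of abstract judgment being discussed here.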


The challenge might be around the edges here. I guess you'd be able to instruct an agent to always code to a certain API spec, but no piece of software runs or does anything really useful in a vacuum.


Fundamentally there is a human with limited brain capacity that got trained to do this. It's just a question of time before there are equally capable, and then far more capable, models. There is nothing magical or special about the human brain.

The only question is how fast it is going to happen, i.e., what percentage of jobs is going to be replaced next year, and so on.


> There is nothing magical or special about the human brain.

There is a lot about the human brain that even the world's top neuroscientists don't know. There's plenty of magic about it if we define magic as undiscovered knowledge.

There's also no consensus among top AI researchers that current techniques like LLMs will get us anywhere close to AGI.

Nothing I've seen from current models (not even o1-preview) suggests to me that AIs can reason about codebases of more than 5k LOC. A top 5% engineer can probably make sense of a codebase of a couple million LOC, given time.

Which models specifically have you seen that are looking like they will be able to surmount any time soon the challenges of software design and architecture I'm laying out in my previous comment?


Defining AGI as "can reason about 5MLOC" is ridiculous. When do the goal posts stop moving? When a computer can solve time travel? Babies regularly exhibit behavior that is indistinguishable from what an LLM does (including terrible logic and hallucinations).

The majority of people on the planet can barely reason about how any given politician will affect them, even when there’s a billion resources out there telling them exactly that. No reasonable human would ever define AGI as having anything to do with coding at all, since that’s not even “general intelligence”… it’s learned facts and logic.


Babies can at least manipulate the physical world. A large language model can never be defined as AGI until it can control a general-purpose robot, similar to how the human brain controls our body's motor functions.


You’re commenting that on an article about how Claude literally can do what you’re talking about.


As generally intelligent beings, we can adapt to reading and producing 5M LOC, or to live in arctic climates, or to build a building in colonial or classical style as dictated by cost, taste, and other factors. That is generality in intelligence.

I haven't moved any goal posts - it is your definition which is way too narrow.


You’re literally moving the goalposts right now. These models _are_ adapting to what you’re talking about. When Claude makes a model for haikus, how is that different than a poet who knows literally nothing about math but is fantastic at poetry?

I’m sure as soon as Claude can handle 5MLOC you’ll say it should be 10, and it needs to make sure it can serve you a Michelin star dinner as well.

That’s not AGI. Stop moving the goalposts.


My point was it's not AGI, I don't even know what you're talking about or who you're replying to anymore.


> When do the goal posts stop moving?

When someone comes up with a rigorous definition of intelligence.


Errors not only cascade; in certain cases they have global impact in very little time. E.g., CrowdStrike.

And what is the title element on CrowdStrike's website today? "CrowdStrike: We Stop Breaches with AI-native Cybersecurity"

Can't wait.


I feel pain for the people who will be employed to "prompt engineer" the behavior of these things. When they inevitably hallucinate some insane behavior, a human will have to take the blame for why it's not working... and yeah, that'll be fun to be on the receiving end of.


Humans 'hallucinate' like LLMs. The term used, however, is confabulation: we all do it, we all do it quite frequently, and the process is well studied (1).

> We are shockingly ignorant of the causes of our own behavior. The explanations that we provide are sometimes wholly fabricated, and certainly never complete. Yet, that is not how it feels. Instead it feels like we know exactly what we're doing and why. This is confabulation: Guessing at plausible explanations for our behavior, and then regarding those guesses as introspective certainties. Every year psychologists use dramatic examples to entertain their undergraduate audiences. Confabulation is funny, but there is a serious side, too. Understanding it can help us act better and think better in everyday life.

I suspect it's an inherent aspect of human and LLM intelligences, and cannot be avoided. And yet, humans do ok, which is why I don't think it's the moat between LLM agents and AGI that it's generally assumed to be. I strongly suspect it's going to be yesterday's problem in 6-12 months at most.

(1) https://www.edge.org/response-detail/11513


No, confabulation isn’t anything like how LLMs hallucinate. LLMs will just very confidently make up APIs on systems they otherwise clearly have been trained on.

This happens nearly every time I request “how tos” for libraries that aren’t very popular. It will make up some parameters that don’t exist despite the rest of the code being valid. It’s not a memory error like confabulation where it’s convinced the response is valid from memory either, because it can be easily convinced that it made a mistake.

I’ve never worked with an engineer in my 25 years in the industry that has done this. People don’t confabulate to get day to day answers. What we call hallucination is the exact same process LLMs use to get valid answers.


You work with engineers who confabulate all the time: it's an intrinsic aspect of how the human brain functions that has been demonstrated at multiple levels of cognition.


> Humans 'hallucinate' like LLMs. The term used, however, is confabulation: we all do it, we all do it quite frequently, and the process is well studied (1).

Yea i agree, i'm not making a snipe at LLMs or anything of the sort.

I'm saying I expect there to be a human fallback in the system for quite some time. But solving the fallback problems will mean debugging black boxes, which is the worst kind of project in my view; I hate working on code I don't understand, where the results are not predictable.


That won't even be a real job. How exactly will there be this complex intelligence that can solve all these real world problems, but can't handle some ambiguity in some inputs it is provided? Wouldn't the ultra smart AI just ask clarifying questions so that literally anyone can "prompt engineer"?


As long as there is liability, there must be a human to blame, no matter how irrational. Every system has a failure mode, and ML models, especially the larger ones, often have the most odd and unique ones.

For example, we can mostly agree CLIP does a fine job classifying images, except that if you glue a sticky note saying "iPod" onto an apple, it will classify it as such.
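
For reference, CLIP-style zero-shot classification looks roughly like this (a sketch using the Hugging Face transformers API; the image file is a hypothetical photo of an apple with the sticky note). The typographic-attack failure happens because the written text in the image can dominate the image-text similarity scores:

    # Sketch of zero-shot classification with CLIP (Hugging Face transformers).
    # "apple_with_ipod_note.jpg" is a hypothetical photo of an apple with an "iPod" sticky note.
    from PIL import Image
    import torch
    from transformers import CLIPModel, CLIPProcessor

    model = CLIPModel.from_pretrained("openai/clip-vit-base-patch32")
    processor = CLIPProcessor.from_pretrained("openai/clip-vit-base-patch32")

    image = Image.open("apple_with_ipod_note.jpg")
    labels = ["a photo of an apple", "a photo of an iPod"]

    inputs = processor(text=labels, images=image, return_tensors="pt", padding=True)
    with torch.no_grad():
        probs = model(**inputs).logits_per_image.softmax(dim=-1)

    for label, p in zip(labels, probs[0].tolist()):
        print(f"{label}: {p:.2f}")  # the written "iPod" can win despite the fruit in the picture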

No matter the performance, these are categorically statistical machines reaching for the most immediately useful representations, yielding an incoherent world model. These systems will be proposed as replacement to humans, they will do their best to pretend to work, they will inevitably fail over a long enough time horizon, and a human accustomed to rubber-stamping its decisions, or perhaps fooled by the shape of a correct answer, or simply tired enough to let it slip by, will take the blame.


This is because it will be absolutely catastrophic economically when the majority of high paying jobs can be automated and owned by a few billionaires. Then what will go along with this catastrophe will be all the service people who had jobs to support the people with high paid jobs, they're fucked too. People don't want to have to face that.

We'd be losing access to food, shelter, insurance, purpose. I can't blame people for at least telling themselves some coping story.

It's going to be absolutely ruinous for many people. So what else should they do, admit they're fucked? I know we like to always be cold rational engineers on this forum, but shit looks pretty bleak in the short term if this goal of automating everyone's work comes true and there are basically zero social safety nets to deal with it.

I live abroad and my visa is tied to my job, so not only would losing my job be ruinous financially, it will likely mean deportation too as there will be no other job for me to turn to for renewal.


If most people are unemployed, modern capitalism as we know it will collapse. I'm not sure that's in the interests of the billionaires. Perhaps some kind of a social safety net will be implemented.

But I do agree, there is no reason to be enthusiastic about any progress in AI, when the goal is simply automating people's jobs away.


Sorry yeah, I'm not 100% sure it spells doom, but it's going to be a wicked transition period.


> True end-user programming and product manager programming are coming

This means that either product managers will have to start (effectively) writing in-depth specs again, or they will have to learn to accept the LLM's ideas in a way that most have not accepted their human programmers' ideas.

Definitely will be interesting to see how that plays out.


Since automated coding systems can revise code and show the results much quicker than most human engineers can, writing detailed specs could be less necessary.


The bottleneck is still the person who has to evaluate the results.

The larger point is that building software is about making tons of decisions about how it works. Someone has to make those decisions. Either PMs will be happy letting machines make the decisions where they do not let programmers decide now, or the PMs will have to make all the decisions before (spec) or after (evaluation + feedback loop, like you suggest).


Idk, LLMs have basically stopped improving for over a year now. And in their current state, no matter how many abstractions you add to them - or chain them - they are not even close to capable of replacing even simple jobs.


Agreed. The jump from GPT-3.5 to GPT-4 was truly mind-blowing, the jump from GPT-4 to Opus/Sonnet 3.5 was pretty good, but if o1-preview really is GPT-5 then I feel like we're seeing the hype starting to collide with reality.


> True end-user programming and product manager programming are coming, probably pretty soon.

I'm placing my bets rather on this new object-oriented programming thing. It will make programming jobs obsolete any day now...


> If something similar happens, most jobs that could be done remotely will be automatable in a few years.

I'd be willing to bet a large amount of money this doesn't happen, assuming "most" means >50% and "a few" is <5.


Your semantics above are quite compatible with mine, although I hedged my statement with "a few", which could also mean a little over 5, like 6. Also, I said "automatable", not necessarily automated, due to legal, political, reputational, or other reasons.

I’m curious to understand your reasoning. What would be some key roadblocks? Hallucinations and reliability issues in most domains will likely be solvable with agentic systems in a few years.


It makes me wonder if people that make these claims have an actual job. Because if they did then I doubt anyone could make that claim with a straight face.


> If something similar happens, most jobs that could be done remotely will be automatable in a few years.

I'm really curious about the cost of that sort of thing. It seems astronomical atm, but as much as I get shocked at the today-cost, staffing is also a pretty insane cost.


Playing with Sonnet 3.5 this morning with Cline, my API cost to add a decent amount of functionality to my GraphQL server was $0.1325, and it took about 5 minutes. $1.80 is a lot cheaper than my hourly rate… but I'm the one reviewing what it does to ensure it makes sense

And it got some things subtly wrong, though so do I and my team. Interesting times ahead, I think, but I'm not too worried about my job as a principal dev. Again, I'm more stressed about juniors


> This needs more discussion:

"Create a simple website" has to be one of the most common blog / example out there in about every programming language.

It can automate stuff? That's cool: I already automated taking screenshots and then having AI check whether they look like phishing or not (and it's quite good at it).
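
That kind of check is a small script; here's a minimal sketch assuming the Anthropic Python SDK and a screenshot already saved to disk (the file name, model id, and prompt are just illustrative):

    # Illustrative: send a saved page screenshot to Claude and ask if it looks like phishing.
    import anthropic, base64

    client = anthropic.Anthropic()  # reads ANTHROPIC_API_KEY from the environment

    with open("page_screenshot.png", "rb") as f:  # hypothetical screenshot file
        screenshot_b64 = base64.b64encode(f.read()).decode()

    message = client.messages.create(
        model="claude-3-5-sonnet-20241022",
        max_tokens=200,
        messages=[{
            "role": "user",
            "content": [
                {"type": "image",
                 "source": {"type": "base64", "media_type": "image/png", "data": screenshot_b64}},
                {"type": "text",
                 "text": "Does this page look like a phishing site? Answer PHISHING or LEGITIMATE, then explain briefly."},
            ],
        }],
    )

    print(message.content[0].text)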

I mean: the "Claude using Claude" demo may seem cool, but I dispute the "for coding" part. That's trivial stuff. A trivial error (which it doesn't fix, btw: it just deletes everything).

"Claude, write me code to bring SpaceX rockets back to earth"

or

"Claude, write me code to pilot a machine to treat a tumor with precision"

This was not it.


I am sure it will do great handling error cases and pixel-perfect UI.


Open Interpreter has been doing this for a while with a bunch of LLMs; glad to see first-party support for this use case.


And how is Midjourney doing? Did it change the world?



