I am also interested in what the future will look like. So far what I am seeing is:
(1) specialized AI agent -> (2) we should add 1790 agents to be competitive -> (3) pivot to agentic workforce platform
Now we have lots and lots of agentic workforce platforms, and sandbox providers to run them. All have similar capabilities: create an agent for HR, create an agent for Sales,...
I hope to see something interesting pop up; at least that was happening in the SaaS era, when people were inventing new ways of solving old problems: DocuSign, Salesforce, Zoho,...
I think both product and engineering are lacking. The only thing that works great today is the LLM models themselves.
Everything depends on "agents", but there is either barely any scaffolding around them or it's full spaghetti; at least it's hard to find one that's well constructed.
For instance, humans zoom around in cars: these cars don't spontaneously combust (most of the time), they have seatbelts and airbags, and they don't need an engine oil change every mile. Humans are amazing, and the cars are also relatively solidly engineered (at least the ones we drive around today).
The agent products that we have today are decidedly NOT that. Maybe for a single week openclaw was it - and then it decided to add a trawler and a fishhook to the car, along with 1000 other additions, because why not? And that has been true for almost every LLM/AI product I have seen.
I think the winners here, such as it is, will be the companies that have an actual specialized service that actually does something, where any "agentic" functionality is on top of that.
How is it that Meta spent so much money on talent and hardware, but the model barely matches Opus 4.6?
Especially looking at these numbers after Claude Mythos, it feels like either Anthropic has some secret sauce, or everyone else is just dumber than the talent Anthropic has.
Meta made a bunch of mistakes, and it looks like Zuckerberg spent a lot of money on talent and made big swings to change that (this happened about a year ago).
I think it’s unrealistic to expect them to come back from that pit to the top in one year, but I wouldn’t rule them out getting there with more time. That’s a possible future. They have the money and Zuckerberg’s drive at the helm. It can go a long way.
If they had actually matched Opus 4.6 on such a short timeline, it would have been mighty impressive. (Keep in mind this is a new lab and they are prohibited from doing distills.)
Friends at Meta with access to the model, plus personal experience at Meta.
Meta's performance process is essentially "show good numbers or you're out." So guess what people do when they don't have good numbers? They fudge them. Happens all across the company.
Re: changes, there's been enormous turnover in AI organizations, and in theory this one was developed by a "new" org. Whether that means less or more benchmaxxing is anyone's guess.
More, I'd guess, since the new org needs to prove itself long enough for stock to vest. Fudging the benchmarks gives them a longer horizon before they're all fired anyway.
Anthropic has mostly just been focused on coding/terminal work longer, and their pro-tier model is coding focused, unlike the GPT and Gemini pro-tier models, which have been optimized for science.
Their whole "training the LLM to be a person" technique probably contributes to its pleasant conversational behavior, and making its refusals less annoying (GPT 5.2+ got obnoxiously aligned), and also a bit to its greater autonomy.
Overall they don't have any real moat, but they are more focused than their competition (and their marketing team is slaying).
Autonomy for agentic workflows has nothing to do with "replying more like a person"; you have to refine the model for it quite specifically. All the large players are trying to do that, so it's not really specific to Anthropic. It may be true, however, that their heavier focus on a "Constitutional AI"/RLAIF approach makes it a bit easier to align the model to desirable outcomes when acting agentically.
You think it has nothing to do with it. Even they have only a loose understanding of what treating Claude like a real being ultimately does to how the model acts.
For example, Claude has a "turn evil in response to reinforced reward hacking" behavior which is a fairly uniquely Claude thing (as far as I've seen anyhow), and very likely the result of that attempt to imbue personhood.
Yup, it's called test-time compute. Mythos is described as plenty slower than Opus, enough to seriously annoy users trying to use it for quick-feedback-loop agentic work. It is most properly compared with GPT Pro, Gemini DeepThink or this latest model's "Contemplating" mode. Otherwise you're just not comparing like for like.
I have not delved into the theory yet, but it seems the smaller open-source models already do this to an extent. They have fewer parameters, but spend much more time/tokens reasoning, as a way to close the performance gap. If you look at "tokens per problem" on https://swe-rebench.com/ it seems to be the case, at least.
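A toy sketch of the idea (all numbers and names here are illustrative, not from any real model): one simple form of test-time compute is sampling a weak model several times on the same problem and taking a majority vote, trading k-fold inference cost for higher accuracy.

```python
import random
from collections import Counter

random.seed(0)

ANSWERS = list(range(5))  # toy multiple-choice answer space
CORRECT = 2               # the "right" answer for this toy problem

def weak_model(p_correct: float = 0.4) -> int:
    """Stand-in for a small model: returns the correct answer with
    probability p_correct, otherwise a uniformly random wrong answer."""
    if random.random() < p_correct:
        return CORRECT
    return random.choice([a for a in ANSWERS if a != CORRECT])

def majority_vote(k: int) -> int:
    """Sample the model k times (k-fold inference compute) and
    return the most common answer."""
    votes = Counter(weak_model() for _ in range(k))
    return votes.most_common(1)[0][0]

def accuracy(k: int, trials: int = 2000) -> float:
    """Fraction of trials where the k-sample vote lands on CORRECT."""
    return sum(majority_vote(k) == CORRECT for _ in range(trials)) / trials
```

With one sample the model is right about 40% of the time; with 15 samples per problem the majority vote is right far more often, which is roughly the fewer-parameters-more-tokens tradeoff being described.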
No massacre is justified, but can you remind us how and where Hamas got helicopters and tanks, and how all of a sudden all the cars were smashed? Maybe the Hannibal directive handed them their tanks.
It was actually upvoted by so many people because of reason and evidence.
Also, please stop playing the race card; no one is blaming a race. People are pointing at the country that is carrying out these cruelties, at the majority of the government supporting it, and at the majority of the army executing the commands.
they better make billions directly from corporations, instead of giving them to average people who might get a chance out of poverty (but also bad actors using it to do even more bad things)
Anthropic's definition of "safe AI" precludes open-source AI. This is clear if you listen to what their CEO says in interviews; I think he might even prefer OpenAI's closed-source models winning over having open-source AI (because at least in the former case it's not a free-for-all).
Suits in agriculture don't drive the combine either, a farmer does. The other 99% of pre-automation farmers went on to other jobs. They happened to be better jobs than farming, but that's not necessarily always the case.
Yep, I think the lede might be buried here and we're probably cooked (assuming you mean SWEs, but the writing has been on the wall for 4 months.)
I guess I'm still excited. What's my new profession going to be? Longer term, are we going to solve diseases and aging? Or are the ranks going to thin from 10B to 10000 trillionaires and world-scale con-artist misanthropes plus their concubines?
I need to start a SaaS for getting people to do lunges and squats so they can carry others around on their backs. I need a founding engineer, a founding marketer, and 100m in hard currency.
If wealth becomes too captured at the top, the working class becomes unable to be profitably exploited - squeezing blood from a stone.
When that happens, the ultra-wealthy dynasties begin turning on each other. This happens frequently throughout history; WWI was the last example.
Your options become choosing a trillionaire to swear fealty to and fight in their wars hoping your side wins, or I guess trying to walk away and scrape out a living somewhere not worth paying attention to.
Or, I suppose, revolution, but the last one with persistent success was led by Mao and required throwing literally millions of peasants against walls of rifles. Not sure it'd work against drones.
There's been a section on this in nearly every system card Anthropic has published, so this isn't a new thing. And this model doesn't have particularly higher risk than past models either:
> 2.1.3.2 On chemical and biological risks
> We believe that Mythos Preview does not pass this threshold due to its noted limitations in open-ended scientific reasoning, strategic judgment, and hypothesis triage. As such, we consider the uplift of threat actors without the ability to develop such weapons to be limited (with uncertainty about the extent to which weapons development by threat actors with existing expertise may be accelerated), even if we were to release the model for general availability. The overall picture is similar to the one from our most recent Risk Report.
LLMs are useless for this type of thing for the same reason the Anarchist Cookbook has always been. The skills required to convert text into complicated reactions that complete as intended (without killing yourself) are an art that's never actually written down anywhere, merely passed orally from generation to generation. It's impossible for LLMs to learn stuff that's not written down.
This is the same reason why LLMs are not doing well at science in general: the tricky parts of doing scientific research (indeed almost all of the process) never get written down, so LLMs cannot learn them.
Imagine if we never preserved source code, just preserved the compiled output and started from scratch every time we wrote a new version of a program. No Github, just marketing fluff webpages describing what software actually did. Libraries only available as object code with terse API descriptions. Imagine how shit LLMs would be at SWE if that was the training corpus...
Every time I hear that AI will replace someone, I want to ask a question:
Okay, if AI can replace engineers with their manager, what is stopping us from replacing managers with their managers, and so on up to the CTO? Or, going in a different direction: AI replaces engineers with PMs, but then PMs can be replaced by solution architects or sales people. Go even further, and the customer can just start a meeting with an AI agent and explain their needs - no PMs/SAs/engineers required.
Let's go even further: instead of someone in the company explaining things to an AI, what if an AI explains the company's needs to another AI?
The logical conclusion I've come to is that the world will be divided into two types of people in the future: entrepreneurs with an army of AI employees, and the unemployed. Presumably those entrepreneurs are mainly just selling to other entrepreneurs, since the unemployed don't really have any resources to buy their products and services.
Sometimes I wonder if what will actually happen is that we will have very small startups, essentially a C-suite but with actual coding skills and domain knowledge, who build new companies using AI that end up gaining an advantage over the incumbents and force them out of business. Indirectly, you're replaced by AI. I could see the drastic changes to culture and workflow that AI demands being too much for most legacy companies to handle. For instance, there are still a shockingly large number of companies relying on seriously dated software and paper-based systems.
It'd be nice to have a company with only CTO-level engineers, but no one can afford that or even find enough workers at a certain scale, regardless of pay.
It makes sense that with AI, you can have architects who haven't written code in 5 years produce acceptable code, but I don't know many people high up the chain who'd say they have the time or desire for that.
Until your level's backlog is empty, you'll always find something better to do than the tasks of your lower-level colleagues, and it'll never become empty.
> It'd be nice to have a company with only CTO-level engineers, but no one can afford that
The point I was trying to make is that if AI is accurate enough for engineering work, why do we think it's not accurate enough for the PM job? And if it's accurate for the PM job, then it should be accurate for the others as well.
By extension, that would make agent swarms accurate too.
So you will have one CEO talking and creating the product, then exposing another channel to customers, where customer agents talk to the company's agent.
The problem is, AI is not accurate, and problems accumulate; this is why you need engineers. The same applies to PMs: if you rely solely on AI to write product docs, mistakes accumulate and your engineers will build a totally different product.