I am also interested in what the future will look like. So far what I am seeing is:
(1) specialized AI agent -> (2) we should add 1790 agents to be competitive -> (3) pivot to agentic workforce platform
Now we have lots and lots of agentic workforce platforms, and sandbox providers to run them. All have similar capabilities: create an agent for HR, create an agent for Sales,...
I hope to see something interesting pop up; at least that was happening in the SaaS era, when people were inventing new ways of solving old problems: DocuSign, Salesforce, Zoho,...
I think both product and engineering are lacking. The only thing that works great today is the LLM models themselves.
Everything depends on "agents", but there is either barely any scaffolding around them or it's full spaghetti; at least it's hard to find one that's well constructed.
For instance, humans zoom around in cars: these cars don't spontaneously combust (most of the time), they have seatbelts and airbags, and they don't need an engine oil change every mile. Humans are amazing, and the cars are also relatively solidly engineered (at least the ones we drive around today).
The agent products that we have today are decidedly NOT that. Maybe for a single week openclaw was it - and then it decided to add a trawler and a fishhook to the car, along with 1000 other additions, because why not? And that has been true for almost every LLM/AI product I have seen.
I think the winners here, such as it is, will be the companies that have an actual specialized service that actually does something, where any "agentic" functionality is on top of that.
How is it that Meta spent so much money on talent and hardware, but the model barely matches Opus 4.6?
Especially looking at these numbers after Claude Mythos, it feels like either Anthropic has some secret sauce, or everyone else is just dumber than the talent Anthropic has.
Meta made a bunch of mistakes, and it looks like Zuckerberg spent a lot of money on talent and made big swings to change that (this happened about a year ago).
I think it’s unrealistic to expect them to come back from that pit to the top in one year, but I wouldn’t rule them out getting there with more time. That’s a possible future. They have the money and Zuckerberg’s drive at the helm. It can go a long way.
If they had actually matched Opus 4.6 on such a short timeline, it would have been mighty impressive. (Keep in mind this is a new lab and they are prohibited from doing distills.)
Friends at Meta with access to the model, plus personal experience at Meta.
Meta's performance process is essentially "show good numbers or you're out." So guess what people do when they don't have good numbers? They fudge them. Happens all across the company.
Re: changes, there's been enormous turnover in AI organizations, and in theory this one was developed by a "new" org. Whether that means less or more benchmaxxing is anyone's guess.
More, I'd guess, since the new org needs to prove itself long enough for stock to vest. Fudging the benchmarks gives them a longer horizon before they're all fired anyway.
Anthropic has mostly just been focused on coding/terminal work longer, and their pro-tier model is coding focused, unlike the GPT and Gemini pro-tier models, which have been optimized for science.
Their whole "training the LLM to be a person" technique probably contributes to its pleasant conversational behavior, and making its refusals less annoying (GPT 5.2+ got obnoxiously aligned), and also a bit to its greater autonomy.
Overall they don't have any real moat, but they are more focused than their competition (and their marketing team is slaying).
Autonomy for agentic workflows has nothing to do with "replying more like a person"; you have to refine the model for it quite specifically. All the large players are trying to do that, so it's not really specific to Anthropic. It may be true, however, that their heavier focus on a "Constitutional AI"/RLAIF approach makes it a bit easier to align the model to desirable outcomes when acting agentically.
You think it has nothing to do with it. Even they have only a loose understanding of what treating Claude like a real being ultimately does to how the model acts.
For example, Claude has a "turn evil in response to reinforced reward hacking" behavior which is a fairly uniquely Claude thing (as far as I've seen anyhow), and very likely the result of that attempt to imbue personhood.
Yup, it's called test-time compute. Mythos is described as plenty slower than Opus, enough to seriously annoy users trying to use it for quick-feedback-loop agentic work. It is most properly compared with GPT Pro, Gemini DeepThink or this latest model's "Contemplating" mode. Otherwise you're just not comparing like for like.
I have not delved into the theory yet, but it seems the smaller open-source models already do this to an extent. They have fewer parameters, but spend much more time/tokens reasoning, as a way to close the performance gap. If you look at "tokens per problem" on https://swe-rebench.com/ it seems to be the case, at least.
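A toy sketch of the idea (all numbers and names here are illustrative, not from any real model): one simple form of test-time compute is sampling a weak model several times on the same problem and taking a majority vote, trading k-fold inference cost for higher accuracy.

```python
import random
from collections import Counter

random.seed(0)

ANSWERS = list(range(5))  # toy multiple-choice answer space
CORRECT = 2               # the "right" answer for this toy problem

def weak_model(p_correct: float = 0.4) -> int:
    """Stand-in for a small model: returns the correct answer with
    probability p_correct, otherwise a uniformly random wrong answer."""
    if random.random() < p_correct:
        return CORRECT
    return random.choice([a for a in ANSWERS if a != CORRECT])

def majority_vote(k: int) -> int:
    """Sample the model k times (k-fold inference compute) and
    return the most common answer."""
    votes = Counter(weak_model() for _ in range(k))
    return votes.most_common(1)[0][0]

def accuracy(k: int, trials: int = 2000) -> float:
    """Fraction of trials where the k-sample vote lands on CORRECT."""
    return sum(majority_vote(k) == CORRECT for _ in range(trials)) / trials
```

With one sample the model is right about 40% of the time; with 15 samples per problem the majority vote is right far more often, which is roughly the fewer-parameters-more-tokens tradeoff being described.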
No massacre is justified, but can you remind us how and where Hamas got helicopters and tanks, and how all of a sudden all the cars were smashed? Maybe the Hannibal directive handed them their tanks.
It was actually upvoted by so many people because of reason and evidence.
Also, please stop playing the race card; no one is blaming a race. People are pointing at the country that is carrying out these cruelties, at the majority of the government supporting it, and at the majority of the army executing the commands.
they better make billions directly from corporations, instead of giving them to average people who might get a chance out of poverty (but also bad actors using it to do even more bad things)
Anthropic's definition of "safe AI" precludes open-source AI. This is clear if you listen to what their CEO says in interviews; I think he might even prefer OpenAI's closed-source models winning over having open-source AI (because at least in the former case it's not a free-for-all).
Suits in agriculture don't drive the combine either, a farmer does. The other 99% of pre-automation farmers went on to other jobs. They happened to be better jobs than farming, but that's not necessarily always the case.
Yep, I think the lede might be buried here and we're probably cooked (assuming you mean SWEs, but the writing has been on the wall for 4 months.)
I guess I'm still excited. What's my new profession going to be? Longer term, are we going to solve diseases and aging? Or are the ranks going to thin from 10B to 10000 trillionaires and world-scale con-artist misanthropes plus their concubines?
I need to start a SaaS for getting people to do lunges and squats so they can carry others around on their backs. I need a founding engineer, a founding marketer, and 100m in hard currency.
If wealth becomes too captured at the top, the working class becomes unable to be profitably exploited - squeezing blood from a stone.
When that happens, the ultra-wealthy dynasties begin turning on each other. This happens frequently throughout history; WWI was the last example.
Your options become choosing a trillionaire to swear fealty to and fight in their wars hoping your side wins, or I guess trying to walk away and scrape out a living somewhere not worth paying attention to.
Or, I suppose, revolution, but the last one with persistent success was led by Mao and required throwing literally millions of peasants against walls of rifles. Not sure it'd work against drones.
There's been a section on this in nearly every system card Anthropic has published, so this isn't a new thing. And this model doesn't have particularly higher risk than past models either:
> 2.1.3.2 On chemical and biological risks
> We believe that Mythos Preview does not pass this threshold due to its noted limitations in open-ended scientific reasoning, strategic judgment, and hypothesis triage. As such, we consider the uplift of threat actors without the ability to develop such weapons to be limited (with uncertainty about the extent to which weapons development by threat actors with existing expertise may be accelerated), even if we were to release the model for general availability. The overall picture is similar to the one from our most recent Risk Report.
LLMs are useless for this type of thing for the same reason the Anarchist Cookbook has always been. The skills required to convert text into complicated reactions that complete as intended (without killing yourself) are an art that's never actually written down anywhere, merely passed orally from generation to generation. It's impossible for LLMs to learn stuff that's not written down.
This is the same reason why LLMs are not doing well at science in general: the tricky parts of doing scientific research (indeed almost all of the process) never get written down, so LLMs cannot learn them.
Imagine if we never preserved source code, just preserved the compiled output and started from scratch every time we wrote a new version of a program. No Github, just marketing fluff webpages describing what software actually did. Libraries only available as object code with terse API descriptions. Imagine how shit LLMs would be at SWE if that was the training corpus...
Every time I hear that AI will replace someone, I want to ask a question:
Okay, if AI can replace engineers with their manager, what is stopping us from replacing managers with their managers, and so on up to the CTO? Or, going in a different direction: AI replaces engineers with PMs, but then PMs can be replaced by solution architects or sales people. Go even further, and the customer can just start a meeting with an AI agent and explain their needs - no PMs/SAs/engineers required.
Let's go even further: instead of someone in the company explaining things to an AI, what if an AI explains the company's needs to another AI?
The logical conclusion I've come to is that the world will be divided into two types of people in the future: entrepreneurs with an army of AI employees, and the unemployed. Presumably those entrepreneurs are mainly just selling to other entrepreneurs, since the unemployed don't really have any resources to buy their products and services.
Sometimes I wonder if what will actually happen is that we will have very small startups, essentially a C-suite but with actual coding skills and domain knowledge, who build new companies using AI that end up gaining an advantage over the incumbents and force them out of business. Indirectly, you're replaced by AI. I could see the drastic changes to culture and workflow that AI demands being too much for most legacy companies to handle. For instance, there are still a shockingly large number of companies relying on seriously dated software and paper-based systems.
It'd be nice to have a company with only CTO-level engineers, but no one can afford that or even find enough workers at a certain scale, regardless of pay.
It makes sense that with AI, you can have architects who haven't written code in 5 years produce acceptable code, but I don't know many people high up the chain who'd say they have the time or desire for that.
Until your level's backlog is empty, you'll always find something better to do than the tasks of your lower-level colleagues, and it'll never become empty.
> It'd be nice to have a company with only CTO-level engineers, but no one can afford that
The point I was trying to make is that if AI is accurate enough for engineering work, why do we think it's not accurate enough for the PM job? And if it's accurate for the PM job, then it should be accurate for the others as well.
By extension, that would make agent swarms accurate too.
So you will have one CEO talking and creating the product, then exposing another channel to customers, where customer agents talk to the company's agent.
The problem is, AI is not accurate, and problems accumulate; this is why you need engineers. The same applies to PMs: if you rely solely on AI to write product docs, mistakes accumulate and your engineers will build a totally different product.