shmatt's comments | Hacker News

Labubu's peaking and falling doesn't really say much about scarcity and trends. Labubu is made by a public company whose stock skyrocketed, and which essentially decided to go all in and mass-produce to meet the popularity.

That's one option. But other companies sometimes choose to keep the scarcity and secrecy going for years, even decades, and if they play their cards right it keeps working.

Labubu's fall is more about its maker's decision to increase sales numbers instead of keeping them flat and generating more and more hype.

Hermes can sell a $15,000 Birkin to everyone; I'm sure they can figure out the supply-chain aspects if they really wanted to. Within a month everyone who wanted one would have one and sales would drop. Hermes would have a spike in sales, followed by a drop.

Instead they force you to play years-long games with their sales staff to get an opportunity to spend $15,000. And decades later people still opt in to spending thousands of dollars on plates and scarves, hoping one day they will be offered one.

This is just as true of a $40 Supreme or Aime Leon Dore T-shirt as it is of a $15,000 handbag. If you keep the scarcity going just right, it lasts much longer.


That might be true of handbags; I am doubtful it is true of dolls. A handbag is a necessary accessory and has been for decades. The popular brands grew their way there slowly over many years. A company that explodes into popularity suddenly, for a product people never knew they needed, is likely to stay in the spotlight only for a short while and is best served by taking advantage while it can.


I agree that cashing in quickly before the fad faded was probably the right move for Labubu. However, there’s no world where Birkins (or other designer handbags) are a “necessary accessory”.


A handbag is necessary for many people to carry their things. Whether they choose a more or less expensive item to fulfill that function is a separate question.


A lot of designer handbags are truly awful at carrying things. In practice they are primarily used as a fashion accessory rather than as a functional bag.


True, but this does not particularly apply to the Birkin, which was famously created for the actress Jane Birkin after she complained to the CEO of Hermes that she couldn’t get a bag big enough to hold both scripts and baby diapers. Sure, it’s not as good at carrying things as a backpack, but it’s not bad either.

It does delight me no end to see a whole thread on handbags on HN. I agree with one of the parent posters though: handbags are an unusual category with long-lived brand status (like cars and watches) and not really comparable to Labubus.


> which was famously created for the actress Jane Birkin after she complained to the CEO of Hermes that she couldn’t get a bag big enough to hold both scripts and baby diapers. Sure, it’s not as good at carrying things as a backpack, but it’s not bad either.

I checked this out and was amused to see that Wikipedia notes:

> Birkin used the bag initially but later changed her mind because she was carrying too many things in it: "What's the use of having a second one?" she said laughingly. "You only need one and that busts your arm; they're bloody heavy. I'm going to have an operation for tendonitis in the shoulder".

In my experience it's pretty common to carry stuff in backpacks. They put a lot of weight on your spine, which can take it. Jane Birkin's comment reminded me of the idea in Dave Barry's Only Travel Guide You'll Ever Need that frequent travelers are always on the lookout for luggage that can hold more than it can actually hold.


I always found the birkin interesting because of how working class it looks versus its price tag. I grew up fairly poor, and the birkin bags always remind me of the leather purses my aunts, grandmothers, and teachers would carry.

This seems to occur in high fashion a lot: an upscale rendition of something popular among the working class.


It happens in fashion in both directions, for a variety of reasons, though with fast fashion it's all so intermingled.

Many rock bands with working-class roots "bring up" styles (like the newsboy cap), but lower classes also try to "look" upwards, which can give us the nouveau riche clichés. Celebrities trying to hide their identity in public started to wear large sunglasses, and suddenly everybody was wearing them.

It's the primary reason why brands have become so important - fabric quality can vary, but jeans are otherwise just jeans; slap Gucci or Prada on them and suddenly you're signalling conspicuous consumption.


> This is just as true of a $40 Supreme or Aime Leon Dore T-shirt as it is of a $15,000 handbag.

According to a more fashion- and design-oriented friend of mine, you can buy knockoffs of the Birkin or any other high-end bag. And, guess what? Some of those knockoffs and their manufacturers have developed a certain cachet, and actually sell for quite high prices. So of course, those have spawned knockoffs too.

It's like the bit in Pattern Recognition, isn't it?


Knockoffs? Please, we call them “reps” ;)

There are whole subreddits devoted to this, the most well-known being repladies, which went private after it got too famous due to an NYT article. People will spend $1000 or more for a really good Birkin knockoff with high quality leather and hardware. The bags are almost all made in workshops in China. Getting one is apparently (I haven’t done it myself) an interesting exercise in trust and reputation: how do you know the seller isn’t going to send you a cheap knockoff from China rather than a “real” $1000 knockoff? In practice there is a whole world of trusted Chinese middlemen with reviews etc. who have a strong stake in keeping their reputation high in the “reps” community (but you’d better make sure the reviews are real…).


> People will spend $1000 or more for a really good Birkin knockoff with high quality leather and hardware.

I'd bet you a coffee that there are knockoffs, or "reps" if you prefer, that are actually at least in some respects better quality than the original.


Oh, absolutely. Probably some of the best leatherworkers in China are making high-end handbag knockoffs: it’s where the money is.


Probably the same people who make the real ones.


> Hermes can sell a $15,000 Birkin to everyone

It's sad and petty I know, but if I were a billionaire edgelord like Elon Musk, rather than Twitter, I'd buy Hermes and sell their products in supermarkets. All the past limited editions too. Just to fuck with the kind of people who buy them.

Then again Hermes is worth 200 billion and upsetting an oligarch's sidechick might just get me killed so maybe not.


He probably couldn't buy it if he wanted to. They built their stock structure to be resistant to takeover attempts, and instead they are controlled by a family holding. I _guess_ if Musk slung his whole fortune at it he might get it, but it's unlikely. Hermes is a very interesting company; I recommend the Acquired episode on them, along with the one about LVMH.


All that would happen is the Birkin would lose its appeal, some other company would step in to fill the role, and people would empty their closets of orange boxes and fill them with boxes of some other colour.


> Hermes can sell a $15,000 Birkin to everyone

Wait hold on, what?

Like, I get that you were referring to the fact that they keep things scarce even for rich people, but you literally said “everyone”, so I just gotta check: Are you saying that everyday people would be willing and able to spend $15000 on a luxury handbag?


The sale of new Birkin bags is famously invite-only. In that context, to "sell" to "everyone" means making the bag available for sale to everyone. "Anyone" would have been a less ambiguous word choice, but it's a minor grammatical issue and the meaning is still clear.


I didn't read anything about the everyperson being able to do this.


There was an implied ‘who is on the waiting list for a Birkin bag currently’ in ‘everyone’. They did not mean every single person on Earth, they meant Hermes could sell a Birkin bag to every interested buyer.


Also made in Japan


I'm as big a weeb as anyone but this is a textbook example of:

>thing >:(

>thing, Japan :O


In the context of this thread it's not so egregious. It was "It's expensive because it's designer" / "Also because it's made in Japan".

It's an economic fact that a factory worker in Japan will have a higher salary and more overhead than a similar worker in Vietnam or China.


This actually works well the other way around.

When sales are still growing YoY (like in the post-COVID market) but prices are up 30% or 40%, you understand your customer is still willing to pay the higher price.

It's similar to a McDonald's or Starbucks situation, where you just keep increasing prices dramatically until you get a first quarter of lower-than-expected sales, then start adapting downwards.

Most corporations still haven't hit that limit; see streaming companies increasing prices every few months. They still haven't hit the point where profits decrease YoY. When they do, streaming prices will start decreasing.


> see streaming companies increasing prices every few months

They can do that because they are practically monopolies.


They can do it because people are hopelessly addicted to screens.

You won't die if you stop watching Netflix. We aren't talking food or medicine here. In fact your life would probably improve. But addiction is a real animal.


I wish there were some term other than addiction here: addicts routinely steal from friends and family to feed their addiction; addicts who are parents sometimes threaten to stop allowing their children to visit with a grandparent unless the grandparent helps the addict pay for the addiction; drug addicts living in violent neighborhoods sometimes agree to murder somebody in exchange for drugs.

Screen addicts almost never stoop that low and the ones that do are addicted to a cam girl (e.g., Grant Amato), porn or gambling, not Netflix (or social media).


A company with 800 million weekly active users, only losing $10B-$15B before implementing ads - which IMO is coming fast and soon to the LLM world - I would never put a 90% chance on their shares ending up at $0 before an exit option.

This is the easiest money and best relationship JPM could imagine


> a company with 800 million weekly active users

Wow, that's slightly more than Yahoo has. Well, had.


Yahoo is a disingenuous parallel here. Yahoo lost because they didn't correctly embrace their market position in what's otherwise the very ripe industry of search engines. Search engines created the 4th most valuable company in the world (Google).

We don't know how ripe OpenAI's industry or market position is, yet. Yahoo knew what it had lost pretty early into its spiral.


This is literally the reason behind the collapse of Silicon Valley Bank. Debt keeps your cap table untouched; it's very tempting at certain stages.


Can you imagine Twitter/Reddit clients if the APIs were still free in the age of LLMs?

In a way this is probably the future state - 1000 different clients for 1000 different people, each fully customized to their taste


> /Reddit

Still miss Apollo


At least from what I noticed, Junie from JetBrains was the first to use a very high-quality to-do list, and it quickly became my favorite.

I haven't used it since it became paid, but back then Junie was slow and thoughtful, while Cursor was constantly rewriting files that worked fine, and Claude was somewhere in the middle.


Cursor added a UI for to-do lists and encourages its agent to use it (it's great UX, but you can't really see a file of it).

Kiro from Amazon does both tasks (in tasks.md) and specs.

Soon there will be too many tools; choose what works for you.


IIRC, status pages drive customer compensation for downtime. Updating one is basically signing the check for your biggest customers; in most similar companies you need a very senior executive to approve the update.

On the other side of this, Firebase probably doesn't have money at stake in making the update.


It is not the status page that drives customer compensation. It is downtime.


The status page is essentially an admission of guilt. It can require approval from the legal department and a high-level company official to update it and the verbiage used on it.


> It can require approval from the legal department and a high-level company official to update it and the verbiage used on it.

Is that true in this case or are you speculating? My company runs a cloud platform. Our strategy is to have outages happen as rarely as possible and to proactively offer rebates based on customer-measured downtime. I don't know why people would trust vendors that do otherwise.


I don't have any special knowledge about the companies involved in this outage. I do know most (all?) status pages for large companies have to be manually updated and not just anybody can do that. These things impact contracts, so you want to be really sure it is accurate and an actual outage (not just a monitor going off, possibly giving a false positive).


You are likely right, but it's still gross dishonesty. I'm not ready to let Google and their engineers off the hook for that.


Inter alia, "is essentially" and "it can" tell us this is just free-associating.

We should probably avoid punishing them based on free-associating made by a random not-anonymous not-Googler not-Xoogler account on HN. (disclaimer: xoogler)


Then it's fucking useless. Let's crowdsource our own.


That’s what Downdetector is.

https://downdetector.com/


You're in the crowdsourced version right now.


"It can", this is just free-associating, don't let it get to ya. (disclaimer: xoogler)


We tried to do that. It didn't work. Too much spam, scams, and abuse.


working on it! (valet network)


Nah, it's just some client-side caching / JS stuff. Clicking the big refresh button fixed it for me, 15 minutes before OP noted it.

(n.b. as much as Google in aggregate is evil, they're smart evil. You can't have execs approving every outage - because of the checks - without some paper trail, and execs don't want to approve every outage; you'd have to rely on too many engineers and sales people, even as ex-employees, to keep it a secret. disclaimer: xoogler)

(EDIT: for posterity, we're discussing an "overall status" thing with a huge refresh button, right above a huge table chock-full of orange triangles that indicate "One or more regions affected" - even when the "overall status" was green, the table was still full of orange and visible immediately underneath. My point being, you gotta suppose a wholeeee bunch of stuff to get to the point where there was ever info suppressed, much less suppressed intentionally to avoid cutting checks)


We are sort of able to recognize Nobel-worthy breakthroughs.

One of the many definitions I have for AGI is being able to create the proofs for the 2030, 2050, 2100, etc Nobel Prizes, today

A sillier one I like is that AGI would output a correct proof that P ≠ NP on day 1


Isn't AGI just "general" intelligence - as in, like a regular human, Turing-test kind of deal?

Aren't you thinking of ASI/superintelligence, which would be capable of outdoing humans?


Yes, a general consensus is AGI should be able to perform any task an average human is able to perform. Definitely nothing at Nobel-prize level.


A bit poorly named; not really very general. AHI would be a better name.


AAI would be enough for me, although there are people who deny intelligence of non-human animals.


Another general consensus is that humans possess general intelligence.


Yes, we do seem to have a very high opinion of ourselves.


> Yes, a general consensus is AGI should be able to perform any task an average human is able to perform.

The goalposts are regularly moved so that AI companies and their investors can claim/hype that AGI will be around in a few years. :-)


I learned the definition I provided back in the mid-'90s, and it hasn't really changed since then.


I'm old enough to remember the mystery and hype before o*/o1/strawberry, which was supposed to be essentially AGI. We had serious news outlets write about senior people at OpenAI quitting because o1 was SkyNet.

Now we're up to o4, and AGI is still not even in sight (depending on your definition, I know). And OpenAI is up to about 5,000 employees. I'd think even before AGI, a new model would be able to cover for at least 4,500 of those employees being fired. Is that not the case?


Remember that Docusign has 7,000 employees. I think OpenAI is pretty lean for what they're accomplishing.


I don't think these comparisons are useful. Every time you look at companies like LinkedIn or Docusign, yeah - they have a lot of staff, but a significant proportion of it is in functions like sales, customer support, and regulatory compliance across a bazillion different markets, along with all the internal tooling and processes you need to support that.

OpenAI is at a much earlier stage in their adventures and probably doesn't have that much baggage. Given their age and revenue streams, their headcount is quite substantial.


If we're making comparisons, it's more like someone selling a $10,000 course on how to be a millionaire.

Not directly from OpenAI - but people in the industry are advertising how these advanced models can replace employees, yet they keep going on hiring tears (including OpenAI). Let's see the first company stand behind its models and replace 50% of its existing headcount with agents. That to me would be a sign these things are going to replace people's jobs. Until I see that: if OpenAI can't figure out how to replace humans with models, then no one will.

I mean, could you imagine if today's announcement was that the chatgpt.com webdev team had been laid off, and all new features and fixes would be completed by Codex CLI + o4-mini? That would mean they believe in the product they're advertising. Until they do something like that, they'll keep trusting those human engineers and try selling other people on the dream.


I'm also a skeptic on AI replacing many human jobs anytime soon. It's mostly going to assist, accelerate or amplify humans in completing work better or faster. That's the typical historical technology cycle where better tech makes work more efficient. Eventually that does allow the same work to be done with less people, like a better IP telephony system enabling a 90 person call center to handle the same call volume that previously required 100 people. But designing, manufacturing, selling, installing and supporting the new IP phone system also creates at least 10 new jobs.

So far the only significant human replacement I'm seeing AI enable is in low-end, entry-level work. For example, fulfilling "gig work" for Fiverr, like spending an hour or two whipping up a relatively low-quality graphic logo or other basic design work for $20. This is largely done at home by entry-level graphic design students in second-world locales like the Philippines or rural India. A good graphical AI can take (and is taking) some of this work from the humans doing it. Although it's not even a big impact yet, primarily because, for non-technical customers, the Fiverr workflow can still be easier or more comfortable than figuring out which AI tool to use and how to get what they really want from it.

The point is that this Fiverr piece-meal gig work is the lowest-paying, least desirable work in graphic design. No one doing it wants to still be doing it a year or two from now. It's the McDonald's counter of their industry. They all aspire to higher-skill, higher-paying design jobs. They're only doing Fiverr gig work because they don't yet have a degree, enough resume credits or decent portfolio examples. Much like steam-powered bulldozers and pile drivers displaced pickaxe-swinging humans digging railroad tunnels in the 1800s, the new technology is displacing some of the least desirable, lowest-paying jobs first. I don't yet see any clear reason this well-established 200+ year trend will be fundamentally different this time. And history is littered with those who predicted "but this time it'll be different."

I've read the scenarios which predict that AI will eventually be able to fundamentally and repeatedly self-improve autonomously, at scale and without limit. I do think AI will continue to improve but, like many others, I find the "self-improve" step to be a huge and unevidenced leap of faith. So, I don't think it's likely, for reasons I won't enumerate here because domain experts far smarter than I am have already written extensively about them.


Not really. It could also mean their company's effective headcount is much greater than its nominal one.


Yes, and Amazon has 1.52 million employees. How many developers could they possibly need?

Or maybe it’s just nonsensical to compare the number of employees across companies - especially when they don’t do nearly the same thing.

On a related note, wait until you find out how many more employees Apple has than Google, since Apple has tens of thousands of retail employees.


Apple has fewer employees than Google (164k < 183k).


Siri must be really good.


What kind of employees does Docusign employ? Surely digital documents don't require physical on-site distribution centers and labor.


Just look at their careers page


It's a lot of sales/account managers, and some engineers.

Wow, the sales side goes hard on this product.


[flagged]


The US is not a signatory to the International Criminal Court so you won't see Musk on trial there.


I hope I don't have to link this adjacent reply of mine too many more times: https://news.ycombinator.com/item?id=43709056 Specifically "The venue is a matter of convenience, nothing more," and if you prefer another, that would work about as well. Perhaps Merano; I hear it's a lovely little town.


The closest Elon ever came to anything Hague-worthy is allowing Starlink to be used in Ukrainian attacks on Russian civilian infrastructure. I don't think the Hague would be interested in anything like that. And if his life is worthless, then what would you say about your own? Nonetheless, I commend you on your complete lack of hinges. /s


Oh, I'm thinking more in the sense of the special one-off kinds of trials, the sort Gustave Gilbert so ably observed. The venue is a matter of convenience, nothing more. To the rest I would say the worth of my life is no more mine to judge than anyone else is competent to do the same for themselves, or indeed other than foolish to pursue the attempt.


True.

Deep learning models will continue to improve as we feed them more data and use more compute, but they will still fail at even very simple tasks as long as the input data are outside their training distribution. The numerous examples of ChatGPT (even the latest, most powerful versions) failing at basic questions or tasks illustrate this well. Learning from data is not enough; there is a need for the kind of system-two thinking we humans develop as we grow. It is difficult to see how deep learning and backpropagation alone will help us model that. https://medium.com/thoughts-on-machine-learning/why-sam-altm...


> I'm old enough to remember the mystery and hype before o*/o1/strawberry

So at least two years old?


Honestly, sometimes I wonder if most people these days kinda aren't at least that age, you know? Or less inhibited about acting it than I believe I recall people being last decade. Even compared to just a few years back, people seem more often to struggle to carry a thought, and resort much more quickly to emotional belligerence.

Oh, not that I haven't been as knocked about in the interim, of course. I'm not really claiming I'm better, and these are frightening times; I hope I'm neither projecting nor judging too harshly. But even trying to discount for the possibility, there still seems something new left to explain.


> Even compared to just a few years back, people seem more often to struggle to carry a thought, and resort much more quickly to emotional belligerence.

We're living in extremely uncertain times, with multiple global crises taking place at the same time, each of which could develop into a turning point for humankind.

At the same time, predatory algorithms do whatever it takes to make people addicted to media, while mental health care remains inaccessible for many.

I feel like throwing a tantrum almost every single day.


I feel perhaps I've been unkind to many people in my thoughts, but I'm conflicted. I don't understand myself to be particularly fearless, but what times call more for courage than times like these? How do people afraid even to try to practice courage expect to find it, when there isn't time for practice any more?


You have only so many spoons available per crisis. Even picking your battle can become a problem.

I've been out in the streets, protesting and raising awareness of climate change. I no longer do. It's a pointless waste of time. Today, the climate change deniers are in charge.


I don't assume I'm going to be given the luxury of picking my battles, and - though I've been aware of "spoon theory" since I watched it getting invented at Shakesville back in the day - I've never held to it all that strongly, even as I acknowledge I've also never been quite the same since a nasty bout of wild-type covid in early 2020. Now as before, I do what needs doing as best I can, then count the cost. Some day that will surely prove too high, and my forward planning efforts will be put to the test. Till then I'm happy not to borrow trouble.

I've lived in this neighborhood a long time, and there are a couple of old folks' homes a block or so from here. Both have excellent views, on one frontage each, of an extremely historic cemetery, which I have always found a wonderfully piquant example of my adopted hometown's occasionally wire-brush sense of humor. But I bring it up to mention that the old folks don't seem to have much concern for spoons other than to eat with, and they are protesting the present situation regularly and at considerable volume, and every time I pass about my errands I make a point of raising a fist and hollering "hell yeah!" just like most of the people who drive past honk in support.

Will you tell them it's pointless?


I think people expected reasoning to be more than just trained chain of thought (which was known already at the time). On the other hand, it is impressive that CoT can achieve so much.


Yeah, I don't know exactly what an AGI model will look like, but I think it would have more than a 200k context window.


Do you have a 200k context window? I don't. Most humans can only keep 6 or 7 things in short-term memory. Beyond those 6 or 7, you are pulling data from your latent space, or replacing one of the short-term slots with new content.


But context windows for LLMs include all the “long term memory” things you’re excluding from humans


Long term memory in an LLM is its weights.


Not really, because humans can form long term memories from conversations, but LLM users aren’t finetuning models after every chat so the model remembers.


He's right, but most people don't have the resources, nor indeed the weights themselves, to keep training the models. But the weights are very much long term memory.


> users aren’t finetuning models after every chat

Users can do that if they want, but it’s more effective and more efficient to do that after every billion chats, and I’m sure OpenAI does it.


If you want the entire model to remember everything it talked about with every user, sure. But ideally, I would want the model to remember what I told it a few million tokens ago, but not what you told it (because to me, the model should look like my private copy that only talks to me).


> ideally, I would want the model to remember what I told it a few million tokens ago

Yes, you can keep finetuning your model on every chat you have with it. You can definitely make it remember everything you have ever said. LLMs are excellent at remembering their training data.


I'm not quite AGI, but I work quite adequately with a much, much smaller memory. Maybe AGI just needs to know how to use other computers and work with storage a bit better.


I'd think it would be able to at least suggest which model to use rather than just having 6 for you to choose from.


I’m not an AI researcher but I’m not convinced these contemporary artificial neural networks will get us to AGI, even assuming an acceleration to current scaling pace. Maybe my definition of AGI is off but I’m thinking what that means is a machine that can think, learn and behave in the world in ways very close to human. I think we need a fundamentally different paradigm for that. Not something that is just trained and deployed like current models, but something that is constantly observing, constantly learning and constantly interacting with the real world like we do. AHI, not AGI. True AGI may not exist because there are always compromises of some kind.

But, we don’t need AGI/AHI to transform large parts of our civilization. And I’m not seeing this happen either.


You are absolutely right that AGI will probably barely resemble LLMs, but this is kind of beside the point. An LLM just has to get good enough to automate sufficiently complicated coding tasks, like those of coding new AI experiments. From there, researchers can spin off new experiments rapidly and make further improvements. An AGI will likely have vastly different architecture from an LLM, but we will only discover that through likely hundreds of thousands of experiments with incremental improvements.

This is the ai-2027.com argument. LLMs only really have to get good enough at coding (and then researching), and it's singularity time.


I feel like every time AI gets better we shift the goalposts of AGI to something else.


I don't think we shift the goalposts for AGI. I'm not getting the sense that people are redefining what AGI is when a new model is released. I'm getting the sense that some people are thinking like me when a new model is released: we got a better transformer, and a more useful model trained on more or better data, but we didn't get closer to AGI. And people are saying this not because they've pushed out what AGI really means, they're saying this because the models still have the same basic use cases, the same flaws and the same limitations. They're just better at what they already do. Also, the better these models get at what they already do, the more starkly they contrast with human capabilities, for better or worse.


> Now we're up to o4, and AGI is still not even in sight (depending on your definition, I know)

It's not only about the definition. Some Googler was sure their model was conscious.


Meanwhile even the highest-ranked models can't do simple logic tasks. GothamChess on YouTube did some tests where he played against a bunch of the best models, and every single one of them failed spectacularly.

They’d happily lose a queen to take a pawn. They failed to understand how pieces are even allowed to move, hallucinated the existence of new pieces, repeatedly declared checkmate when it wasn’t, etc.

I tried it last night with Gemini 2.5 Pro, and it made it 6 turns before it started making illegal moves, and 8 turns before it got so confused about the state of the board that it refused to play with me any longer.

I was in the chess club in 3rd grade. One of the top ranked LLMs in the world is vastly dumber than I was in 3rd grade. But we’re going to pour hundreds of billions into this in the hope that it can end my career? Good luck with that, guys.


I'm not sure why people are expecting a language model to be great at chess. Remember, they are trained on text, which is not the best medium for representing things like a chess board. They are also "general models", with limited training on pretty much everything apart from human language.

An AlphaStar-type model would wipe the floor at chess.


This misses the point. LLMs will do things like move a knight by a single square as if it were a pawn. Chess is an extremely well-understood game, and the rules about how pieces move are almost certainly well represented in the training data.

These models cannot even make legal chess moves. That’s incredibly basic logic, and it shows how LLMs are still completely incapable of reasoning or understanding. Many kinds of task are never going to be possible for LLMs unless that changes. Programming is one of those tasks.


> These models cannot even make legal chess moves. That’s incredibly basic logic, and it shows how LLMs are still completely incapable of reasoning or understanding.

Yeah they can. There's a link I shared to prove it which you've conveniently ignored.

LLMs learn by predicting, failing and getting a little better, rinse and repeat. Pre-training is not like reading a book. LLMs trained on chess games play chess just fine. They don't make the silly mistakes you're talking about and they very rarely make illegal moves.

There's gpt-3.5-turbo-instruct, which I already shared, which plays at around 1800 Elo. Then there's this grandmaster-level chess transformer - https://arxiv.org/abs/2402.04494. There are also a couple of models that were trained in the EleutherAI Discord that reached about 1100-1300 Elo.

I don't know what the peak of LLM chess playing looks like, but this is clearly less of an "LLMs can't do this" problem and more of an "OpenAI/Anthropic/Google etc. don't care if their models can play chess or not" problem.

So are they capable of reasoning now, or would you like to shift the goalposts?


I think the point here is that if you have to pretrain it for every specific task, it's not artificial general intelligence, by definition.


There isn't any general intelligence that isn't receiving pre-training. People spend 14 to 18+ years in school to have any sort of career.

You don't have to pretrain it for every little thing, but it should come as no surprise that a complex, non-trivial game would require it.

Even if you explained all the rules of chess clearly to someone brand new to it, it will be a while and lots of practice before they internalize it.

And like I said, LLM pre-training is less like a machine reading text and more like evolution. If you gave it a corpus of chess rules, you'd only be training a model that knows how to converse about chess rules.

Do humans require less "pre-training"? Sure, but then again, that's on the back of millions of years of evolution. Modern NNs are initialized with random weights and have relatively little inductive bias.


People are focusing on chess, which is complicated, but LLMs fail at even simple games like tic-tac-toe, where you'd think, if they were capable of "reasoning", they would be able to understand where they went wrong. That doesn't seem to be the case.

What it can do is write and execute code to generate the correct output, but isn't that cheating?


Which SOTA LLM fails at tic-tac-toe?


I don't know, but it's not a hard test: get the LLM to play a perfect game of tic-tac-toe against itself, look at the output, and see if it goes wrong.
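Something like the following would do it - a rough sketch, assuming a hypothetical ask_llm(prompt) helper that returns a move index as text (the helper and the prompt wording are mine, not any particular vendor's API):

    # Sketch: have an LLM play tic-tac-toe against itself and flag blunders.
    WINS = [(0,1,2), (3,4,5), (6,7,8), (0,3,6), (1,4,7), (2,5,8), (0,4,8), (2,4,6)]

    def winner(board):
        for a, b, c in WINS:
            if board[a] != " " and board[a] == board[b] == board[c]:
                return board[a]
        return None

    def self_play(ask_llm):  # ask_llm: hypothetical LLM call, returns e.g. "4"
        board, player = [" "] * 9, "X"
        while winner(board) is None and " " in board:
            prompt = ("Board, indices 0-8 row by row: %r. You are %s. "
                      "Reply with the index of one empty square." % ("".join(board), player))
            move = int(ask_llm(prompt))
            if not (0 <= move <= 8) or board[move] != " ":
                return "%s made an illegal move: %d" % (player, move)  # the failure mode in question
            board[move] = player
            player = "O" if player == "X" else "X"
        w = winner(board)
        if w is None:
            return "draw (consistent with perfect play)"
        return "%s won, so the loser blundered" % w  # perfect play from both sides always draws

Any illegal move or decisive result is an error by construction, so you don't even need an engine to grade the transcript.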


Saying programming is a task that is "never going to be possible" for an LLM is a big claim, given how many people have derived huge value from having LLMs write code for them over the past two years.

(Unless you're arguing against the idea that LLMs are making programmers obsolete, in which case I fully agree with you.)


I think "useful as an assistant for coding" and "being able to program" are two different things.

When I was trying to understand what is happening with hallucination, GPT gave me this:

> It's called hallucinating when LLMs get things wrong because the model generates content that sounds plausible but is factually incorrect or made-up—similar to how a person might "see" or "experience" things that aren't real during a hallucination.

From that we can see that they fundamentally don't know what is correct. While they can get better at predicting correct answers, no-one has explained how they are expected to cross the boundary from "sounding plausible" to "knowing they are factually correct". All the attempts so far seem to be about reducing the likelihood of hallucination, not fixing the problem that they fundamentally don't understand what they are saying.

Until/unless they are able to understand the output enough to verify the truth then there's a knowledge gap that seems dangerous given how much code we are allowing "AI" to write.


Code is one of the few applications of LLMs where they DO have a mechanism for verifying if what they produced is correct: they can write code, run that code, look at the output and iterate in a loop until it does what it's supposed to do.
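A minimal sketch of that loop, assuming a hypothetical call_llm(prompt) helper standing in for whatever completion API you use (the function names and the test command are illustrative, not a real library's interface):

    import os
    import subprocess
    import tempfile

    def call_llm(prompt):
        # Hypothetical stand-in for any LLM completion API; returns Python source.
        raise NotImplementedError

    def generate_until_tests_pass(task, test_cmd, max_iters=5):
        feedback = ""
        for _ in range(max_iters):
            code = call_llm(task + feedback)
            with tempfile.NamedTemporaryFile("w", suffix=".py", delete=False) as f:
                f.write(code)
                path = f.name
            try:
                # The test command, not the model, is the judge of correctness.
                result = subprocess.run(test_cmd + [path], capture_output=True,
                                        text=True, timeout=60)
                ok, output = result.returncode == 0, result.stdout + result.stderr
            except subprocess.TimeoutExpired:
                ok, output = False, "tests timed out after 60s"
            finally:
                os.unlink(path)
            if ok:
                return code  # verified by actually running it
            feedback = "\n\nYour last attempt failed with:\n" + output + "\nPlease fix it."
        return None  # gave up: iterating is no guarantee of convergence

The key property is that the stopping condition is external to the model: the loop terminates on observed behavior, not on the model's own claim of success.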


But that requires code that is runnable and testable in isolation; otherwise there are all sorts of issues with that approach (aside from the obvious one of scalability).

It also assumes they "understand" enough to be able to extract the correct output to test against.


> I'm not sure why people are expecting a language model to be great at chess.

Because the conversation is about AGI, and how far away we are from AGI.


Does AGI mean good at chess?

What if it is a dumb AGI?


Chess is not exactly a simple logic task. It requires you to keep track of 32 things in a 2D space.

I remember being extremely surprised when I could ask GPT3 to rotate a 3D model of a car in its head and ask it about what I would see when sitting inside, or which doors would refuse to open because they're in contact with the ground.

It really depends on how much you want to shift the goalposts on what constitutes "simple".


> Chess is not exactly a simple logic task.

Compared to what a software engineer is able to do, it is very much a simple logic task. Or compared to the average person doing a non-trivial job. Or a beehive organizing its existence, from its amino acids up to hive organization. All those things are magnitudes harder than chess.

> I remember being extremely surprised when I could ask GPT3 to rotate a 3D model of a car in its head and ask it about what I would see when sitting inside, or which doors would refuse to open because they're in contact with the ground.

It's not reasoning its way there. Somebody asked something similar somewhere in the corpus, and that corpus also contained the answers. That's why it can answer. After quite a small number of moves, the chess board is unique and you can't fake it. You need to think ahead - a task which computers are traditionally very good at, and trained chess players are too. That LLMs are not goes to show that they are very far from AGI.


Claude can't beat Pokemon Red. Not even close yet: https://arstechnica.com/ai/2025/03/why-anthropics-claude-sti...


LLMs can play chess fine.

The best model you can play with is decent for a human - https://github.com/adamkarvonen/chess_gpt_eval

SOTA models can't play it because these companies don't really care about it.


> We had serious news outlets write about senior people at OpenAI quitting because o1 was SkyNet

I wonder if any of the people that quit regret doing so.

Seems a lot like Chicken Little behavior - "Oh no, the sky is falling!"

How anyone with technical acumen thinks current AI models are conscious, let alone capable of writing new features and expanding their abilities, is beyond me. Might as well be afraid of calculators revolting and taking over the world.

