The Claude iOS app, Claude on the web (including Claude Code on the web), and Claude Code are some of the buggiest tools I have ever had to use on a daily basis. I’m including monstrosities like Altium, SolidWorks, and Vivado in the mix - software that actually does real shit constrained by the laws of physics rather than slinging basic JSON and strings around over HTTP.
It’s an utter embarrassment to the field of software engineering that they can’t even beat a single nine of reliability in their consumer-facing products, and if it weren’t for the advantage Opus has over other models, they’d be dead in the water.
A single nine would be 90%, which is roughly what I’m experiencing between CC for web and the Claude iOS app. About 1 in 10 messages fail with an unknown error, and 1 in 10 CC for web sessions die irrecoverably. It’d probably be worse except that CC’s bugs in the terminal aren’t showstoppers like they are on web/mobile.
The only way Anthropic has two or three nines is in read-only mode, but that’d be like measuring AWS by console uptime while ignoring the actual control plane.
Don't bother filing issues there. Their issue tracker is a galaxy-sized joke. They automatically close issues after 30 days of inactivity even if they weren't fixed, just to keep the issue count low.
The Reasonable Man might think that an AI company OF ALL COMPANIES would be able to use AI to triage bug tickets and reproduce them, but no! They expect humans to keep wasting their own time reproducing issues, pinging tickets, and correcting Claude when it makes mistakes.
First reply from Anthropic: "Found 3 possible duplicate issues: This issue will be automatically closed as a duplicate in 3 days."
The user replies: two of the tickets are irrelevant, and one didn't help.
Second reply: "This issue has been inactive for 30 days. If the issue is still occurring, please comment to let us know. Otherwise, this issue will be automatically closed in 30 days for housekeeping purposes."
Every ticket I ever filed was auto-closed for inactivity. Complete waste of time. I won't bother filing bugs again.
> Every ticket I ever filed was auto-closed for inactivity. Complete waste of time. I won't bother filing bugs again.
Upcoming Anthropic Press Release: By using Claude to direct users to existing bug reports, we have reduced tickets requiring direct action by xx% and even reduced the rate of incoming tickets.
The last paragraph is so accurate. Thanks. Just a small note: developers, add your photo (a head shot) to your code and you'll receive more money. People do business with people.
In some book about behavioral economics there was an experiment with people in a company kitchenette.
Above the coffee machine there was a sign asking people who drink coffee at work to contribute to a jar for the next purchase. One version of the sign was just text, while the other also had a pair of eyes on it. The one with eyes raised more money.
Because forking is the new coding /s (What we see is the natural entropy of systems. Wannabe codies fork a repo… and instead of contributing to the original one they make their own copy. What will happen if you repeat this a few times? ;)
Well, I wanted to implement light transport papers without having to deal with C++. I think tinygrad, and more specifically tinyJIT, are super useful abstractions. This is definitely not available in TS.
That is a legit way of working on a contribution. You fork, you work on the fork - if it's not junk, then you issue a pull request. What's the deal with the belittling and holier-than-thou moralizing?
I have nothing against forking, ofc. I like it. But I really don’t like the laziness when there is no contribution back to the original project - instead those codies present the project as their own, when in fact it is just a (poor) fork. The result is a mess. My first comment was about this behaviour.
No, you're having the same experience as a lot of people.
LLMs just fail (hallucinate) in less well-known fields of expertise.
Funny: today I asked Claude to give me the syntax for running Claude Code, and its answer was totally wrong :) So you go to the documentation… and parts of it are obsolete as well.
LLM development is done in the “move fast and break things” style.
So in a few years there will be so many repos full of gibberish code, because “everybody is a coder now” - even basketball players or taxi drivers (no offense, ofc, just an example).
I recently found out that Grok-4.1-fast has similar pricing (in cents) but a 10x larger context window (2M tokens instead of the ~128-200k of gpt-4.1-nano). And a ~4% hallucination rate, the lowest in blind tests on LLM arena.
Grok is the best general-purpose LLM in my experience. Only Gemini is comparable. It would be silly to ignore it, and xAI is less evil than Google these days.
In the big picture, those events are insignificant compared to the negative impacts on society from Google's trillion-dollar advertising business and the associated destruction of privacy.
I fought with Tesseract for quite a while. It's good if high accuracy doesn't matter. For transcribing a book from clean, consistent, non-skewed data it's fine, and an LLM might even be able to clean it up. But for legal or accounting data from hand-scanned documents, the error rate made it untenable. Even clean, scanned documents of the same category have all sorts of density and skew anomalies that get misinterpreted. You'll pull your hair out trying to account for edge cases and never get the results you need, even with numerous adjustments and model retraining on errors.
Flash 2.5 or 3 with thinking gave the best results.
Thanks. I was surprised that Tesseract recognized poorly scanned magazines, and with some Python library I was able to transcribe a two-column layout with almost no errors.
Tesseract is a cheap solution as it doesn’t touch any LLM.
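For reference, the two-column case can often be handled by Tesseract's automatic page segmentation. A minimal sketch, assuming pytesseract and Pillow are installed and the tesseract binary is on PATH (the file name is made up):

    # Minimal sketch: OCR a scanned magazine page with a two-column layout.
    from PIL import Image
    import pytesseract

    page = Image.open("magazine_page.png")  # hypothetical scan
    # --psm 1 = automatic page segmentation with orientation detection,
    # which usually keeps multi-column text in reading order.
    text = pytesseract.image_to_string(page, config="--psm 1")
    print(text)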
For invoices, Gemini Flash is really good, for sure, and you receive “sorted” (structured) data as well. So definitely thumbs up. I use it for transcribing difficult magazine layouts.
I think that for legally problematic usage - companies don’t like to share financial data with Google - it would be better to use a local model.
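For what it's worth, the “sorted” invoice output above might look roughly like this with the google-generativeai Python package - a sketch only, where the model name, prompt, and file are assumptions:

    # Sketch of structured invoice extraction; model name, prompt, and
    # field names are illustrative assumptions, not a recommendation.
    import google.generativeai as genai

    genai.configure(api_key="YOUR_API_KEY")
    model = genai.GenerativeModel("gemini-2.5-flash")

    invoice = genai.upload_file("invoice.pdf")  # hypothetical scanned invoice
    resp = model.generate_content(
        ["Extract vendor, invoice date, and total amount as JSON.", invoice],
        generation_config={"response_mime_type": "application/json"},
    )
    print(resp.text)  # JSON fields instead of raw OCR text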
This is really odd. These companies are incomparable in terms of scope.
I have used all the major tools: OpenAI (chat, API), Google Gemini (AI Studio, API, CLI, Antigravity) and Claude (chat, code and API). Mostly for solving coding issues.
Claude Code gives usable results almost instantly for small scripts, and they can go live. Gemini CLI tells me that it doesn't have this and that - and I have tried pushing Gemini to deliver production-quality code. No chance.
I use the same style of coding instructions for all tools.
But the difference is measured in hours. In a Claude session the result came in minutes; with Gemini, it took an hour and many rounds.
On the other hand, Gemini Canvas is really powerful, as it builds a usable app/tool inside Gemini, so you don’t have to know how to run Python or PHP.
And OpenAI has a very powerful chat.
So all of them seem to have different focus groups…
I tend to agree, but was curious to hear others' thoughts. My background is more product / software, less market prediction and performance, so I really wasn't sure what to make of this.