The Claude iOS app, Claude on the web (including Claude Code on the web), and Claude Code are some of the buggiest tools I have ever had to use on a daily basis. I’m including monstrosities like Altium, SolidWorks, and Vivado in the mix - software that actually does real shit constrained by the laws of physics rather than slinging basic JSON and strings around over HTTP.
It’s an utter embarrassment to the field of software engineering that they can’t even beat a single nine of reliability in their consumer-facing products, and if it weren’t for the advantage Opus has over other models, they’d be dead in the water.
A single nine would be 90%, which is roughly what I’m experiencing between CC for web and the Claude iOS app. About 1 in 10 messages fail with an unknown error, and 1 in 10 CC for web sessions die irrecoverably. It’d probably be worse except that CC’s bugs in the terminal aren’t showstoppers like they are on web/mobile.
The only way Anthropic has two or three nines is in read-only mode, but that’d be like measuring AWS by console uptime while ignoring the actual control plane.
Don't bother filing issues there. Their issue tracker is a galaxy-sized joke. They automatically close issues after 30 days of inactivity even if they weren't fixed, just to keep the issue count low.
The Reasonable Man might think that an AI company OF ALL COMPANIES would be able to use AI to triage bug tickets and reproduce them, but no! They expect humans to keep wasting their own time reproducing issues, pinging tickets, and correcting Claude when it makes mistakes.
First reply from Anthropic: "Found 3 possible duplicate issues: This issue will be automatically closed as a duplicate in 3 days."
The user replies: two of the tickets are irrelevant, and one didn't help.
Second reply: "This issue has been inactive for 30 days. If the issue is still occurring, please comment to let us know. Otherwise, this issue will be automatically closed in 30 days for housekeeping purposes."
Every ticket I ever filed was auto-closed for inactivity. Complete waste of time. I won't bother filing bugs again.
> Every ticket I ever filed was auto-closed for inactivity. Complete waste of time. I won't bother filing bugs again.
Upcoming Anthropic Press Release: By using Claude to direct users to existing bug reports, we have reduced tickets requiring direct action by xx% and even reduced the rate of incoming tickets.
The last paragraph is so accurate. Thanks. Just a small note: developers, add your photo (a head shot) to your code and you'll receive more money. People do business with people.
In some book about behavioral economics there was an experiment with people in a company kitchenette.
Above the coffee machine there was a sign asking people who drink coffee at work to contribute to a jar for the next purchase. One version of the sign was just text, while the other also had a pair of eyes on it. The one with eyes raised more money.
Because forking is the new coding /s (What we see is the natural entropy of systems. Wannabe codies fork a repo… and instead of contributing to the original one they make their own copy. What will happen if you repeat this a few times? ;)
Well, I wanted to implement light transport papers without having to deal with C++. I think tinygrad, and more specifically tinyJIT, are super useful abstractions. This is definitely not available in TS.
That is a legit way of working on a contribution. You fork, you work on the fork - if it's not junk, then you issue a pull request. What's the deal with the belittling and holier-than-thou moralizing?
I have nothing against forking, ofc. I like it. But I really don’t like the laziness when there is no contribution back to the original project - instead those codies present the project as their own, when in fact it is just a (poor) fork. The result is a mess. My first comment was about this behaviour.
No, you're having the same experience as a lot of people.
LLMs just fail (hallucinate) in less well-known fields of expertise.
Funny: today I asked Claude to give me the syntax for running Claude Code, and its answer was totally wrong :) So you go to the documentation… and parts of it are obsolete as well.
LLM development is done in the “move fast and break things” style.
So in a few years there will be so many repos full of gibberish code, because “everybody is a coder now” - even basketball players or taxi drivers (no offense, ofc, just an example).
I recently found out that Grok-4.1-fast has similar pricing (in cents) but a 10x larger context window (2M tokens instead of the ~128-200k of gpt-4.1-nano). And a ~4% hallucination rate, the lowest in blind tests on LLM arena.
Grok is the best general-purpose LLM in my experience. Only Gemini is comparable. It would be silly to ignore it, and xAI is less evil than Google these days.
In the big picture, those events are insignificant compared to the negative impacts on society from Google's trillion-dollar advertising business and the associated destruction of privacy.
I fought with Tesseract for quite a while. It's good if high accuracy doesn't matter. For transcribing a book from clean, consistent, non-skewed data it's fine, and an LLM might even be able to clean it up. But for legal or accounting data from hand-scanned documents, the error rate made it untenable. Even clean, scanned documents of the same category have all sorts of density and skew anomalies that get misinterpreted. You'll pull your hair out trying to account for edge cases and never get the results you need, even with numerous adjustments and model retraining on errors.
Flash 2.5 or 3 with thinking gave the best results.
Thanks. I was surprised that Tesseract recognized poorly scanned magazines, and with some Python library I was able to transcribe a two-column layout with almost no errors.
Tesseract is a cheap solution as it doesn’t touch any LLM.
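For reference, the two-column case can often be handled by Tesseract's automatic page segmentation. A minimal sketch, assuming pytesseract and Pillow are installed and the tesseract binary is on PATH (the file name is made up):

    # Minimal sketch: OCR a scanned magazine page with a two-column layout.
    from PIL import Image
    import pytesseract

    page = Image.open("magazine_page.png")  # hypothetical scan
    # --psm 1 = automatic page segmentation with orientation detection,
    # which usually keeps multi-column text in reading order.
    text = pytesseract.image_to_string(page, config="--psm 1")
    print(text)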
For invoices, Gemini Flash is really good, for sure, and you receive “sorted” (structured) data as well. So definitely thumbs up. I use it for transcribing difficult magazine layouts.
I think that for legally problematic usage - companies don’t like to share financial data with Google - it would be better to use a local model.
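For what it's worth, the “sorted” invoice output above might look roughly like this with the google-generativeai Python package - a sketch only, where the model name, prompt, and file are assumptions:

    # Sketch of structured invoice extraction; model name, prompt, and
    # field names are illustrative assumptions, not a recommendation.
    import google.generativeai as genai

    genai.configure(api_key="YOUR_API_KEY")
    model = genai.GenerativeModel("gemini-2.5-flash")

    invoice = genai.upload_file("invoice.pdf")  # hypothetical scanned invoice
    resp = model.generate_content(
        ["Extract vendor, invoice date, and total amount as JSON.", invoice],
        generation_config={"response_mime_type": "application/json"},
    )
    print(resp.text)  # JSON fields instead of raw OCR text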
This is really odd. These companies are incomparable in terms of scope.
I have used all the major tools: OpenAI (chat, API), Google Gemini (AI Studio, API, CLI, Antigravity) and Claude (chat, code and API). Mostly for solving coding issues.
Claude Code gives usable results almost instantly for small scripts, and they can go live. Gemini CLI tells me that it doesn't have this and that - and I have tried pushing Gemini to deliver production-quality code. No chance.
I use the same style of coding instructions for all tools.
But the difference is measured in hours. In a Claude session the result came in minutes; with Gemini, it took an hour and many rounds.
On the other hand, Gemini Canvas is really powerful, as it builds a usable app/tool inside Gemini, so you don’t have to know how to run Python or PHP.
And OpenAI has a very powerful chat.
So all of them seem to have different focus groups…
I tend to agree, but was curious to hear others' thoughts. My background is more product / software, less market prediction and performance, so I really wasn't sure what to make of this.