More

what · 2026-04-23T03:16:46 1776914206

It’s not about creativity. The incentive to produce drops to zero when an LLM is just going to slurp it up and regurgitate it without some form of compensation (notoriety, money, whatever).

what · 2026-04-23T01:20:05 1776907205

Which ever shitty model they’re using for search is so much better than the free offerings from the other companies. It’s not even close. It’s not going anywhere.

what · 2026-04-22T04:50:44 1776833444

And this will get you like $1M at 45? You can’t retire on that.

0xbadcafebee · 2026-04-22T06:10:06 1776838206

$1.8M-$2.2M. Assumes 6%-7.5% annual return. Does not include employer contribution. Provides $72k-$88k /yr income. Assuming you pull social security at 67, your continued gains exceed your draw, and your fund perpetuates until you die.

hrimfaxi · 2026-04-22T07:55:48 1776844548

If you retire at 45 won't that significantly impact social security?

0xbadcafebee · 2026-04-22T14:43:06 1776868986

It just means you draw ~$2500/month instead of ~$3800/month. That makes your $77k/yr income into $107/yr, but more importantly it helps your retirement account keep growing so it outlives you.

dwedge · 2026-04-22T06:15:29 1776838529

You can't live on $40,000 a year?

gib444 · 2026-04-22T06:50:45 1776840645

What about property taxes, the occasional $40k visit to the ER for a few stitches?

dwedge · 2026-04-22T07:18:34 1776842314

Does that happen often to you?

gib444 · 2026-04-22T16:55:17 1776876917

No - my hospital visits are £0 ;)

dwedge · 2026-04-22T18:58:17 1776884297

How close is your net worth and age to a million at 45?

gib444 · 2026-04-22T20:33:29 1776890009

Pretty bang on actually.

And how big is your dick too?

dwedge · 2026-04-22T21:13:24 1776892404

Bang average

a96 · 2026-04-22T12:25:22 1776860722

I definitely could. An american maybe couldn't.

what · 2026-04-22T03:27:40 1776828460

Where can I see the actual prompts and follow ups you fed each model?

vunderba · 2026-04-22T03:46:56 1776829616

So the prompts are tuned and adjusted on a per-model basis. If you look at the number of attempts, each receives a specific prompt variation depending on the model. This honestly isn't as much of an issue these days because SOTA models natural language parsing (particularly the multimodal ones) has eliminated a lot of the byzantine syntax requirements of the SD/SDXL days.

The template prompt seen in each comparison gets adjusted through a guided LLM which has fine-tuned system prompts to rewrite prompts. The goal is to foster greater diversity while preserving intent, so the image model has a better chance of getting the image right.

Getting to your suggestion for posting all the raw prompts, that's actually a great idea. Too bad I didn't think about it until you suggested it. And if you multiply it out - there's 15 distinct test cases against 22 models at this point, each with an average of about 8 attempts so we’re talking about thousands of prompts many of which are scattered across my hard drive. I might try to do this as a future follow-up.

what · 2026-04-22T04:19:38 1776831578

Shouldn’t every model get the same prompt? Seems a bit weird, especially when you can’t see the prompts that were used.

vunderba · 2026-04-22T04:57:14 1776833834

The goal isn’t the prompt itself. The test is whether a prompt can be expressed in such a way that we still arrive at the author's intent, and of course to do so in a way that isn't unnatural.

The prompts despite their variation are still expressed in natural language.

The idea is that if you can rephrase the prompt and still get the desired outcome, then the model demonstrates a kind of understanding; however more variation attempts also get correspondingly penalized: this is treated more as a failure of steering, not of raw capability.

An example might help - take the Alexander the Great on a Hippity-Hop test case.

The starter prompt is this: "A historical oil painting of Alexander the Great riding a hippity-hop toy into battle."

If a model fails this a couple of times (multiple seeds), we might use a synonym for a hippity-hop, it was also known as a space hopper.

Still failing? We might try to describe the basic physical appearance of a hippity-hop.

Thus, something like GPT-Image-2 scored much higher on the compliance component of the test, requiring only a single attempt, compared with Z-Image Turbo, which required 14 attempts.

what · 2026-04-22T03:03:15 1776826995

Why would you use an LLM for OCR?

fennecfoxy · 2026-04-22T16:56:12 1776876972

Because if it's multimodal, oops all transformers and they're pretty much best in class for ocr now, afaik?

jetbalsa · 2026-04-23T20:10:39 1776975039

Yep, Its pretty damn good compared to classic OCR and even more lightweight ones as well that I can run locally. the cards just vary too much over time.

jubilanti · 2026-04-22T13:53:07 1776865987

Because apparently that's what programming is and can only be these days...

what · 2026-04-20T21:34:44 1776720884

Isn’t it the last comment in the chain that is being referenced? About Idris Elba playing the mother and that he did such a good job no one noticed?

what · 2026-04-19T02:03:45 1776564225

Can’t you just partition the table by time (or whatever) and drop old partitions and not worry about vacuuming? Why do you need to keep around completed jobs forever?

perrygeo · 2026-04-19T22:33:38 1776638018

If you're looking for kafka-like semantics, you might want to keep messages around.

Your temporal partition idea is spot on. But instead of dropping old partitions, you can instead archive them.

victorbjorklund · 2026-04-19T08:44:40 1776588280

What about old failed jobs? You might wanna keep them around? And maybe you have retries that have a backoff.

pierrekin · 2026-04-19T04:25:05 1776572705

Yes you can, and at the risk of sounding a little snarky; if you do something like that and then release it as open source, people may even discuss it on HN!

what · 2026-04-19T01:07:20 1776560840

> Why are you handwaving things away though? I've got you on max effort. I even patched the system prompts to reduce this.

Do you think it knows what max effort or patched system prompts are? It feels really weird to talk to an LLM like it’s a person that understands.

matheusmoreira · 2026-04-19T02:05:54 1776564354

I've tested system prompt patching and it's definitely capable of identifying that my changes have been applied.

As someone who's been programming alone for over a decade, I absolutely do want to enjoy my coding buddy experience. I want to trust it. I feel pretty bad when I have to treat Claude like a dumb machine. It's especially bad when it starts making mistakes due to lack of reasoning. When I start explaining obvious stuff it's because I've lost the respect I had for it and have started treating it like a moron I have to babysit instead of a fellow programmer. It's definitely capable of understanding and reasoning, it's just not doing it because of adaptive thinking or bad system prompts or whatever else.

hattmall · 2026-04-19T02:21:32 1776565292

I thought that was really weird as well.

what · 2026-04-17T22:36:04 1776465364

No, they still have to act in the interest of shareholders even if they have no voting power.

wsun19 · 2026-04-18T00:10:41 1776471041

As a PBC, the intent of the company is not only profit, but it's hard to analyze the counterfactuals of if Anthropic were a pure for-profit or a non-profit

hiroboto · 2026-04-18T09:24:10 1776504250

thats the benefit of a pbc

sumedh · 2026-04-18T03:19:18 1776482358

What will happen if they don't because the founders control the voting powe

what · 2026-04-17T22:22:12 1776464532

Your employer doesn’t pay the subscription cost, they pay per token. So it’s already way more than 10x the cost.

stingraycharles · 2026-04-18T00:01:43 1776470503

Depends on the type of subscription. We have Codex Team and have a monthly subscription, no per-token costs.