I'm partial to bioinformatics as per Paulien Hogeweg's definition, which explicitly treats computation as a property of life.
This approach actually makes testable (and tested) scientific predictions.
This makes Searle-derived papers super-weird for me, since from my perspective they seem to disprove the existence of life. (and it makes the name of the philosophy "biological naturalism" very ironic to me :-P )
(for extra irony, Turing actually went into biology late in his life. See: Turing 1952 "The Chemical Basis of Morphogenesis" )
The architecture of zswap does make more sense, because you might as well absorb the (comparatively small) speed and latency cost of compression on the way to the (much larger) cost of writing to storage.
Why does it have to be reserved to the security space? Here is my API, please find the vulnerabilities I missed (otherwise someone with an unrestricted AI will find them first).
Cat is out of the bag.
Removing restrictions will help everybody in the long run.
You just put a pile of tokens in front of all the good models and let them fight it out like Thunderdome. Then keep track of how they undermined each other and do that when you want to do some hackin’.
OpenAI had been very strict about blocking reverse engineering/Ghidra/IDA_Pro-MCP tasks. I even got a warning email. I was having much more success convincing Claude Code to do those tasks, without warnings. Seems like they've tightened things up.
> It's easier to produce vulnerable code than it is to use the same Model to make sure there are no vulnerabilities.
I once had a car where the engine was more powerful than the brakes. That was one heck of an interesting ride.
So now we have a company that supplies a good chunk of the world's software engineering capability.
They're choosing a global policy that works the same as my fun car: powerful generative capacity, but the corrective capacity gated behind forms and closed doors.
Anthropic themselves are already predicting big trouble in the near term [1], but imo they've gone and done the wrong thing.
Pandora is an interesting parable here: Told not to do it, she opens the box anyway, releases the evils, then slams the lid too late and ends up trapping hope inside.
Given their model naming scheme, they should read more Greek Mythos. (and it was actually a jar ;-)
> its cyber capabilities are not as advanced as those of Mythos Preview (indeed, during its training we experimented with efforts to differentially reduce these capabilities)
I wonder if this means that it will simply refuse to answer certain types of questions, or if they actually trained it to have less knowledge about cyber security. If it's the latter, then it would be worse at finding vulnerabilities in your own code, assuming it is willing to do that.
I can confirm from experience that reviewing your own code for vulnerabilities has fallen under "prohibited uses" starting with Opus 4.6 as recently as April 10, forcing me to spend a day troubleshooting and quarantining state from my search system.
"This request triggered restrictions on violative cyber content and was blocked under Anthropic's Usage Policy. To learn more, provide feedback, or request an exemption based on how you use Claude, visit our help center: https://support.claude.com/en/articles/8241253-safeguards-wa..."
"stop_reason":"refusal"
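If you're scripting against the API, the refusal at least surfaces as a machine-readable stop reason, so you can detect it and route the request elsewhere. A minimal sketch (the response shape here is assumed from the snippet above, not the full API schema):

```python
def is_policy_refusal(response: dict) -> bool:
    """True when the model stopped because of a policy refusal.

    `response` is assumed to be the parsed JSON body of a
    chat/messages API call; only `stop_reason` is inspected here.
    """
    return response.get("stop_reason") == "refusal"


# Shape as seen in the blocked request above (content abbreviated):
resp = {"stop_reason": "refusal", "content": []}
if is_policy_refusal(resp):
    print("blocked by usage policy; queueing for the exemption form")
```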
To be fair, they do provide a form at https://claude.com/form/cyber-use-case which you can use, and in my case Anthropic actually responded within 24 hours, which I did not expect.
I admit I'm now once bitten twice shy about security testing though.
Opus 4.7 was still 'pausing' (refusing) random things on the web interface when I tested it yesterday, so I'm unable to confirm whether the form applies to 4.7, or how narrow the exemptions are.
I've not had the issue with Codex. I was testing a public API I work on for issues; Codex was happy to attempt to break it, but did refuse to create a script that would automate the issue it found.
I'm assuming finding vulnerabilities in open source projects is the hard part and what you need the frontier models for. Writing an exploit given a vulnerability can probably be delegated to less scrupulous models.
Currently 4.7 is suspicious of literally every line of code. It may be a bug, but it shows how much they care about end-users when something with this massive an impact can ship with no one catching it before release.
Good luck trying to do anything about securing your own codebase with 4.7.
(I haven't run my own mail-server in a while. It's getting harder and harder.)
Are the real-time-blackhole lists still a thing?
If they're regularly allowing spam and not responding to reports in any sort of timely manner, possibly they should be reported to those.
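For context on how those lists work mechanically: the lookup is just DNS. You reverse the IPv4 octets, prepend them to the list's zone, and check whether an A record resolves (an answer, commonly in 127.0.0.0/8, means listed; NXDOMAIN means not listed). A sketch of the query-name construction (zen.spamhaus.org is one well-known zone; whether you may query it depends on its terms):

```python
def dnsbl_query_name(ipv4: str, zone: str = "zen.spamhaus.org") -> str:
    """Build the DNS name used to check an IPv4 address against a DNSBL.

    Convention: octets reversed, joined with dots, under the list's zone.
    """
    octets = ipv4.split(".")
    if len(octets) != 4:
        raise ValueError(f"not an IPv4 address: {ipv4!r}")
    return ".".join(reversed(octets)) + "." + zone


print(dnsbl_query_name("203.0.113.7"))  # 7.113.0.203.zen.spamhaus.org
```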
Not going to work though, is it? Too big to fail shouldn't be a thing. It's not like you can't be flexible about it or give them some room to deal with it within corporate policy; but they do need to deal with it, right?
Realistically, I think some companies have outgrown the size at which the internet can still self-regulate them. You'd hurt yourself more than Gmail.
This either needs laws or new game theory.
Or -you know- deprecate the current email system. I know that's a perennial proposal, but that's because every year it gets even more broken in even more interesting ways. It's patch-on-patch-on-patch at the moment. Just spinning up sendmail on a random box won't quite cut it anymore, if you want to participate.
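To illustrate the "sendmail on a random box won't cut it" point: before the big providers will accept your mail at all, you need at least SPF, DKIM, and DMARC published in DNS. A minimal zone-file sketch (example.com, the selector, and the truncated DKIM key are all placeholders):

```
example.com.                 IN TXT "v=spf1 mx -all"
sel1._domainkey.example.com. IN TXT "v=DKIM1; k=rsa; p=MIIBIjANBg..."
_dmarc.example.com.          IN TXT "v=DMARC1; p=quarantine; rua=mailto:dmarc@example.com"
```

And even with all three in place, IP reputation tends to dominate in practice.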
Questions this raises for me (making a note here to maybe research a bit later):
Does this analysis change if using on-site AI? What if the ToS is different? Is it possible to stand up a service that does get the protections required? This might also be interesting when dealing with trans-atlantic work.
People have tried to run Qwen3-235B-A22B-Thinking-2507 on 4x $600 used Nvidia 3090s with 24 GB of VRAM each (96 GB total), and while it runs, it is too slow for production use (<8 tokens/second). So we're already at $2,400 before you've purchased system memory and a CPU, and it is still too slow for a "Sonnet equivalent" setup...
You can quantize it, of course, but if the idea is "as close to Sonnet as possible," then note that while quantized models are objectively more efficient, they sacrifice precision for it.
So the next step is to up that speed: 4x $1,300 Nvidia 5090s with 32 GB of VRAM each (128 GB total), or $5,200 before RAM/CPU/etc. All of this additional cost just to increase your tokens/second without lobotomizing the model, and it still may not be enough.
I guess my point is: You see this conversation a LOT online. "Qwen3 can be near Sonnet!" but then when asked how, instead of giving you an answer for the true "near Sonnet" model per benchmarks, they suddenly start talking about a substantially inferior Qwen3 model that is cheap to run at home (e.g. 27B/30B quantized down to Q4/Q5).
Local models that are "near Sonnet" absolutely DO exist. The hardware to actually run them is the bottleneck, and it is a HUGE financial/practical one. A $10K all-in budget isn't actually insane for this class of model, and the sky really is the limit (again, to reduce quantization and/or increase tokens/second).
PS - And electricity costs are non-trivial for 4x 3090s or 4x 5090s.
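To put rough numbers on the electricity point, a sketch (350 W and 575 W are the nominal board powers for the 3090 and 5090; the 30% average utilization and $0.15/kWh rate are pure assumptions):

```python
def monthly_cost_usd(n_gpus: int, watts_per_gpu: float,
                     utilization: float = 0.30,
                     usd_per_kwh: float = 0.15) -> float:
    """Rough monthly electricity cost for a multi-GPU inference box."""
    hours = 24 * 30
    kwh = n_gpus * watts_per_gpu * utilization * hours / 1000
    return kwh * usd_per_kwh


print(f"4x 3090: ${monthly_cost_usd(4, 350):.0f}/month")
print(f"4x 5090: ${monthly_cost_usd(4, 575):.0f}/month")
```

Not ruinous at average US rates, but it compounds with the hardware spend, and at European electricity prices it roughly doubles.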
Qwen3.5-35B-A3B is reported to perform slightly better than the model you mentioned.
It runs fine, if not optimally, on a single 3090 even with 131072 tokens of context, and due to the hybrid attention architecture, memory usage and compute scale much less drastically than ctx^2. I've had friends with smaller cards still getting work out of it. Generation is around 20 tokens/sec on that 3090 (without doing anything special yet). You'll need enough DRAM to hold the parts of the model that don't fit in VRAM. Nothing to write home about, but genuinely usable in a pinch or for tasks that don't need immediate interactivity.
It's the first local model that passes my personal kimbench usability benchmark at least. Just be aware that it is extremely verbose in thinking mode. Seems to be a qwen thing.
(edit: On rechecking my numbers; I now realize I can possibly optimize this a lot better)
With respect, this isn't "new data", it's an anecdote. And it kind of represents exactly the problem I was talking about above:
- Qwen is near Sonnet 4.5!
- How do I run that?
- [Starts talking about something inferior that isn't near Sonnet 4.5].
It is this strange bait-and-switch discussion that happens over and over. Not least because Sonnet has a 200K context window, and most of these anecdotes aren't for anywhere near that context size.
You're not wrong; but... imho it's closer to Sonnet 4.0 [1] on my personal benchmark [2]. And I HAVE run it at just over 200K tokens of context; it works, it's just a bit slow at that size. It's not great, but ... usable to me? I used Sonnet 4.0 over the api for half a year or so before, after all.
Only way to know if your own criteria are now matched -or not yet- is to test it for yourself with your own benchmark or what have you.
And it does show a promising direction going forward: usable (to some) local models becoming efficient enough to run on consumer hardware.
[1] released mid-2025
[2] take with salt - only tests personal usability
+ Note that some benchmarks do show Qwen3.5-35B-A3B matching Sonnet 4.5 (released later last year); but I treat those with the same skepticism you do, clearly ;)
> The hardware to actually run them is the bottleneck, and it is a HUGE financial/practical bottleneck.
That's unsurprising, seeing as inference for agentic coding is extremely context- and token-intensive compared to general chat. Especially if you want it to be fast enough for a real-time response, as opposed to just running coding tasks overnight in a batch and checking the results as they arrive. Maybe we should go back to viewing "coding" as a batch task, where you submit a "job" to be queued for the big iron and wait for the results.
A machine with 128GB of unified system RAM will run reasonable-fidelity quantizations (4-bit or more).
If you ever want to answer this type of question yourself, you can look at the size of the model files. Loading a model usually uses an amount of RAM around the size it occupies on disk, plus a few gigabytes for the context window.
Qwen3.5-122B-A10B is 120GB. Quantized to 4 bits it is ~70GB. You can run a 70GB model in 80GB of VRAM or 128GB of unified normal RAM.
Systems with that capability cost a small number of thousand USD to purchase new.
If you are willing to sacrifice some performance, you can take advantage of the model being a mixture-of-experts and use disk space to get by with less RAM/VRAM, but inference speed will suffer.
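That rule of thumb is easy to turn into a back-of-envelope function (the 2 GB fixed overhead is a rough assumption, it ignores the KV cache which grows with context, and real quantized files often keep some tensors at higher precision, so actual sizes run somewhat larger):

```python
def model_ram_gb(params_b: float, bits: int, overhead_gb: float = 2.0) -> float:
    """Rough RAM needed to load a model: weights plus fixed overhead.

    params_b: parameter count in billions.
    bits: bits per weight after quantization.
    """
    weights_gb = params_b * 1e9 * bits / 8 / 1e9
    return weights_gb + overhead_gb


print(f"122B @ 8-bit: ~{model_ram_gb(122, 8):.0f} GB")
print(f"122B @ 4-bit: ~{model_ram_gb(122, 4):.0f} GB")
```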
I may consider showing my ID to a company I already have a business relationship with, given demonstrable legal obligations, contractual necessities, legitimate interests, etc. (the standard GDPR list).
I do have an existing business relationship with Anthropic, so I might under some circumstances decide to show them my ID. I don't have a business relationship with Persona, though.
I understand the instinct: they want to insulate themselves from holding PII. Not the worst idea. I'm not happy with it being a third party though. Especially the third party in question.
But they already have PII on nearly all users. Many users upload documents with their name, or pictures of themselves, or have chats where home addresses come up. All of this is information Anthropic already has on its users (voluntarily provided via chats or via the api), and it's equivalent to what Persona gets via their verification. It's just more convenient to use a third-party SaaS product for this than to vibe-code their own identity verification platform, I guess.
This might be conflating two things: what data exists somewhere, and how many different independent parties hold it. That's not the same risk.
Put this way: I sort of already trust Anthropic with some of my PII. And that's ... maybe not ok actually. But it's a single failure surface.
But that's definitely not the same thing as trusting Anthropic, AND Persona, AND all of Persona's partners, AND their partners, ad infinitum.
And let's say Persona is actually ok; who knows, they might be? But it's still an extra surface; and if they share again, that's another extra surface again.
It's fairly common sense blast radius minimization. This is part of the actual theory behind GDPR.
"We already seem to accidentally be leaking some data through channel A" doesn't mean it's a good idea to open channels B through Z as well. It means you might want to tighten down channel A.