abeppu · 2026-02-24T05:06:46 1771909606

I feel like most participants in the thread are on the same page about limiting openclaw's access to anything that matters.

But I wonder what things these people approve for Claude Code and its equivalents? Where's the line?

abeppu · 2026-02-23T15:37:17 1771861037

I don't know what I'm talking about, but isn't the wavelength of the laser pretty limiting to the idea of just slapping a Bayer color filter on? Like, if the laser is IR (partly so they're not visually disrupting all the humans around them), the signal you get back doesn't contain the visible-spectrum bands that you'd need to get RGB, right?

abeppu · 2026-02-21T17:34:19 1771695259

I mean, in this case the government spent thousands because there was a small amount of circumstantial evidence that suggested there was clandestine communication happening during wartime.

What was the immediate government spending on Japanese American internment, where there was no evidence or investigation into the ~120k people whose lives were disrupted, and who were transported, housed, fed and guarded for multiple years?

Arguably, spending thousands on investigating something specific is less wasteful than the alternatives the government was willing to take at that time.

abeppu · 2026-02-20T15:53:00 1771602780

I think maybe there are subsets of problems where you can have either a human or a smart LLM write a verifier (e.g. a property-based test?) and a performance measurement, and let the dumb models generate and iterate on candidates?
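As a rough illustration of the verifier idea in the comment above, here is a minimal Python sketch. Everything in it is an assumption made for illustration: the task (sorting), the `verifier` function, and the `cand_*` functions, which stand in for outputs that cheap models might produce. The verifier is a hand-rolled property-based check, not any particular testing library.

```python
import random
from collections import Counter


def verifier(candidate, trials=200, seed=0):
    """Property-based check for a sorting function (hypothetical task).

    A candidate passes if, on random inputs, its output is in
    non-decreasing order and is a permutation of the input.
    """
    rng = random.Random(seed)
    for _ in range(trials):
        xs = [rng.randint(-50, 50) for _ in range(rng.randint(0, 20))]
        try:
            ys = candidate(list(xs))
        except Exception:
            return False
        if not isinstance(ys, list):
            return False
        if any(a > b for a, b in zip(ys, ys[1:])):
            return False  # not sorted
        if Counter(ys) != Counter(xs):
            return False  # elements were lost or invented
    return True


# Hypothetical stand-ins for candidates generated by small models.
def cand_identity(xs):
    return xs


def cand_dedupe(xs):
    return sorted(set(xs))  # plausible-looking, but drops duplicates


def cand_reverse(xs):
    return sorted(xs, reverse=True)


def cand_ok(xs):
    return sorted(xs)


CANDIDATES = [cand_identity, cand_dedupe, cand_reverse, cand_ok]


def select(candidates):
    """Keep only the candidates that the verifier accepts."""
    return [c.__name__ for c in candidates if verifier(c)]
```

In this sketch the generators never need to be trusted; only the verifier does. A performance measurement could then rank whatever survives `select`.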

stavros · 2026-02-20T16:27:02 1771604822

Yeah, maybe, but then it would make much more sense to run a big model than hope one of the small ones randomly stumbles upon the solution, just because the possibility space is so much larger than the number of dumb LLMs you can run.

abeppu · 2026-02-20T16:59:45 1771606785

I don't work this way, so this is all a hypothetical to me, but the possibility space is larger than _any_ model can handle; models are effectively applying a really complex prior over a giant combinatorial space. I think the idea behind a swarm of small models (probably with higher temperature?) on a well-defined problem is akin to e.g. multi-chain MCMC.
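To make the multi-chain MCMC analogy concrete, here is a toy Python sketch under loud assumptions: the "problem" is matching a hidden bitstring, `score` stands in for the verifier or performance measurement, and each chain stands in for one small model exploring at a given temperature. None of this is taken from a real system.

```python
import math
import random


def score(bits, target):
    """Toy objective: number of positions that match the target."""
    return sum(b == t for b, t in zip(bits, target))


def run_chain(target, steps, temperature, seed):
    """One Metropolis chain over bitstrings; returns the best score seen."""
    rng = random.Random(seed)
    state = [rng.randint(0, 1) for _ in target]
    cur = score(state, target)
    best = cur
    for _ in range(steps):
        i = rng.randrange(len(state))
        state[i] ^= 1  # propose flipping one bit
        new = score(state, target)
        # Always accept improvements; accept regressions with a
        # probability that grows with temperature.
        if new >= cur or rng.random() < math.exp((new - cur) / temperature):
            cur = new
            best = max(best, cur)
        else:
            state[i] ^= 1  # reject: undo the flip
    return best


def run_swarm(target, n_chains=8, steps=200, temperature=0.5):
    """Run independent chains from different seeds; one result per chain."""
    return [run_chain(target, steps, temperature, seed)
            for seed in range(n_chains)]
```

The chains never communicate here. The hoped-for benefit is only that independent starting points and some randomness make it less likely that every chain gets stuck in the same place.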

abeppu · 2026-02-20T15:49:40 1771602580

Diffusion model papers are always interesting to read but I always feel like they need some mechanism to insert or delete tokens. In the example in the figure in this post, once it has fixed "British munchkin cats _ _ and ..." you _can't_ get to "British munchkin cats are a new and controversial breed." because there's not the right number of tokens between "cats" and "and". In a coding context, if your model samples a paren or a comma or something which is entirely plausible at that position, it can still close off an expansion which would be syntactically correct.

kazinator · 2026-02-20T20:15:14 1771618514

OK, but then, in this regard, left to right generation is hardly better:

Once you get to "British cats <next-token-here>" you can't get to "British munchkin cats <next-token-here>"; the tokens to the left are done and dusted.

It's kind of a feature. Diffusion is used for images, right? It's like saying, once the image of a door has started to form right next to a kitchen counter, it cannot insert a refrigerator there any more. Well, maybe it doesn't "want to" because that layout is already settled by that time.

cubefox · 2026-02-23T11:32:29 1771846349

There is a new way to train diffusion models to insert tokens between existing tokens rather than unmasking <mask> tokens: https://openreview.net/forum?id=VbvXjs5f72

However, I believe this would "only" be able to insert tokens, not delete tokens it mistakenly produced earlier. (The deletion in the title refers to the reverse process during training, where tokens are progressively deleted rather than masked.)

crystal_revenge · 2026-02-20T17:57:39 1771610259

But the "infilling" problem isn't exactly solved for AR LLMs, so it's a strange critique.

Furthermore, you're applying the logic of AR LLMs to diffusion models. AR LLMs are only seeking the probability of the next token (a chain of conditional probabilities), but diffusion LLMs are modeling the probability of the entire output at once. Because of this, token structures that lead to invalid outputs should have extremely low probability if the model is properly trained.

LarsDu88 · 2026-02-20T17:59:21 1771610361

This blogpost references block diffusion which fixes this issue that you are describing.

abeppu · 2026-02-20T22:08:16 1771625296

The cat example is from the section on their block-causal attention mask. I really don't think this fixes the issue. So far as I can see, the block schedule dictates when they sample at each position. It does _not_ change that they basically have an array-of-token-vars representation, and once `t_i` is sampled, nothing can "move" that value left or right.
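The "array of token variables" point can be sketched in a few lines of Python. This is an illustration only: it assumes word-level tokens, uses `None` as a hypothetical mask marker, and guesses at three trailing masks for the elided "..." in the example. Both function names are made up here.

```python
MASK = None  # hypothetical marker for a position not yet sampled


def reachable_fixed_length(partial, target):
    """True if target can be reached by only filling MASK slots in place."""
    return len(partial) == len(target) and all(
        p is MASK or p == t for p, t in zip(partial, target)
    )


def reachable_with_insertion(partial, target):
    """True if the committed tokens occur in target in the same order.

    This models a process that may insert any number of tokens between
    committed ones, so only relative order matters, not absolute index.
    """
    committed = [p for p in partial if p is not MASK]
    remaining = iter(target)
    # `tok in remaining` consumes the iterator up to the match, which
    # makes this a standard in-order subsequence check.
    return all(tok in remaining for tok in committed)


partial = ["British", "munchkin", "cats", MASK, MASK, "and", MASK, MASK, MASK]
target = "British munchkin cats are a new and controversial breed".split()
```

Here `partial` and `target` have the same length, yet the fixed-length check fails because "and" is pinned at index 5 while the target has it at index 6. The insertion-based check succeeds on the same pair.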

moralestapia · 2026-02-20T16:58:58 1771606738

I think that having an early draft of the output is part of the appeal of this type of model.

abeppu · 2026-02-20T17:08:19 1771607299

Early draft yes. But when you write an early draft of prose or code, you leave yourself the ability to insert or remove material in a way that _changes the indexes of the tokens you already put in your draft_. If you write a letter, you may know that it ends with "Yours Truly, <your name>", but not know the absolute number of tokens the letter will use. In this framework, once you say that "Yours Truly, John Hancock" are tokens 501 to 506, infilling the preceding sentences requires that you exactly preserve the number of tokens before that point ... which to me seems silly. I'm sure it's computationally messy to be able to slide stuff around, but if it meaningfully changes the topology of the search process, it may be worth it.

naasking · 2026-02-20T16:50:15 1771606215

IIRC, some researchers are working on mixed AR+diffusion models for this sort of thing.

abeppu · 2026-02-20T17:26:33 1771608393

I think the gap is, if they're building hybrids with _forward_ AR and diffusion, they risk giving up the cool part of diffusion, which is reasoning back. I may be imposing unreasonable human biases onto this, but I really think it would be interesting to have the model engage with the structure of the text, rather than just being either a sequence or an array of tokens. E.g. "I'm going to _ tomorrow." If the _ is not just a token but an expansion in context, which might be a noun phrase, a verb phrase, etc., it could be filled in with "the mall" or "practice guitar". In code, "if (_1) { return _2; }", _1 could be an expression whose type is bool, and which makes sense as a check to confirm that some process is finished. I don't care specifically how many tokens either of those is, but I do care that it makes sense in context.
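A small Python sketch of the "hole as an expansion" idea, under stated assumptions: the token lists, the `expand` helper, and the filling `job.is_done()` are all hypothetical, and no type or grammar checking is attempted. It only shows that a hole can absorb any number of tokens and that later tokens shift accordingly.

```python
def expand(template, fillings):
    """Replace each named hole with a token list of any length."""
    out = []
    for tok in template:
        if tok in fillings:
            out.extend(fillings[tok])  # a hole may become many tokens
        else:
            out.append(tok)
    return out


template = ["if", "(", "_1", ")", "{", "return", "_2", ";", "}"]
fillings = {
    "_1": ["job", ".", "is_done", "(", ")"],  # hypothetical bool expression
    "_2": ["result"],
}
expanded = expand(template, fillings)
```

In the template, `return` sits at index 5; after expansion it sits at index 9. A fixed-length array of token slots has no way to express that shift, which is the limitation the comment is pointing at.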

naasking · 2026-02-20T21:43:23 1771623803

I was thinking of something like LLaDA that uses a Transformer to predict forward masked tokens:

https://arxiv.org/abs/2502.09992

abeppu · 2026-02-20T15:06:06 1771599966

I do sometimes wonder -- if the transformers paper hadn't been published, what would the industry be like? Would the same ideas have been put together in almost the same way weeks or months later somewhere else?

abeppu · 2026-02-20T03:37:39 1771558659

I don't speak Mandarin but is this not an issue of style rather than the language itself? English can be courtly or poetic or abstruse but that's a matter of the speaker making a bunch of choices. I can't help but think of "Yes Minister" and Humphrey Appleby working quite skillfully to communicate in a way that ensured he would not be understood. Do Mandarin speakers not also have such a range of choices to be clear or not?

RestartKernel · 2026-02-20T04:02:49 1771560169

Maybe it's a matter of code switching? I've read that some Japanese teams prefer English for practical reasons, since a shared second language prevents anyone from getting bogged down in formalities. That is not to say Japanese can't be used with just as much precision.

abeppu · 2026-02-18T20:41:48 1771447308

So, just from the contents ... does anything make this especially different from other discrete math books?

abeppu · 2026-02-18T16:51:20 1771433480

I think the other moat is access to non-public data. If you can train, measure, or make decisions based on specific data that the vibecoder trying to clone you can't get, you can keep ahead.

abeppu · 2026-02-17T17:42:04 1771350124

But isn't part of the point of this that you want people who are eager to learn about AI and how to use it responsibly? You probably shouldn't want employees who, in their rush to automate tasks or ship AI powered features, will expose secrets, credentials, PII etc. You want people who can use AI to be highly productive without being a liability risk.

And even if you're not in a position to hire all of those people, perhaps you can sell to some of them.

EGreg · 2026-02-17T21:58:00 1771365480

Honestly, it seems worse than web3. Yes, companies throw up their hands and say "well, yeah the original inventors are probably right, our safety teams quit en masse or we fired them, the world's probably gonna go to shit, but hey there's nothing we can do about it, and maybe it'll all turn out ok!" And then hire the guy who vibecoded the clawdbot so people can download whatever trojan malware they can onto their computers.

I've seen Twitter threads where people literally celebrate that they can remove RLHF from models and then download arbitrary code and run it on their computers. I am not kidding when I say this is going to end up far worse than web3 rugpulls. At least there, you could only lose the magic crypto money you put in. Here, you can be pwned by a swarm of bots without even participating. For example, it's trivially easy to do reputational destruction at scale, as an advanced persistent threat. Just choose your favorite politician and see how quickly they start trying to ban it. This is just one bot: https://www.reddit.com/r/technology/comments/1r39upr/an_ai_a...