
I think quantization is a red herring. If there's any way to undo the unlearning, this means that the knowledge is still in the weights -- that's basic information theory. I'm sure there are a million other ways to recover the lost knowledge that don't involve quantization.


I can see how quantization or downsampling itself could be a fundamental way to address this.

1. Train normal full precision model.

2. Quantize down until performance is borderline and then perform the unlearning process.

3. Train/convert/upsample back to FP for subsequent tuning iterations.

Seems like you can create an information bottleneck this way. The echoes of the forgotten may have trouble fitting through something that narrow.
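Roughly what I'm picturing, as a toy PyTorch sketch (everything here is made up for illustration: the tiny model, the round-to-nearest fake_quantize_ helper, and a stand-in gradient-ascent "unlearning" step; a real pipeline would obviously look different):

    import torch
    import torch.nn as nn

    def fake_quantize_(model: nn.Module, bits: int = 4) -> None:
        # In-place symmetric round-to-nearest quantization of every weight tensor.
        qmax = 2 ** (bits - 1) - 1
        with torch.no_grad():
            for p in model.parameters():
                scale = p.abs().max() / qmax
                if scale > 0:
                    p.copy_((p / scale).round().clamp(-qmax, qmax) * scale)

    # 1. Assume `model` has already been trained in full precision.
    model = nn.Sequential(nn.Linear(16, 32), nn.ReLU(), nn.Linear(32, 4))

    # 2. Squeeze the weights through a low-bit bottleneck, then unlearn there.
    fake_quantize_(model, bits=4)
    forget_x, forget_y = torch.randn(64, 16), torch.randint(0, 4, (64,))
    opt = torch.optim.SGD(model.parameters(), lr=1e-3)
    for _ in range(10):
        # Gradient ascent on the forget set as a placeholder unlearning step.
        loss = -nn.functional.cross_entropy(model(forget_x), forget_y)
        opt.zero_grad()
        loss.backward()
        opt.step()

    # 3. The tensors are still ordinary FP32, so subsequent tuning just
    #    continues from here at full precision.

Whether the echoes really can't squeeze back through a 4-bit bottleneck is an empirical question, of course.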


You're right that quantization isn't anything special here, but "red herring" isn't the right word; it's just "embarrassingly simple", per the title.


Okay, but narrowly focusing on a "quantization-robust unlearning strategy" as per the abstract might be a red herring, if that strategy doesn't incidentally also address other ways to undo the unlearning.


I think it's useful because many people consume quantized models (most models that fit on your laptop will be quantized, and not because people want to uncensor or un-unlearn anything). If you're training a model, it makes sense to make the unlearning at least robust to this very common procedure.

This reminds me of this very interesting paper [1] that finds it's fairly "easy" to uncensor a model (modify its refusal thingy).

[1] https://www.reddit.com/r/LocalLLaMA/comments/1cerqd8/refusal...
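For anyone who hasn't read it: the gist (as I understand that thread, not the authors' actual code) is that refusal behaves like a single direction in activation space, so you can estimate it from contrasting prompt sets and project it out. Very rough sketch, with all the tensors here being hypothetical stand-ins:

    import torch

    # Mean residual-stream activations collected at some layer over two prompt sets.
    refused_acts = torch.randn(100, 4096)   # prompts the model refuses
    harmless_acts = torch.randn(100, 4096)  # prompts it answers normally

    # Candidate "refusal direction": difference of the means, normalised.
    r = refused_acts.mean(0) - harmless_acts.mean(0)
    r = r / r.norm()

    def ablate_refusal(h: torch.Tensor, direction: torch.Tensor) -> torch.Tensor:
        # Project the refusal direction out of a hidden state h.
        return h - (h @ direction).unsqueeze(-1) * direction

Applied at each layer (or folded into the weights), the model largely stops refusing, which is what makes the "uncensoring" so cheap.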


Yeah, exactly this. You would really want to pursue orthogonal methods for robust unlearning, so that you can still use quantization to check that the other methods worked.
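The check could be as simple as something like this (hypothetical helper names, and reusing the round-to-nearest fake_quantize_ helper from the sketch upthread):

    import copy
    import torch

    def forget_set_accuracy(model, forget_loader) -> float:
        # Accuracy on the data the model was supposed to forget (lower is better).
        model.eval()
        correct, total = 0, 0
        with torch.no_grad():
            for x, y in forget_loader:
                correct += (model(x).argmax(-1) == y).sum().item()
                total += y.numel()
        return correct / total

    def quantization_probe(unlearned_model, forget_loader, bit_widths=(8, 6, 4)):
        # Quantize copies of the unlearned model and see if the knowledge resurfaces.
        results = {"fp32": forget_set_accuracy(unlearned_model, forget_loader)}
        for bits in bit_widths:
            probe = copy.deepcopy(unlearned_model)
            fake_quantize_(probe, bits=bits)  # helper from the sketch upthread
            results[f"int{bits}"] = forget_set_accuracy(probe, forget_loader)
        return results  # a jump at low bit widths = the unlearning didn't hold

If an orthogonal unlearning method passes this probe, quantization stays useful as a cheap regression test rather than being the thing you have to defend against.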


That’s like saying that encryption is a red herring. Yes, the information is there, but recovering it is a different matter. In this case, quantisation allows you to recover the information without knowing the “cypher” used to “forget” it - that’s the important distinction.


If there is any way to undo the unlearning, there is also a way to use that method to identify the weights carrying the information and stop them from conveying it. At the heart of training is detection.

The information may still be in there, but undetectable by any known means. You can certainly remove the information; setting every weight in the model to zero will do that. Identifying when you have completely removed the target information without destroying other information might not be possible.

I'm not sure whether that means we'll someday see something analogous to zero-day unlearning-reversal exploits.



