
I think quantization is a red herring. If there's any way to undo the unlearning, this means that the knowledge is still in the weights -- that's basic information theory. I'm sure there are a million other ways to recover the lost knowledge that don't involve quantization.


I can see how quantization or downsampling itself could be a fundamental way to address this.

1. Train normal full precision model.

2. Quantize down until performance is borderline and then perform the unlearning process.

3. Train/convert/upsample back to FP for subsequent tuning iterations.

Seems like you can create an information bottleneck this way. The echoes of the forgotten may have trouble fitting through something that narrow.
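Roughly what I'm picturing, as a toy PyTorch sketch (everything here is made up for illustration: the tiny model, the round-to-nearest fake_quantize_ helper, and a stand-in gradient-ascent "unlearning" step; a real pipeline would obviously look different):

    import torch
    import torch.nn as nn

    def fake_quantize_(model: nn.Module, bits: int = 4) -> None:
        # In-place symmetric round-to-nearest quantization of every weight tensor.
        qmax = 2 ** (bits - 1) - 1
        with torch.no_grad():
            for p in model.parameters():
                scale = p.abs().max() / qmax
                if scale > 0:
                    p.copy_((p / scale).round().clamp(-qmax, qmax) * scale)

    # 1. Assume `model` has already been trained in full precision.
    model = nn.Sequential(nn.Linear(16, 32), nn.ReLU(), nn.Linear(32, 4))

    # 2. Squeeze the weights through a low-bit bottleneck, then unlearn there.
    fake_quantize_(model, bits=4)
    forget_x, forget_y = torch.randn(64, 16), torch.randint(0, 4, (64,))
    opt = torch.optim.SGD(model.parameters(), lr=1e-3)
    for _ in range(10):
        # Gradient ascent on the forget set as a placeholder unlearning step.
        loss = -nn.functional.cross_entropy(model(forget_x), forget_y)
        opt.zero_grad()
        loss.backward()
        opt.step()

    # 3. The tensors are still ordinary FP32, so subsequent tuning just
    #    continues from here at full precision.

Whether the echoes really can't squeeze back through a 4-bit bottleneck is an empirical question, of course.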


You're right that quantization isn't anything special here, but "red herring" isn't the right word; it's just "embarrassingly simple", per the title.


Okay, but narrowly focusing on a "quantization-robust unlearning strategy" as per the abstract might be a red herring, if that strategy doesn't incidentally also address other ways to undo the unlearning.


I think it's useful because many people consume quantized models (most models that fit on your laptop will be quantized, and not because people want to uncensor or un-unlearn anything). If you're training a model, it makes sense to make the unlearning at least robust to this very common procedure.

This reminds me of this very interesting paper [1] that finds it's fairly "easy" to uncensor a model (modify its refusal thingy).

[1] https://www.reddit.com/r/LocalLLaMA/comments/1cerqd8/refusal...
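For anyone who hasn't read it: the gist (as I understand that thread, not the authors' actual code) is that refusal behaves like a single direction in activation space, so you can estimate it from contrasting prompt sets and project it out. Very rough sketch, with all the tensors here being hypothetical stand-ins:

    import torch

    # Mean residual-stream activations collected at some layer over two prompt sets.
    refused_acts = torch.randn(100, 4096)   # prompts the model refuses
    harmless_acts = torch.randn(100, 4096)  # prompts it answers normally

    # Candidate "refusal direction": difference of the means, normalised.
    r = refused_acts.mean(0) - harmless_acts.mean(0)
    r = r / r.norm()

    def ablate_refusal(h: torch.Tensor, direction: torch.Tensor) -> torch.Tensor:
        # Project the refusal direction out of a hidden state h.
        return h - (h @ direction).unsqueeze(-1) * direction

Applied at each layer (or folded into the weights), the model largely stops refusing, which is what makes the "uncensoring" so cheap.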


Yeah, exactly this. You would really want to pursue orthogonal methods for robust unlearning, so that you can still use quantization to check that the other methods worked.
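The check could be as simple as something like this (hypothetical helper names, and reusing the round-to-nearest fake_quantize_ helper from the sketch upthread):

    import copy
    import torch

    def forget_set_accuracy(model, forget_loader) -> float:
        # Accuracy on the data the model was supposed to forget (lower is better).
        model.eval()
        correct, total = 0, 0
        with torch.no_grad():
            for x, y in forget_loader:
                correct += (model(x).argmax(-1) == y).sum().item()
                total += y.numel()
        return correct / total

    def quantization_probe(unlearned_model, forget_loader, bit_widths=(8, 6, 4)):
        # Quantize copies of the unlearned model and see if the knowledge resurfaces.
        results = {"fp32": forget_set_accuracy(unlearned_model, forget_loader)}
        for bits in bit_widths:
            probe = copy.deepcopy(unlearned_model)
            fake_quantize_(probe, bits=bits)  # helper from the sketch upthread
            results[f"int{bits}"] = forget_set_accuracy(probe, forget_loader)
        return results  # a jump at low bit widths = the unlearning didn't hold

If an orthogonal unlearning method passes this probe, quantization stays useful as a cheap regression test rather than being the thing you have to defend against.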


That’s like saying that encryption is a red herring. Yes, the information is there, but recovering it is a different matter. In this case, quantisation allows you to recover the information without knowing the “cypher” used to “forget” it - that’s the important distinction.


If there is any way to undo the unlearning, there is also a way to use that method to identify the weights carrying the information and stop them from conveying it. At the heart of training is detection.

The information may still be in there, but undetectable by any known means. You can certainly remove the information; setting every weight in the model to zero will do that. Identifying when you have completely removed the target information without destroying other information might not be possible.

I'm not sure whether that means we'll someday see something analogous to zero-day unlearning-reversal exploits.



