
> No, what I'm saying is you can't ban everything associated with Nazis and nothing else in a LLM, because neural nets don't work like that and you're simply unable to ban something without influencing all the results. Which is worse than just banning info about Wernher von Braun.

You don't have to do it inside a single model. You can have a complex of models where one of them screens the almost-final output: if it contains Nazi references, it raises a flag that the orchestrating system recognizes, and the system either reprompts for a correction (on the first few attempts) or returns a canned response (if a suitable response cannot be generated in enough tries).
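A minimal sketch of that orchestration loop might look like the following. The `generate` and `flags_banned_content` functions are stand-ins for real model calls (an LLM API and a separate detection model, respectively); the names, the retry count, and the canned response are all illustrative assumptions, not any particular vendor's API.

```python
def generate(prompt: str) -> str:
    """Placeholder for a real LLM call; echoes the prompt for illustration."""
    return f"response to: {prompt}"

def flags_banned_content(text: str) -> bool:
    """Placeholder detector; a real system would use a classifier model."""
    return "nazi" in text.lower()

CANNED_RESPONSE = "I can't help with that."

def moderated_reply(prompt: str, max_tries: int = 3) -> str:
    """Orchestrator: regenerate on a flag, fall back to a canned reply."""
    for _ in range(max_tries):
        candidate = generate(prompt)
        if not flags_banned_content(candidate):
            return candidate
        # Reprompt with a correction instruction and try again.
        prompt = f"{prompt}\n(Rewrite the answer without the flagged content.)"
    # No acceptable candidate after max_tries attempts.
    return CANNED_RESPONSE
```

The key design point is that the detector never needs to alter the generator's weights; it only gates which outputs are allowed through, so the generator's behavior on unrelated prompts is untouched.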

This probably still has some impact on other answers, because the detection layer is not 100% accurate, and if you want a near-zero miss rate you likely have to accept some false positives. But you can get a lot closer than relying on a single pass through a single model.
