You're getting modded down, but I think this is a valid concern, or at least one...

nottorp · on March 1, 2024

> If you try to ban everything associated with Nazis, or that was performed by a Nazi, you may accidentally block things that you didn't want to.

No, what I'm saying is you can't ban everything associated with Nazis and nothing else in an LLM, because neural nets don't work like that and you're simply unable to ban something without influencing all the results, which is worse than just banning info about Wernher von Braun.

I may be wrong, considering my knowledge of neural networks is limited, but so far I got one downvote and no explanation...

dragonwriter · on March 2, 2024

> No, what I'm saying is you can't ban everything associated with Nazis and nothing else in an LLM, because neural nets don't work like that and you're simply unable to ban something without influencing all the results, which is worse than just banning info about Wernher von Braun.

You don't have to do it inside a single model. You can have a complex of models where one of them selects the almost-final output and, if it has Nazi references, raises an indicator. The system orchestrating the models recognizes that indicator and either reprompts for a correction (if it is the first time) or returns a canned response (if a suitable response cannot be generated in enough tries).
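To make that concrete, here is a rough sketch of such an orchestration loop. Everything in it is hypothetical: `generate` stands in for whatever model produces the draft, `is_flagged` for the detection model, and `keyword_detector`, `CANNED_RESPONSE`, and `max_tries` are placeholders I made up for illustration, not any real system's API.

```python
from typing import Callable

# Hypothetical fallback text returned when no acceptable draft is produced.
CANNED_RESPONSE = "Sorry, I can't help with that request."


def keyword_detector(text: str) -> bool:
    """Toy stand-in for a detection model: flags text containing a banned word."""
    banned = ("forbidden",)
    return any(word in text.lower() for word in banned)


def orchestrate(
    prompt: str,
    generate: Callable[[str], str],
    is_flagged: Callable[[str], bool],
    max_tries: int = 3,
) -> str:
    """Generate a draft, check it with the detector, and reprompt if it is flagged.

    Returns the first unflagged draft, or CANNED_RESPONSE if every try is flagged.
    """
    current_prompt = prompt
    for _ in range(max_tries):
        draft = generate(current_prompt)
        if not is_flagged(draft):
            return draft
        # The detector raised its indicator: ask the generator for a correction.
        current_prompt = prompt + "\n\nRewrite the answer without the flagged content."
    return CANNED_RESPONSE
```

The generator itself is never modified here; the filtering happens entirely in the loop around it, which is the point of splitting the job across models.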

Probably still has some impact on other answers (because the detection layer is probably not 100% accurate, and if you want a near-zero miss rate on detection you probably have to accept some false positive rate), but you can get a lot closer than relying on a single pass through a single model.
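The miss rate versus false positive trade-off can be shown with a toy calculation. The scores and labels below are made up for illustration, and `error_rates` is a hypothetical helper, assuming the detector outputs a score and flags anything at or above a threshold.

```python
from typing import List, Tuple

# Made-up detector scores and ground-truth labels (True = should be flagged).
SCORES = [0.95, 0.80, 0.55, 0.60, 0.30, 0.10]
LABELS = [True, True, True, False, False, False]


def error_rates(
    scores: List[float], labels: List[bool], threshold: float
) -> Tuple[float, float]:
    """Return (miss rate, false positive rate) when flagging scores >= threshold."""
    flagged = [s >= threshold for s in scores]
    positives = sum(labels)
    negatives = len(labels) - positives
    misses = sum(1 for f, y in zip(flagged, labels) if y and not f)
    false_positives = sum(1 for f, y in zip(flagged, labels) if f and not y)
    return misses / positives, false_positives / negatives


# A stricter threshold misses one bad output but flags nothing harmless.
strict = error_rates(SCORES, LABELS, threshold=0.7)
# A looser threshold catches every bad output but also flags a harmless one.
loose = error_rates(SCORES, LABELS, threshold=0.5)
```

With this toy data, dropping the threshold from 0.7 to 0.5 takes the miss rate to zero at the cost of flagging one of the three harmless outputs.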

quickslowdown · on March 1, 2024

I think using a less charged example than "Nazis" would probably have helped avoid the downvotes & lack of engagement, but I understand why you chose it as an example & personally don't take issue with it, especially because you elaborated on why you picked it. Just my $0.02.

nottorp · on March 1, 2024

To be honest I thought it would be a less charged example than a modern topic :)

For example, your use of 'engagement' made a lot of associations in my mind, none of them positive.