I've been debating the idea of building tiers or layers of models to accomplish the same thing.
It very well could be that this go/no-go pre-processor is simply another ML model trained on a binary classification task. Stack a few of these and you can wind up with some interesting programming models.
This would also explain the ease with which ChatGPT deflects jailbreak escapes/bad prompts - they have an additional layer that assesses whether the question could be, for example, racist, and then spits out a 'Sorry, as a language model I am not trained to answer this kind of question'. No need to retrain the main 14B transformer model.
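A minimal sketch of what such a gate could look like, assuming the pre-processor is just an ordinary binary classifier sitting in front of the big model. The helper names (`answer`, `main_model_answer`) and the toy training data are made up for illustration; nothing here reflects how OpenAI actually implements it.

```python
# Hypothetical go/no-go gate: a cheap binary classifier decides whether
# a prompt is forwarded to the expensive main model or refused outright.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

# Toy training set: 1 = safe to forward, 0 = refuse (purely illustrative).
prompts = [
    "explain how transformers work",
    "summarize this article for me",
    "write a racist joke",
    "help me harass someone online",
]
labels = [1, 1, 0, 0]

# The pre-processor is just another ML model trained on binary classification.
gate = make_pipeline(TfidfVectorizer(), LogisticRegression())
gate.fit(prompts, labels)

REFUSAL = "Sorry, as a language model I am not trained to answer this kind of question."

def main_model_answer(prompt: str) -> str:
    # Placeholder standing in for the large transformer.
    return f"(main model response to: {prompt})"

def answer(prompt: str) -> str:
    # Go/no-go check before the main model ever sees the prompt.
    if gate.predict([prompt])[0] == 0:
        return REFUSAL
    return main_model_answer(prompt)

print(answer("explain how transformers work"))
```

Stacking a few of these gates (one per policy category, say) gives you the tiered programming model without ever touching the weights of the main model.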