Training classifiers can also go off the rails under adversarial attack. This commonly showed up in our systems when people sent short, more ambiguous emails. For example, it tends to cause problems when malevolent users adopt dog whistles that co-opt the language of the attacked group. In these cases, the attacked group commonly ends up being the one that gets banned or blocked.