Wow, there was a thread today about moderating user-generated content, and an HN comment described how users try to get away with posting offensive content by using symbols from other languages that look like English letters. I was wondering whether one could build an ML system that takes the appearance of words into account. And now I find your post, which does something like that, albeit with regexes. Interesting!