Not multilingual, but open source, is the lovely collection of lists included with DansGuardian (in configs\lists\phraselists in the source, available from http://sourceforge.net/projects/dansguardian/).
Maybe someone should ask Dan if we can stick them in plain text somewhere easily accessible for future projects to reference. I know we don't really need to ask, but it's polite.
We could then let the moderators of certain subreddits, or 4chan, have commit access to add new, creative terms of profanity!
I think this would be a humorous list to have around, if only to find out what people are being offended by these days. I have a feeling that, for every term, phrase or word you care to quote, there'll be someone who's offended by it.
Which brings me to the point that there has been a fair bit of debate over the use of profanity filters and how effective they are (a few good links at http://stackoverflow.com/questions/273516/how-do-you-impleme...). One of the references in the link above is about a 14-year-old circumventing a whitelist-based profanity filter with the phrase "I want to stick my long-necked Giraffe up your fluffy white bunny."
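To make the whitelist problem concrete, here's a minimal sketch in Python. It is not the filter from that story, just an assumed word-level allowlist check; the `ALLOWED_WORDS` set and the `passes_allowlist` helper are names I made up for illustration.

```python
import re

# Hypothetical allowlist. A real one would hold thousands of "safe" words;
# this one only contains enough entries to run the example.
ALLOWED_WORDS = {
    "i", "want", "to", "stick", "my", "long", "necked", "giraffe",
    "up", "your", "fluffy", "white", "bunny",
}


def passes_allowlist(message, allowed=ALLOWED_WORDS):
    """Return True if every word in the message is on the allowlist."""
    # Lowercase the message and split it into runs of letters, so that
    # "long-necked" becomes the two words "long" and "necked".
    words = re.findall(r"[a-z]+", message.lower())
    return all(word in allowed for word in words)


phrase = "I want to stick my long-necked Giraffe up your fluffy white bunny."
print(passes_allowlist(phrase))
```

Every individual word is harmless, so a per-word check like this lets the phrase through. The meaning lives in the combination, which a word list can't see.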
That's brilliant! Are you sure you wouldn't consider providing an updated 'top 100' list as a service to anyone who felt they could use them in some hitherto-unknown way?
Also, hope it's not a sore point, but are Google still being unreasonable about the citations on your site? If so, is there anyone from Google reading this who can look into it for him? It seems a bit unfair (to say the least) that a dictionary is penalized for citing sources. Surely that's just responsible editorial practice! (Link with some info if you're interested: http://onlineslangdictionary.com/pages/google-panda-penalty/)
> Are you sure you wouldn't consider providing an updated 'top 100' list as a service to anyone who felt they could use them in some hitherto-unknown way?
I'd absolutely love to. But with Google penalizing the site for the majority of the past 2 years, I've become extra-sensitive about content on my site being available anywhere else on the web. In another world, I'd be ecstatic any time I came across material sourced from the site. But as it stands, I've given some thought to filing my first DMCA requests - thus becoming part of that chilling effect that gave chillingeffects.org its name.
I have put the data to some good use. http://www.offensivest.com/books/ ranks English works in the Project Gutenberg corpus by vulgarity. The site desperately needs some TLC: at the least, tweaks to the methodology and a page explaining what that methodology is.
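Since the methodology page doesn't exist yet, here's a rough sketch of one way a ranking like this could work. To be clear, this is an assumption for illustration and not the method the site actually uses: it scores each text by the fraction of its words that appear in a flagged-term set. The function names, the placeholder terms, and the sample texts are all made up.

```python
import re


def vulgarity_score(text, flagged_terms):
    """Fraction of words in the text that appear in flagged_terms."""
    words = re.findall(r"[a-z']+", text.lower())
    if not words:
        return 0.0
    hits = sum(1 for word in words if word in flagged_terms)
    return hits / len(words)


def rank_by_vulgarity(works, flagged_terms):
    """Return (title, score) pairs sorted from highest score to lowest."""
    scored = [
        (title, vulgarity_score(text, flagged_terms))
        for title, text in works.items()
    ]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


# Placeholder data: mild stand-in terms and two tiny invented "works".
flagged = {"darn", "heck"}
works = {
    "Work A": "Darn it all, what the heck is this darn thing?",
    "Work B": "It was a quiet morning in the village.",
}
for title, score in rank_by_vulgarity(works, flagged):
    print(title, round(score, 3))
```

Normalizing by length matters here: with raw counts, a very long collection would rank high simply because it contains more words overall.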
I have more ideas for using the data, but I spend 90% of my time trying to get rid of the penalties.
So...
> Also, hope it's not a sore point, but are Google still being unreasonable about the citations on your site?
I'm sorry to hear that. I hope someone with some influence realizes the silliness (not to belittle the situation) of this whole affair. I presume sites like the Urban Dictionary (http://www.urbandictionary.com/) get away scot-free by not providing any source links at all!
You have my promise, at least, that I won't reproduce any of your work (until you deem it good to go) except in the form of drunken pub factoids :-)
I love, for example, that the complete works of William Shakespeare is currently number 4 on the most vulgar books list!
It's certainly interesting as a list of sex slang, but the ranking doesn't make much sense. A lot of these are ridiculous phrases, and somehow "come down", a simple euphemism, is rated worse than a bunch of variants of the F word. (Or maybe it's "come down" as in drugs losing their effect, but then it's even more baffling that it ranks among the most offensive terms.)
Please, no. Dumb people are already dumb enough without another list to refer to. My city recently renamed a street because somebody who worked at a company on that street found "Morning Glory" (a common enough flower) in Urban Dictionary.
I've always considered "money shot" to be some gambling term until one of the topics here on HN had me look it up in Urban Dictionary. I think UD is a bit like a medical textbook - sometimes it's better for you not to look into it too much, if you're not a professional.
It really would be the best repo. I can see the commit logs being the stuff of legends!
I don't see much use in profanity filters on the net these days, but it is definitely useful for businesses working with external teams just to sanity check content before publishing :)
Yeah, but then again I love occasionally coming across the rant of some angry dev in source comments (see http://www.vidarholen.net/contents/wordcount/ for an analysis of profanity in the Linux kernel source).
Oh yes, in source & commits it's fair game! I'm talking more about editorial content that might have been outsourced. E.g. how-to articles for a company product or articles written in-house that are localised by an external vendor. These things need to be quite clean!