"Asswipe," replied Yahoo's server. That's when I knew I had it (ridiculousfish.com)
573 points by peteretep on April 18, 2013 | 104 comments



This entry made me smile. First, the title with its language caught my eye. Then, as I reached the end of the line and saw the domain name, my tired and sleepy face broke into a full-bodied grin.

fish is my most-favorite tongue-in-cheek technical blogger ever. He doesn't blog too often, but they're almost always very useful (and very witty!) gems.

(Hint: you can find out who he really is by reading far back enough.)


or if you click the "about fish" on the top right of the kitchen roll.


It still doesn't tell you who it is on the about page though, just that he's on the AppKit team.


I would imagine email address corydoras@ridculousfish.com would give a hint.


So you're saying he is a fish. Interesting.

http://en.wikipedia.org/wiki/Corydoras


So, email corydoras@ridculousfish.com and he'll reply with a hint about his identity?


nah, you'll have to partition the pre-@ part into 2 parts using a regex, and then take the one that makes the most sense.

>c orydoras

>co rydoras

>cory doras

>coryd oras

>corydo ras

>corydor as

>corydora s

I'm going with coryd oras, sounds like a real name.


Why 2 parts?

Cory D Oras.

Obviously.


still haven't found it, but this is a pretty cool blog

http://ridiculousfish.com/blog/posts/my-one-lame-steve-story...


:waves bye

Linux, FreeBSD, Solaris:1 and Hackers' Lounge:2 were the first major time sinks of my digital life in the late 90s and early 2000s. ychat and ymsg were my intro to network protocols. I learned Makefiles, C, C++, and Python so I could build, understand, and modify programs like curfloo, curphoo, zinc, and gchat+/gyach. Although I moved on to IRC, I'll never forget when someone passed me "Smashing the Stack for Fun and Profit" (and the implicit link to Phrack) or the first time a bot I created logged in and took drink orders. The pride the first time I wrote a patch for my client to handle a login change all by myself.

A lifetime later, I know I wouldn't be where I am as a programmer if not for the encouragement of friends (some of whom I still talk with and do business with) I made as a teenager on yahoo chat. A lot of people I know grew up on IRC, but Yahoo Chat was my first love. Good bye, I'll miss you!


I had a really similar experience. As a bored teenager in the late 90s, I would sometimes log into the Yahoo chat rooms using the computers at my high school. The chat client was a Java applet, and it was atrociously slow and prone to crashing. I had started teaching myself to program a few years earlier, so creating a chat client seemed like a fun project.

There was a guy who hung out in the Programming:1 chat room, whose chat handle was 127001 (he was able to create that handle before Yahoo became more strict about the chat handles one could create), who had reverse-engineered the protocol and posted a guide online. I wonder what happened to him. Thanks, loopy!

Anyway, I created a Yahoo chat client in Win32/C as I was learning the API and the language. I was the only user for quite a while, but eventually I posted it using the free web space EarthLink gave me for being a dial-up subscriber. It got pretty popular through word of mouth. Later, when I installed Linux for the first time, I took that code and turned it into an ncurses-based client. It was a lot of fun, and somehow I ended up turning the fun I had hacking on little projects like that into a living.


> or the first time a bot I created logged in and took drink orders.

My first experience with programming was writing bots in the mIRC scripting language. After first getting things to work the feeling was truly empowering. I have to say though, my next attempt in programming was in C and I found it extraordinarily difficult to get started in. Having said that, I was 14 at the time.

I think the fact that my first programming experience was with a scripting language really affected how I think about programming now. Ever since then I've found it easier to learn scripting languages than systems languages. Maybe that doesn't make sense to some but that's just me.


At Microsoft, I once wrote code for a public facing login system where you had to make sure usernames didn't contain some banned words. I was surprised by how many words on the list were new to me and how, err, creative people get.

I always wondered who at Microsoft was tasked with keeping that list 'up to date'.


These lists can be very locale insensitive. A few years back my friend was trying to get a Gmail ID with his name. To his surprise, he wasn't able to get any ID, no matter how long and obscure a suffix he tried. Then it hit me that it was because of his name, Kshitij. Systems should treat terms from other languages and locales as whitelist entries in these cases.


This is known as "The Scunthorpe problem". http://en.wikipedia.org/wiki/Scunthorpe_problem

(The substring part of it, not the locale-sensitive part.)


The word filter on the PS3 chat is my favourite example of this. It was almost impossible to chat with friends during a gaming session when, for example, "Akuma" (a Street Fighter character and a perfectly safe word in Japanese as well) gets censored into "a---a", and several Finnish words like "pelata" (to play; "pela" is something offensive I guess?), "mutta" ('but'... apparently it considers "mutt" a bad word) and anything containing "ass" (ssa is the inessive suffix and Finnish has vowel harmony. Fun.) and lots of others get completely mutilated.


A clbuttic problem of naive sub-word filtering.
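For anyone who hasn't seen it in action, here is a minimal sketch of the difference between substring replacement and whole-word matching. The one-entry replacement table and both function names are made up for illustration; this doesn't reproduce any real filter.

```python
import re

# Hypothetical one-entry replacement table, for illustration only.
REPLACEMENTS = {"ass": "butt"}

def naive_filter(text):
    """Replace banned substrings wherever they occur, even inside other words."""
    for bad, good in REPLACEMENTS.items():
        text = text.replace(bad, good)
    return text

def word_filter(text):
    """Replace banned terms only when they stand alone as whole words."""
    for bad, good in REPLACEMENTS.items():
        text = re.sub(r"\b" + re.escape(bad) + r"\b", good, text)
    return text

print(naive_filter("a classic problem"))  # a clbuttic problem
print(word_filter("a classic problem"))   # a classic problem
```

Whole-word matching avoids the substring case, but it does nothing for the locale case, where a harmless word in one language is a banned word in another.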


>These lists can be very locale insensitive.

Perhaps because English is all that matters to them. Who in Europe, for example, even cares if a forum or chat has some swear words in his language?

We might be even more pissed off that we can't swear and leave the service, rather than complain about it.


Analogous to that, in Germany they don't bleep out any swearwords on television, which is why American celebrities appearing on TV here swear verdammt oft ("damn often"). For example, this Eminem interview:

http://www.dailymotion.com/video/x98y4u_eminem-interview-on-...

It's at 2:30 where he talks about it. And swears a "little".


That brightened my day considerably.

When I watch Misfits (UK show) they say some things that I never hear on American shows. It only happens on shows that are after the watershed[1] though. Before then I don't know which parts are censored.

It always amuses me a little how Americans seem to be more strict than the Europeans for media censoring.

1. http://en.wikipedia.org/wiki/Watershed_(television)


Yes. And nobody flinched much when this happened in France, and a few decades ago at that:

http://www.youtube.com/watch?v=_uxAofXJCh8


Until someone mentions Nazis, that is.


Since this has been downvoted, allow me to explain. The parent to my previous comment implies that Europeans are more tolerant of swearing than Americans, but they (or at least their governments) verge on paranoia when it comes to any mention of Hitler or the Third Reich.


No, there's nothing against mentioning Hitler or the Third Reich. As is well known on the Web, full movies about Hitler and the Third Reich have been made even in Germany, and they weren't banned, nor were their producers arrested.


The town of Cumming GA is subject to some interesting objections. Entering the address, one system refused with the comment "My, what colorful language you have."


Brings to mind the classic Fark example of "a bit chilly". Fark's swear filter removes spaces to hunt for disallowed words...
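A minimal sketch of how that kind of false positive can arise, assuming the filter simply deletes spaces and then looks for banned substrings. How Fark's filter actually works isn't stated here; the function name and the one-word list are hypothetical.

```python
# Hypothetical banned list, for illustration only.
BANNED = ["bitch"]

def space_stripping_filter(text):
    """Return the banned words found after deleting all spaces from the text."""
    squeezed = text.lower().replace(" ", "")
    return [word for word in BANNED if word in squeezed]

print(space_stripping_filter("a bit chilly"))  # ['bitch']
print(space_stripping_filter("a bit cold"))    # []
```

Stripping spaces catches people who type a banned word with gaps in it, at the cost of matching across the boundaries of perfectly innocent words.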


And the service is named fark??


Microsoft uses a tool called Policheck that maintains different wordlists for different locales

There's a blog post on it here: http://blogs.msdn.com/b/michkap/archive/2010/12/04/10100294....


It's not just swear-words: A credit card processor I used to work with wouldn't accept a payment from Mr. Echo because they regarded it as an attempt to hack them.


it's not online only either, I had university classes with a guy whose family name sounds like a swear word, and since to attend exams you had to preregister on a big dead-tree book left alone, usually teachers assumed it was a joke and didn't account for him.


At my father's server farm, they had a Scottish colleague called "Ronald McDonald". That regularly led to hotlines hanging up when he tried to order replacement parts...


"I'm terribly sorry Mr. Fukker that we didn't put you on the official list, this oversight will never happen again" "Ohh, and how often it happens.


A friend of mine has the wonderful surname of "Null," which caused some fun at the telephone company billing department back in the 80's.



One of the counselors at my high school had the surname 'Coon'.

As a result, the father of one of her students was unable to receive emails from her at his work address, due to their filters.

I'm not really sure how (or if) they solved that problem.


Oh man, when I worked there in Localisation for Office, there were crazy length lists for every language, it was so much fun to read through with foreign friends!


There are opensource dictionaries, timezone DBs, geo databases, etc. I wonder if there are also opensource multilingual swear word lists.


Not multilingual, but open source, is the lovely collection of lists included with Dan's Guardian (In configs\lists\phraselists in the source, available from http://sourceforge.net/projects/dansguardian/ )

Maybe someone should ask Dan if we can stick them in plain text somewhere easily accessible for future projects to reference. I know we don't really need to ask, but it's polite.

We could then let the moderators of certain subreddits, or 4chan, have commit access to add new, creative terms of profanity!

I think this would be a humorous list to have around, if only to find out what people are being offended by these days. I have a feeling that, for every term, phrase or word you care to quote, there'll be someone who's offended by it.

Which brings me to the point that there has been a fair bit of debate over the use of profanity filters (a few good links at http://stackoverflow.com/questions/273516/how-do-you-impleme...) and how effective they are. One of the references in the link above is for a 14yo circumventing a profanity filter (based on a white-list) with the phrase "I want to stick my long-necked Giraffe up your fluffy white bunny."


I think this would be a humorous list to have around, if only to find out what people are being offended by these days.

I run a slang dictionary website where visitors can score how offensive they find the terms. The 100 most offensive are listed here:

http://onlineslangdictionary.com/lists/most-vulgar-words/

It's not multilingual and not open source, but it does capture some of the "state of the art" of dirty words.


That's brilliant! Are you sure you wouldn't consider providing an updated 'top 100' list as a service to anyone who felt they could use them in some hitherto-unknown way?

Also, hope it's not a sore point, but are Google still being unreasonable about the citations on your site? If so, is there anyone from Google reading this that can have a look into this for him? It seems a bit unfair (to say the least) that a dictionary is penalized for citing sources, surely that's just responsible editorial! (Link with some info if you're interested http://onlineslangdictionary.com/pages/google-panda-penalty/)


That's brilliant!

Thanks!

Are you sure you wouldn't consider providing an updated 'top 100' list as a service to anyone who felt they could use them in some hitherto-unknown way?

I'd absolutely love to. But with Google penalizing the site for the majority of the past 2 years, I've become extra-sensitive about content on my site being available anywhere else on the web. In another world, I'd be ecstatic any time I came across material sourced from the site. But as it stands, I've given some thought to filing my first DMCA requests - thus becoming part of that chilling effect that gave chillingeffects.org its name.

I have put the data to some good use. http://www.offensivest.com/books/ ranks English works in the Project Gutenberg corpus by vulgarity. The site desperately needs some TLC: at the least, tweaks to the methodology and a page explaining what that methodology is.

I have more ideas for using the data, but I spend 90% of my time trying to get rid of the penalties.

So...

Also, hope it's not a sore point, but are Google still being unreasonable about the citations on your site?

Yes, very much so.


I'm sorry to hear that. I hope someone with some influence realizes the silliness (not to belittle the situation) of this whole affair. I presume sites like the Urban Dictionary (http://www.urbandictionary.com/) get away scot-free by not providing any source links at all!

You have my promise, at least, that I won't reproduce any of your work (until you deem it good to go) except in the form of drunken pub factoids :-)

I love, for example, that the complete works of William Shakespeare is currently number 4 on the most vulgar books list!


It's certainly interesting as a list of sex slang but the ranking doesn't make much sense. A lot of these are ridiculous phrases and somehow "come down" as a simple euphemism is rated worse than a bunch of variants of the F word. (or "come down" as in drugs losing effect but that's even more baffling to be one of the most offensive words)


Fascinating. I knew there was a lot of hate for the BBC, but I was still surprised - and then I clicked on it.


Please, no. Dumb people are already dumb enough without another list to refer to. My city recently renamed a street because somebody who worked at a company on that street found "Morning Glory" (a common enough flower) in Urban Dictionary.


I've always considered "money shot" to be some gambling term until one of the topics here on HN had me look it up in Urban Dictionary. I think UD is a bit like a medical textbook - sometimes it's better for you not to look into it too much, if you're not a professional.


It really would be the best repo. I can see the commit logs being the stuff of legends!

I don't see much use in profanity filters on the net these days, but it is definitely useful for businesses working with external teams just to sanity check content before publishing :)


Yeah, but then again I love occasionally coming across the rant of some angry dev in source comments (see http://www.vidarholen.net/contents/wordcount/ for an analysis of profanity in the linux kernel source).

Compare that with the source of Win2k (http://www.kuro5hin.org/story/2004/2/15/71552/7795) where there were apparently "a dozen or so "fucks" and "shits", and hundreds of "craps"."

It seems greater stability is achieved when devs are allowed to express themselves in comments.


Oh yes, in source & commits it's fair game! I'm talking more about editorial content that might have been outsourced. E.g. how-to articles for a company product or articles written in-house that are localised by an external vendor. These things need to be quite clean!


If we do, it's nice to know we've got a modern day Rosetta Stone just waiting for those alien anthropologists.


We had a customer once who wanted order IDs to be random four-character strings.

We had to put in some filtering after a customer's order ID ended up being something along the lines of CRAP or SHIT or something.
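A rough sketch of that kind of filtering, assuming the IDs are four uppercase letters (the comment doesn't say exactly) and that a rejected ID is simply redrawn. The blocklist contents and the function name are hypothetical.

```python
import random
import string

# Hypothetical blocklist; a real one would be much longer.
BLOCKED = {"CRAP", "SHIT"}

def new_order_id(rng=random, length=4):
    """Draw random uppercase IDs until one is not on the blocklist."""
    while True:
        candidate = "".join(rng.choice(string.ascii_uppercase) for _ in range(length))
        if candidate not in BLOCKED:
            return candidate

print(new_order_id())
```

Redrawing is cheap here because the blocklist is tiny compared with the 26^4 possible IDs; dropping vowels from the alphabet is another common way to make accidental words unlikely.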


Banned words like sffcei? (Which was a real banned word on MSN)


I have to ask - Googling has further infuriated me.


Complete list of forbidden words on MSN

http://www.freewebs.com/chief/forbiddenwordssecurity.htm


But what does "sffcei" mean? <:)


It's possible that they used a Bloom filter to compactly represent the list of banned words. This would allow them to share the filter without explicitly sending the list of banned words, so you'd never see a "foul-mouthed" network packet. But it could also return false positives for random strings, like perhaps "sffcei".

http://en.wikipedia.org/wiki/Bloom_filters
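To make the false-positive idea concrete, here is a minimal Bloom filter sketch. It is not a claim about what MSN did; the class, the bit-array size, the number of hashes, and the placeholder words are all assumptions chosen for illustration.

```python
import hashlib

class BloomFilter:
    """Fixed-size bit array with k hash positions per item; no deletions."""

    def __init__(self, num_bits=256, num_hashes=3):
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = 0  # a Python int used as a bit array

    def _positions(self, item):
        # Derive k positions by salting SHA-256 with the hash index.
        for i in range(self.num_hashes):
            digest = hashlib.sha256(f"{i}:{item}".encode("utf-8")).digest()
            yield int.from_bytes(digest[:8], "big") % self.num_bits

    def add(self, item):
        for pos in self._positions(item):
            self.bits |= 1 << pos

    def __contains__(self, item):
        return all((self.bits >> pos) & 1 for pos in self._positions(item))

banned = BloomFilter()
for word in ["darn", "heck", "drat"]:  # placeholder words
    banned.add(word)

print("heck" in banned)   # True: added items are always reported present
print("hello" in banned)  # usually False, but a false positive is possible
```

Only the bit array needs to be shipped to the client, so the words themselves never appear on the wire. The trade-off is that some strings nobody ever banned will happen to hit only set bits and be rejected anyway.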


Nothing. http://www.urbandictionary.com/define.php?term=sffcei

If UD doesn't have it, I'm convinced nobody has ever used the word to mean anything dirty.


How did it treat more 'ambiguous', context-dependent offensive names such as, to take a real example I saw, "Realist88"?


How is this offensive?

I know that "88" is used as a surrogate for "Heil Hitler" in neo nazi circles (H is the 8th letter of the alphabet), but "Realist" doesn't ring any bell.


You answered your own question. It's offensive because of '88'.


Wow, I see lots of people born in 1988 using the screen name (name)88, and I don't think any of them knows that :P

I hope 81 isn't some kind of code either :)


As long as you don't combine it with other words you should be fine. In my example, Realist referred to "racial realism". I chose a relatively subtle example because many of them I find repulsive to even type, but a less subtle username I've seen about the place is "ChuckSpears". That's the kind of thing I'm talking about.


That one also seems completely fine by itself? And it's not like they generally block names that advertise you being a bad person.


ChuckSpears is a direct reference to a racist insult, "Spear Chucker". They don't generally block names like this but I often wonder where the line is.

In human-moderated systems like XBox Live I've seen some relatively 'sophisticated' offensive names get called out and banned by the moderators.

I was wondering how username-blacklist systems worked, and how they deal with names which are obviously offensive to anyone who's dealt with the likes of Stormfront et al. before, but could theoretically be chosen by a totally innocent unfortunate.


> In human-moderated systems like XBox Live I've seen some relatively 'sophisticated' offensive names get called out and banned by the moderators.

For anyone who wants examples, reading a couple of pages of http://whywasibanned.com/category/gamertag/ should give you some examples


I see what you're getting at and agree it could be offensive but it could also be Charles Spears. At the same time someone could name an account CharlesSpears and probably no one would find it offensive.

It is interesting to think about where the line should be drawn. Especially in human moderated cases.


Yeah, I figured out what the insult was, I just have no idea what group it is applied to. If you need to know about it beforehand, it's not a very strong insult.


Not only "88" but "Realist" is used as shorthand for "Racial Realist", a common euphemism contemporary racists use.


I can't answer your question, but in this example, it's the context, not that name, that is offensive. I doubt it's even linguistically possible to make a reliable classifier.


If you're going to spam the system, why not just use Win32 API's and control their client like a zombie instead of trying to play the cat and mouse game reverse engineering their protocol?

You can send instructions to manipulate the client and have it do what you want almost as easily as you could if you knew their protocol. And that's without all the cat and mouse headaches.


I used to write bots for Yahoo's (and a few others) chat rooms back in the mid 90s. Nothing malicious, I was just a bored teenager so we're talking pretty dumb stuff like spamming naughty words in public chat rooms, etc. Anyhow, my point was this: the bots I wrote did just what you described, I used Win32 APIs to control the chat clients. As you said, it took only a few seconds to bang out the code and the bot could work against multiple different chat clients with very little additional work.

The only time I've ever bothered to write my own chat protocol client was for IRC in the late 90s (at which point I'd left college and was working full time, so had turned my programming skills to good), and the only reason I bothered to write my own IRC client was because I was utterly fed up with the quality of Windows clients.

Back in the old days, Win32 APIs were so insecure that you could have all sorts of fun with them. I'm not sure how things stand these days though; my development these days is almost exclusively Linux and UNIX based.


The whole point was to make a Mac client, not to "spam the system."


I think he's asking how it got so obfuscated in the first place. Seemingly, the spammers could just pilot a standard up to date client with a bot, rather than try and figure out the protocol.

I'm sure it's more complicated than that however, like client side rate limiting or something.


That's exactly what I was referring to.

I bet they would put the rate limit in the client, since they did that with the banned word list, but rate limiting would make more sense at the server. Maybe they did both.

Now you could fire up a new VM for each client or use a botnet to do your bidding. Oh, how the Internet has advanced.


Ah, fair enough.


I'm not familiar with Win32 APIs. Could you elaborate more as to how they would enable you to control the client like a zombie?


You can basically read anything from controls, and trigger callbacks at will, as if you had actually clicked a button or written some text. This means that you can write "expect-like" software -- just start up the program, and have another program read input from its text fields and issue commands to it.

I have actually done a lot of this, running old sourceless win32 and win16 programs in the background on virtual machines on the server and building new web-based interfaces on top of them.


Actually it ranges from simple event spoofing (user clicked here, user dragged there) to injecting a DLL + spawning a thread under your control.

Event spoofing is pretty limited. While having a thread under your control gives you full power as you have full access to the process' memory and can call any function you want.


AutoIt is a popular tool for automating GUI applications.

http://www.autoitscript.com/site/autoit/


Boston Workstation[0] is another, that I used quite a bit at work for a while. Pretty powerful, although I have no more desire to touch VB ever again.

[0]: http://www.bostonsoftwaresystems.com/



I think he means the APIs that allow you to simulate mouseclicks and keyboard input on arbitrary windows and controls


That's what I meant to use it for but the API is used at the root of applications to draw windows, handle mouse click events, accept keyboard input, create icons in the system tray and anything else that would involve Windows UI.

In the same way applications use the win API to create their UI, others could use it to manipulate and control the interface of other programs. It's powerful.


What a charming and informative website! So refreshing to see an original design instead of another Wordpress, etc. site. Check out this fascinating note on how grep manages to be so fast by being secretly slow:

http://ridiculousfish.com/blog/posts/old-age-and-treachery.h...


I recently had to implement a fairly well-known advertiser's banned-words list in our system. Their system is incredibly stupid. For example, they'll ban the word "Dimethltryptamine", but "Dimethyltryptamine" (the correct spelling) is allowed. You're not allowed to use the word "bulldog" under the Pets category. You can't use the word "dragon" under Jobs. It seems like they just have a form that visitors to the site can use to link ads containing "offensive" words and the system will automatically add it to a list.


Awesome! Network sniffing and analyzing is one of my favorite hobbies since being introduced to it as a security analyst for an MSSP and then writing Wireshark plugins at Sandia. If anyone is looking for a fun Saturday, I highly recommend http://forensicscontest.com/.



I think the .nyud.net has to go straight after the domain name: http://ridiculousfish.com.nyud.net/blog/posts/YahooChatRooms...


That it does.

Visiting http://example.com/index.html.nyud.net would send a request to "example.com" for the path "/index.html.nyud.net". That's not what you want.
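A small sketch of doing the rewrite programmatically, so the suffix always lands on the hostname rather than the path. The `coralize` helper is a made-up name, and it assumes an absolute URL with no port or credentials in it.

```python
from urllib.parse import urlsplit, urlunsplit

def coralize(url, suffix=".nyud.net"):
    """Append the CDN suffix to the hostname, leaving path and query untouched."""
    parts = urlsplit(url)
    host = parts.hostname + suffix  # assumes an absolute URL without a port
    return urlunsplit((parts.scheme, host, parts.path, parts.query, parts.fragment))

print(coralize("http://example.com/index.html"))
# http://example.com.nyud.net/index.html
```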


Thanks for pointing this out, I had really never noticed that before. I feel slightly silly now -- obviously the nyud.net was just being passed as part of the path...


Yup!

You can find out more about CoralCDN at http://www.coralcdn.org/


Der, why did they filter naughty words in the client?


Perhaps they did filter messages on the server as well. The client side word-list could be used for display purposes only, so if the user typed a censored word, the client could update the display with that word redacted and without waiting for a round-trip to the server.


They didn't -- the list was sent from the server, because "this list might need to be updated dynamically, in case someone on the Internet managed to think up a new word for sex."


The list sent was supposed to be used by the client to filter. What I think the OP is suggesting is that they should have just filtered server side not client side.


The list was maintained on the server, but filtering was done in the client. That's why it had to be sent to the client on every log in.


Yeah but couldn't they just do the filtering in the server itself? Just like the Zbody hack?


I suppose British people never really used the chat to talk about paedophiles. At least they looked for actual words rather than any occurrence in the string.


We'd also spell it "pædophile" just to make sure you understand Unicode...


Obligatory SNL reference:

http://www.hulu.com/watch/285711


I'm surprised that "phag" is blocked but "fag" isn't.


and all this time I thought "shat" was being polite


Not sure about elsewhere in the world but in the UK "shat" is a sort-of fun past tense for "shit", e.g. "Ian shat in his pants"


It's not used very often in the US, but it's still a perfectly understood term.



