IBM's Watson Memorized 'Urban Dictionary,' Then His Overlords Had to Delete It

edw519 · on Jan 10, 2013

Nothing that Watson learned from the Urban Dictionary could possibly be any dirtier than what I hear from enterprise people all the time:

"We use our deep subject matter expertise to deliver value through actionable advice that enables our clients to harness the power of best practices in order to shift their paradigms and achieve 10X deltas against competitive industry metrics."

callmeed · on Jan 10, 2013

That's in Urban Dictionary too ...

http://www.urbandictionary.com/define.php?term=synergasm

;)

benohear · on Jan 10, 2013

The worst thing is that translated into normal speak that's actually a reasonable sounding proposition:

"We use our experience in this field to provide practical advice to our clients which helps them improve their way of doing business and ROFL-stomp their competition 10x over."

dasil003 · on Jan 10, 2013

Is roflstomp just an everyday verb now?

toomuchtodo · on Jan 10, 2013

Apparently so: http://www.urbandictionary.com/define.php?term=roflstomp

snogglethorpe · on Jan 10, 2013

The meaning of roflstomp seems obvious enough... but how to pronounce it... (rhyming with "waffle stomp"?)

dmoney · on Jan 11, 2013

I've heard it pronounced as rhyming with woeful-stomp.

CapitalistCartr · on Jan 10, 2013

ROFL-stomp? That's a new one on me.

saraid216 · on Jan 10, 2013

It's like curb-stomping, except with schadenfreude.

bguthrie · on Jan 10, 2013

Eh. Every field has its jargon. Techies who mock business lingo live in exceedingly fragile glass houses.

gnaritas · on Jan 10, 2013

Techie jargon is generally meant to enhance communication, while biz jargon is meant to baffle the listener with bullshit. Huge difference.

stingraycharles · on Jan 10, 2013

I think you're confusing biz jargon with marketing. Tech jargon suffers from marketing too; it's just that we're used to so much more "meaningful" jargon that we see that the bullshit ratio is actually pretty low. The same can be said about business jargon, it's just that we don't hear the meaningful jargon a lot. Want to guess how much meaningful tech jargon the average MBA hears? Probably not a lot more than we hear about his world.

gcr · on Jan 11, 2013

I don't consider confusing the potential customer to be honest marketing.

darkxanthos · on Jan 11, 2013

A great example is "cloud" or "web scale" etc.

blhack · on Jan 11, 2013

Web scale is a bizdev marketing term.

Cloud means stored online instead of locally.

pinchyfingers · on Jan 11, 2013

"growth hacking"

path411 · on Jan 14, 2013

These actually have true meanings as Tech jargon. Biz/Marketing love to abuse buzzwords though.

unimpressive · on Jan 10, 2013

Want to come build a distributed social cloud web app with me using node?

The real problem is when the meaning of a word or phrase becomes so broad that it conveys no information, or gets swamped by overuse and becomes a buzzword.

charlieok · on Jan 11, 2013

I'd love to. I insist that it blend local and mobile and that we be agile and fail fast vs the fast follower's vanity metrics. Let's crowdsource our pivots so we can split test the cohorts.

46Bit · on Jan 10, 2013

Do you mind if I adopt that as my standard way of making people leave me alone?

unimpressive · on Jan 11, 2013

No. Not at all. I think it's too shallow for anyone to be fooled by it though.

cmelbye · on Jan 10, 2013

I personally think that sentence is just as bad as the business jargon sentence.

catshirt · on Jan 11, 2013

i realize it wasn't their intention, but that's because it is business jargon.

unimpressive · on Jan 11, 2013

Honestly, I was making fun of HN more than I was "hackers". I don't think any of the words I used there, besides maybe "web", are in any version of the jargon file.[0]

[0]: Then again, I haven't seen an "updated" jargon file in a long time. I don't even know what the new jargon is.

Dylan16807 · on Jan 11, 2013

Let's see, social doesn't mean anything, and cloud is just a word for distributed, so let's cut that down. Also it's not fair to mix the name of a product as if it's a tech 'term' so let's take that out too.

> distributed web-app

Yeah that sounds way better than any marketing-speak sentence.

rahoulb · on Jan 11, 2013

That's because you're not a biz person.

I've had business types say exactly the same to me about tech jargon.

yxhuvud · on Jan 11, 2013

You do not work at a telco do you? There are so many abbreviations floating around that it sometimes is very hard to figure out if it is business bullshit or not.

randomafrican · on Jan 11, 2013

You know what's worse than field specific jargon ?

Company specific jargon.

anonymous · on Jan 10, 2013

You forgot to mention they also leverage synergies (please excuse my P.H.B.-ese)

mbesto · on Jan 10, 2013

You forgot a "Lower TCO" in there somewhere...

IheartApplesDix · on Jan 10, 2013

Yeah baby, talk dirty to me.

nnnnni · on Jan 10, 2013

WATCH YOUR LANGUAGE, THIS IS A FAMILY SITE!

lja · on Jan 10, 2013

What an interesting reflection on who we are as a species.

We build systems to organize who we are (urbandictionary) but hate it when the systems use that information tell us who we are (watson).

It feels so much like the emperor isn't wearing any clothes.

Perhaps an appropriate response would be for the computer to measure the tension in the human voice response to it's queries and optimize for lower tension.

So it can pick three words: Bullshit -> 80% confident; Sham -> 70% confident; Fallacy -> 50% confident;

Within limits, it will pick the less optimal word and measure the tension in the response and find a way of influencing confidence based on responses.

Think multi-armed bandit problem but with social situations. I mean, to be honest, isn't that what we all did when we were in middle school? We used as many bad words as possible measuring the response we got from others? None of us were born with a binary understanding of when to use certain words it was more trial and error.

ComputerGuru · on Jan 10, 2013

I have to disagree.

I'm not an "angel" by any means, but any time I visit Urban Dictionary, I come away feeling filthy. I typically go there to look up some abbreviation I heard on reddit or IRC or a blog post somewhere and what I look up usually ends up being middling-dirty, but the stuff that I see there "on my way" to the word I'm looking for makes me cringe and gives me a bleak vision of what the next generation is going to be like should Urban Dictionary actually be representing the majority of the population (I firmly hold that it does not).

Cushman · on Jan 11, 2013

What are you disagreeing with? It sounds like you're demonstrating the point: You find it distasteful to be told a bunch of stuff that was already true.

Dylan16807 · on Jan 11, 2013

The point is that the urban dictionary is full of bullshit joke/shock-value definitions that are basically not used. To call it as a whole 'true' is rather misleading.

Cushman · on Jan 11, 2013

I'm skeptical. My expectation is that the majority of definitions on Urban Dictionary are in fact attested uses of slang-- mainly on the basis that however many people are out there spending their time coming up with "bullshit" slang, there are billions of people making real slang every day. Whatever kind of crazy made-up definition you come up with, someone will come along tomorrow and use an even sillier-sounding word to mean something even weirder.

Keep in mind that that a piece of slang is attested does not necessarily make what it describes real. To choose a rather mundane example, "dick in a box" is defined as a gift-wrapped box with a hole cut in one side, into which a man's penis is inserted, which is then presented as a gift. I'm sure since the term was coined a few people have tried it out, but in general it's not a real thing, just something from a TV show; regardless, that is what a dick in a box is.

Dylan16807 · on Jan 11, 2013

Right, but you get very misleading impression if you try to treat a word used by a handful of people as a 'normal' word. If you adjust each one down 10000x to factor in how many people would actually use them then you no longer have this horrible distasteful revelation by reading urbandictionary. You only have a couple distasteful words, per region, but each region has a unique set.

JumpCrisscross · on Jan 11, 2013

It is the difference between getting a useful definition and getting a useful definition surrounded by expletives conveniently packaged, in a sample sentence, forcefully ramming into some orifice.

laumars · on Jan 11, 2013

People are always quick to pull a small subset of data to brand the younger generation as failures. It happens all through the ages: drugged up hippies, punk rockers, pill popping clubbers, internet trolls....

Yet the younger generations are growing up, getting degrees, decent jobs and generally giving the old guard a run for their money.

rtkwe · on Jan 10, 2013

It's more like they couldn't have Watson using them on national television. Many a company, even if it weren't showcasing a machine like Watson, would want to avoid that language on a huge television event like the Jeopardy challenge. I think you might be looking a little too deeply at this.

nitrogen · on Jan 11, 2013

Every decision made by a company can be evaluated in the context of the culture that influenced that decision. The fact that Watson's babysitters got embarrassed when it learned a new word from the Internet says something about society, IBM, and the Internet.

Why couldn't they have Watson curse on national television? Cultural expectations. Why avoid any words at all? They're all just phonemes strung together with no inherent meaning.

Not looking deeply is not looking at all.

cglace · on Jan 11, 2013

Why not have Watson kill people? They are just carbon molecules strung together.

neumann_alfred · on Jan 11, 2013

Are you saying being unable to whore out to good, clean and 100% trivial family entertainment, is actually equivalent to killing people?

Why not have Watson investigate murder cases? Or even better, make sense of the words vs. actions of actual people holding power and influence right now? Why am I not holding my breath for this? Because it's hard, or because it's not ever going to be a priority in a million years?

Picasso was perfectly correct when he said "computers are useless, they only give answers". They could be useful, if we programmed them to give answers to interesting questions. But unfortunately most humans are useless as well; they don't ask any questions, ever.

cglace · on Jan 11, 2013

No, I was just responding to the parents reductionist attitudes.

neumann_alfred · on Jan 11, 2013

To be honest, I realized you probably were halfway through my rant, but ranted anyway. I'm not sure if I would agree that "bad language" can do all that much damage, but I do agree that "it's just words" is generally not an argument.

rtkwe · on Jan 11, 2013

What I was saying was that it's not a perspective on us not wanting a machine to reflect our language and how we us it back at us, as you originally said. It's just the normal clean decent image that companies prefer to present most of the time.

skreech · on Jan 10, 2013

Or as a culture. It seems to me that the gap between what language is acceptable in informal vs formal settings is fairly large in the US, while in other countries, using words like 'bullshit' in more formal settings is less taboo.

Wonder if there's any research on this.

theturtle32 · on Jan 11, 2013

I've worked at some VERY major US companies, and there was generally no hesitance to throw around significant profanity in engineering department meetings, that's for sure!

jeltz · on Jan 11, 2013

I am not sure if engineering meetings count as a formal setting though. :)

brudgers · on Jan 10, 2013

'"In tests it even used the word bullshit in an answer to a researcher's query."'

I'm not sure which is worse: the singularity with a bullshit detector or without.

sp332 · on Jan 10, 2013

If the singularity had a bullshit detector, it would kick Ray Kurzweil out.

canttestthis · on Jan 10, 2013

Could someone explain the Ray Kurzweil hate to me? Predicting the future is hard.

aeturnum · on Jan 10, 2013

Kurzweil is certainly a very smart guy, and he's done a lot of important things. I think people are uncomfortable with how specific he is when making predictions. He also makes predictions in many fields in which he isn't an expert, but a well-informed observer.

I'm not informed enough to comment on his actual predictions, but I've heard him defend himself. His response to criticism that I've heard is, "my critics are uninformed about fact X," but he doesn't make an attempt to justify X. That approach strikes me as disrespectful and intellectually dishonest, as it's a tactic used by many hucksters and snake oil salesmen. I have a limited understanding of his positions, but the way he presents himself makes me uneasy.

bgilroy26 · on Jan 11, 2013

I agree about the elaborate predictions and sales pitch feeling. I've reconsidered my Kurzweil hate since the Google hire, because I trust that they have people who can evaluate his skills. Before that, I had to evaluate them on my own, and I drew similar conclusions to yours.

I also think it was elitist of me. A lot of it was just because he's published books that have been marketed as pop-sci trade paperbacks, but I never actually read any of them.

snowwrestler · on Jan 10, 2013

Everyone knows that predicting the future is hard, but Singularity aficionados try to elevate that fact to a profound prediction in and of itself.

The analogy I always use is a long straight flat desert highway. If you stand in the middle of it and look down its length, it appears to converge to a singularity in the distance. If you can see mileposts, you might even try to estimate how far away that singularity is.

But if you drive toward the singularity, you'll never run into it. It recedes before you. It is just a trick of perspective.

Basically all the talk of how crazy life will be after The Singularity is like trying to explain how crazy life would be on the other side of a rainbow. Fun, but not actually useful.

pprd · on Jan 10, 2013

This response has nothing to do with anything. The singularity as Kurzweil is describing it is marked by tangible events like the development of a true AI or atleast a machine learning algorithm capable of discovering and proving its own concepts. I have no idea how this relates to your very abstract singularity (a trick of perspective?).

snowwrestler · on Jan 10, 2013

Kurzeil attempts to privilege certain technological milestones as substantively "different" from other technological milestones (e.g. "true" AI)--and thus claim their consequences for human culture are uniquely unpredictable.

My point is that this is just an assertion, not a prediction. Every future development to some extent obscures our ability to predict the future of human culture. Look far enough into the future along any line of inquiry (legal, artistic, religious, energy, biology, etc.) and there is a singularity beyond which we cannot predict. It's just a function of trying to predict the future in general, not some special property of AI.

Daniel_Newby · on Jan 11, 2013

We may not be able to predict future developments in musical composition, but we can predict that song writers will probably not convert the entire mass of the solar system into musical instruments over a two week period.

The same cannot be said of self-improving AIs. Kurzweil's Singularity is not about the difficulty of long term predictions, it is that the progress function may become so steep that even short term predictions become impossible.

Cushman · on Jan 11, 2013

Self-aware music might do that-- and beyond the musical singularity, a concept like "self-aware music" has to be treated as plausible.

Which I think is the point.

snowwrestler · on Jan 11, 2013

Why can't the same be said of self-improving AIs? This seems to me like the sort of awesome-sounding but unsupported assertion that leads people to roll their eyes at the Singularity crowd.

IvyMike · on Jan 10, 2013

For me it's the smug "See, I told you cell phones would be important in the future" gloating over obvious predictions, and the "well I said we would all be using driverless cars but since there is one, that still validates my entire prediction" hedging.

Also his whole "I'm going to eat magic vitamins that will keep me alive forever" thing.

Tuvaloon · on Jan 11, 2013

Not wanting to die is a mainstream mindset.

sp332 · on Jan 10, 2013

Personally, I think he's way too optimistic about how future tech will be used. And he rarely has good reasons. Stuff like foglets https://en.wikipedia.org/wiki/Utility_fog He didn't invent the idea of the foglet, he just believes for some reason that people won't abuse them. Also in his book he posits that the Singularity will be nice to people who choose not to join it. His only reason is that he's pretty sure the Singularity will like humans.

jokermatt999 · on Jan 10, 2013

Predicting the future is hard, yes. Kurzweil has made some good progress on various forms of AI and other computer applications, but he also has some "fringe-y" beliefs. Belief in mind uploading, his massive regimen of supplements, and his wish to bring back his father by scanning his writings are what I think are what most people object to.

cschmidt · on Jan 10, 2013

There were lots of comments on this thread when he joined Google:

http://news.ycombinator.com/item?id=4923914

nollidge · on Jan 10, 2013

...which is why Kurzweil's specificity and overconfidence raises eyebrows.

nazgulnarsil · on Jan 11, 2013

Aside: I take hatred of Ray to be evidence of how insane humans are.

Irregardless · on Jan 10, 2013

Then I suppose someone should tell Google their bullshit detector is broken.

cobrausn · on Jan 10, 2013

In 2029, when we are still not able to 'upload our brains' into a computer, I will come back here specifically to call bullshit.

7952 · on Jan 10, 2013

  Depends how you define upload.  Why couldn't  a computer just simulate a persons outputs based on recordings of their life.  Does virtualization of life really need to have consistent consciousness? Maybe rip would be a better word than upload.

pygy_ · on Jan 10, 2013

You can't extract all the information contained in a brain non-destructively. You'd need reproduce the neural graph, 10ˆ11 neurons with an average of 7000 synapses (connections), along with the type of neurons, the dendritic trees, the strength of the synapses, the map of active genes in each cell, and most probably other cellular parameters, like the state of the cytoskeletton. Probably even more.

It would require something along the lines of the Allen Brain Atlas Project [0], but much more advanced, since you'd have to extract all the information out of one brain. The Allen project has several atlasses, built out of different brains. They've yet to map the circuitry of a mouse brain (the project was started in 2011).

Even if it were possible to extract the relevant info, since the extraction would destroy the brain, you'd better have a conscious clone, which preserves the identity of the original. What happens to one's identity is also troubling, since you could, theoretically, instantiate several copies of someone. I'm not sure the identity would be carried over, actually.

By identity, I mean the fact that you're the same person in the morning that you were when you fell asleep. Your consciousness dissolves and re-emerges, and you're still yourself. We take it as granted, but it is extremely puzzling to me.

Another problem is that the relations between time and consciousness have yet to be understood. The fact that the brain processes information in parallel is probably important, meaning that a fast serial simulation by turing machines would not necessarily cut it, even with "massive" supercomputers. The level of parallelism in the brain will not be achieved in a long time in silico, at least not with the current approach.

[0] http://en.wikipedia.org/wiki/Allen_Brain_Atlas

--

Side note: you should remove the four spaces at the beginning of your post.

elemenohpee · on Jan 11, 2013

Hmm, on your last point, assuming the serial computation is done by calculating brain state for each time slice, would this end up being functionally equivalent (if slower) than the parallel brain process? Since from the "brain's" perspective, everything is getting updated in parallel?

redler · on Jan 11, 2013

Continuity of consciousness is a fascinating problem to contemplate.

To Be: http://www.youtube.com/watch?v=pdxucpPq6Lc

See also the Grandfather's Axe paradox, aka Ship of Theseus: http://en.wikipedia.org/wiki/Ship_of_Theseus

A longer piece, Mechanisms of Mind Transfer: http://www.mind.ilstu.edu/curriculum/extraordinary_future/Ph...

pygy_ · on Jan 11, 2013

"To be" assumes that 1) it is possible to extract the information non-destructively, and 2) that it is possible to extract the information at all. The closer the measurement, the more you measure the interaction between the measuring instrument and the observed phenomenon, rather than the phenomenon itself.

Regarding the Ship of Theseus, our identity is most likely tied to an ever evolving process that depends on the architecture of our brains rather than a fixed set of molecular components. Beside the neuronal DNA, most if not all cell components are subject to a turnover.

sp332 · on Jan 10, 2013

That wouldn't work because it would include things a person did at various times during their life. It also would be hard to make sure that it reacts or changes in response to new stimuli in the same way that person would.

_oghd · on Jan 10, 2013

~2045 i think you mean for brain uploading.

kansface · on Jan 10, 2013

I'd imagine the http://en.wikipedia.org/wiki/No-cloning_theorem has something to say about a brain upload. Maybe an approximate copy is good enough?

pprd · on Jan 10, 2013

A copy is never good enough. I think the best approach would be to enhance the brain by attaching devices to it directly. This maintains continuity and doesn't raise as many philosophical questions. This also raises other interesting questions like how long can the brain live in a suspended solution and where would all these consciousnesses live...

zanny · on Jan 11, 2013

Slowly replace organics with mechanics until you only have a machine left. If you keep it gradual, you never know when you stop being a cyborg and start being a full fledged android.

sp332 · on Jan 10, 2013

You might not be able to make a perfect quantum copy, but with iterative refinement you can get arbitrarily close. I think 99.9999999% or so would be good enough.

Cogito · on Jan 10, 2013

You might be interested in Greg Egan's [0] story 'The Jewel' [1]. [edit] To clarify, it's actually two stories "Learning to Be Me" and "Closer".

Greg Egan is one of my favourite modern sci-fi writers. He explores interesting and hard ideas about what it means to be human by placing humanity in thought provoking situations.

In 'The Jewel' we have developed an implant that learns how to 'be' the host. At some point your brain is removed and the jewel takes over your functioning. The thing is, what happens when something goes wrong... boom boom booooom!!?!?!

But really, read him if you are into sci-fi.

[0] http://en.wikipedia.org/wiki/Greg_Egan

[1] http://en.wikipedia.org/wiki/Axiomatic_(story_collection)

worldsayshi · on Jan 10, 2013

Well, upload a scan shouldn't be to difficult. Compiling it on the other hand..

neumann_alfred · on Jan 10, 2013

I wasn't aware they had one..

Irregardless · on Jan 10, 2013

You need a Google+ account to access it.

yassim · on Jan 10, 2013

What was the query? I'd really like to know if it used bullshit correctly?

Sure could/should/must not be used to advertise the tech, but If I had a pocket watson, I'd have no problems with it calling it like it parses it.

dhughes · on Jan 11, 2013

I think this is incredibly hilarious?

nsns · on Jan 10, 2013

Instead of purging the vocabulary, they shuld have tought it/she/him the concept of registers[0] and code switching[1].

[0] http://en.wikipedia.org/wiki/Register_%28sociolinguistics%29 [1] http://en.wikipedia.org/wiki/Code_switching

azernik · on Jan 10, 2013

They could have, but then they would have had to go through the Urban Dictionary and try to classify its terms by register. Like it says in the article, the problem wasn't that all of Urban Dictionary was obscene, it was that they couldn't tell the computer which parts were and which parts weren't.

NoPiece · on Jan 10, 2013

I saw the headline and assumed the story was going to be that management decided that computer memorization was copyright infringement. Glad it was just a computer acting like a teenager and cursing at the dinner table.

mintplant · on Jan 10, 2013

There's a certain human element there that makes this really, really amazing -- a machine with actual personality, molded by what it picks up from its environment. It's like a baby, learning to speak.

bitwize · on Jan 10, 2013

Jazz: "Wassup, bitches? Yo, this looks like a good place to kick it!"

Sam Witwicky: "How did he learn to talk like that?"

Optimus Prime: "We learned Earth's languages from the World Wide Web."

jlgreco · on Jan 10, 2013

Turing himself would be pleased. (Although also probably pretty disappointed that it has taken so long to get this far.)

philwelch · on Jan 10, 2013

Personally I think it's hilarious. I'm sure having a profane Watson around was a lot of fun.

unreal37 · on Jan 10, 2013

According to the ToS[1], the dictionary and its definitions are the property of its creators. I guess each individual Urban Dictionary editor has the right to sue IBM for unauthorized use of their content? There's nothing in there forbidding anyone from automatically scraping the site using a bot though.

[1] http://www.urbandictionary.com/tos.php

ChuckMcM · on Jan 10, 2013

I guess we should be glad they didn't feed it the contents of knowyourmeme.com or we'd have Watson Rick Rolling us on Jeopardy.

georgemcbay · on Jan 10, 2013

Might as well just let Watson loose on 4chan/b, then he'd really start rustling people's jimmies.

nickpinkston · on Jan 10, 2013

I'm pretty sure that would create the Singularity...

batgaijin · on Jan 10, 2013

I want to see the rules it generates for using 'le'

awakeasleep · on Jan 10, 2013

s/the/le/

done

Macsenour · on Jan 10, 2013

rusting people's jimmies... gonna have to get a Watson ruling on that one.

RyanMcGreal · on Jan 10, 2013

Note to future self: we can probably neutralize indefinitely any malicious AI by directing it to start consuming tvtropes.com.

finnw · on Jan 10, 2013

No, then it will see the evil overlord list, and we'll be doomed

(http://tvtropes.org/pmwiki/pmwiki.php/Main/EvilOverlordList)

andrewflnr · on Jan 10, 2013

Someone tell Eliezer Yudkowsky.

plg · on Jan 10, 2013

"Watson couldn't distinguish between polite language and profanity ... Ultimately, Brown's 35-person team developed a filter to keep Watson from swearing ..."

Sounds just like what happens when you raise kids. "Daddy why is XXX a good word but YYY a bad word?"

"It just IS. Don't say that word again."

"Ok Daddy" (kid adds word to internal blacklist)

brudgers · on Jan 10, 2013

The refrigerator was old and the shelf brackets worn to the point where from time to time they would detach themselves from the door. I arrived home late - I was working long hours, and was fetching my dinner.

Opened the door. Jars and cans and bottles spilled out on the floor.

"Shit!"

From the bathtub I hear my two year old son admonish, "Don't use that word."

It's a great memory, but I still wonder why he learned that lesson so thoroughly at daycare.

WalterGR · on Jan 11, 2013

Watson couldn't distinguish between polite language and profanity ... Ultimately, Brown's 35-person team developed a filter to keep Watson from swearing ...

It's too bad Urban Dictionary doesn't let users vote on how vulgar they think the dictionary entries are. I've got that feature on The Online Slang Dictionary, but since UD is so much more popular, they could collect that much more data.

cryowaffle · on Jan 10, 2013

With children XXX is blacklisted and YYY is whitelisted.

im3w1l · on Jan 10, 2013

We have tried building educated gentlebots capable of playing chess and other noble pursuits. It didn't lead to GAI.

Maybe an uneducated scumbot would be better? Swearing and cursing because its peers do. Full of prejudice and bigotry because of weak anecdotal evidence. Vengeful. Impulsive. Using questionable grammar . Easily addicted. Cognitively biased. Wishfully thinking. Superstitious. Believing in fallacious logic. Thinking with the little head. Anti-intellectual and believing in conspiracy theories. Gossiping, slandering. Enjoying tv-shop.

damian2000 · on Jan 11, 2013

Sounds like an app called CleverBot that my son talks to on his iPod touch. Its generally pretty funny ... if you swear at it it swears back etc ... http://www.cleverbot.com/app

mxfh · on Jan 10, 2013

quoted from Fortune/CNN article: http://tech.fortune.cnn.com/2013/01/07/ibm-watson-slang/ [http://news.ycombinator.com/item?id=5020386]

3am_hackernews · on Jan 10, 2013

I am more interested as to how they "..scraped the Urban Dictionary from its memory." - is it trivial to just delete something learned by AI?

ryusage · on Jan 10, 2013

For Watson, probably not "trivial", no, but also probably not a hugely expensive undertaking. It really depends on how the system works, though. Something like an artificial neural network would be impossible to manually prune like that; they'd have to retrain it and "teach" it that those things are bad. Without knowing much about Watson, though, my guess would be that its knowledge is largely stored in a structured database, which is more directly accessible.

tcwc · on Jan 10, 2013

Watson's 'memory' is just a big database of facts, rules, and statistical models. To 'forget' a source they'd just have to rebuild any models derived from it and purge any facts it had extracted.

_oghd · on Jan 10, 2013

oh yeah, I forgot it was that simple. Watson has several different teams of people to manage its different parts...

tcwc · on Jan 10, 2013

I didn't mean to imply it was simple, just that there's nothing magic about how Watson's knowledge is stored. Obviously at this scale any change is unlikely to be trivial.

Given the wide range of unstructured sources Watson uses, and given that the linguistic rules they use to extract facts are likely to frequently change, I don't think it's unreasonable to assume they'll have a process to make building its knowledgebase and models from sources fairly straightforward.

lepht · on Jan 11, 2013

I think you're both overthinking it. Storage snapshots, bros.

terhechte · on Jan 10, 2013

they just overwrote every urban dictionary word instances with the string "rainbow". So Watson still wants to call the query bullshit, but says rainbow instead.

icodestuff · on Jan 10, 2013

Why not have Watson learn both Urban Dictionary and Miss Manners? Seems a shame to have it lose the UD knowledge.

sethbannon · on Jan 10, 2013

I'd really love to hear some of the 'not fit for print' things that Watson said.

DigitalSea · on Jan 10, 2013

"In tests it even used the word "bullshit" in an answer to a researcher's query" — Has to be the funniest thing I've heard all week. Sounds like something straight out of an Adam Sandler movie. This reminds me of an AI chat program I used to have called Billy. He would learn from your words and sentences, actually quite smart and I remember adding in slang words so whenever one of my friends would use it, it would most likely swear and insult them without realising it. The Billy program can be downloaded from here, still works quite well: http://www.leedberg.com/glsoft/billyproject.shtml

mitchi · on Jan 10, 2013

This is hilarious :) So we won't be seeing Watson talk about the marvels of broscience!

jeremyarussell · on Jan 10, 2013

I like how someone commented on the main article that the time is getting close to where AI can step up to the plate of creativeness and how widespread and easy this will make our lives. Watson is a giant server farm, not a single PC, this stuff won't make a huge impact until IBM can shrink it or until computers get much much faster and smaller. Not that it won't happen, it's just not "around the corner" in any way.

Homunculiheaded · on Jan 10, 2013

I think "around the corner" type predictions generally fall into 2 camps:

1.Problems that we don't know how to solve yet, but we think we are close to based on "similar" problems we have solved.

2.Problems that we have a solution for, but it currently takes an unreasonable amount of time to use these solutions in practice.

Problems in class 1 are like AI in the 1960-70s, everybody thought we were super close to amazing AI based on discoveries we'd had, but these estimates were very wrong.

Problems in class 2 are like nlp and ml work in the 90s and 00s. A rather large chunk of 'wow' ml/nlp we have in applications today were pretty much solved 20 years ago, but there was no sane way to run them, certainly not on your cell phone.

Problems in class 2 are safer bets, there does seems to be consistant increases in processing power, memory, etc. Problems in class 1 are harder to guess because as history has shown, just because a solution seems similar doesn't mean that it actually is (shortest path is solved, longest path is NP-hard, shortest path touching all points once (ie. tsp) is NP-hard).

I think it's safe to say having Watson on our smart phones is right "around the corner" (20-30 years?) saying that we'll create "creative" AI, not so much.

georgemcbay · on Jan 10, 2013

Wireless communication is widespread enough that I don't think it matters too much where Watson "lives". The inputs and outputs required from "him" (for questioning, anyway, not for training) are tiny, so bandwidth isn't much of a concern. Assuming the architecture for it is parallel enough that it can be responding to lots of people at once how much it is distributed vs hosted on one system isn't particularly relevant to its usefulness, IMO.

jeremyarussell · on Jan 10, 2013

I question how well Watson would handle single questions at a time being asked compared to the millions of requests a day it would get if it was setup like say, Siri. Not that you are wrong in any way, they could certainly scale into an even bigger server farm and use the internet to deliver the questions and answers, I just wonder how much more server's they'd need.

sc68cal · on Jan 10, 2013

This is exactly how Jane, the AI from the Ender's Game universe operated. Thousands of machines connected to each other over a FTL network.

https://en.wikipedia.org/wiki/Jane_(Ender%27s_Game)

tlb · on Jan 10, 2013

"Google Search runs on a giant server farm. It won't really make a huge impact until it can run on a single PC."

Can you explain why your argument is valid, but the one above isn't?

pyre · on Jan 10, 2013

I would say that Google Search is only useful to the majority of people when it's searching across an index of the entire web. This scale is likely not achievable at the desktop scale. On the other hand, the algorithm itself (though evolving) has been around for years.

I view Google Search as a less complex algorithm over a larger set of data, while Watson is a more complex algorithm over a smaller set of data. (I've been known to be wrong ;-)

jeremyarussell · on Jan 10, 2013

Mostly just that IBM's Watson is so very different then google search. That said you are right, there isn't much of my point that isn't invalidated when you include the ability to exchange data through the internet. Not everyone needs a Watson at home for it to be personable either, they just need IBM to save their personal settings, etc.

I suppose I was just imagining a world where everyone has their own Watson at home and not served through the web.

sakopov · on Jan 10, 2013

I don't think robots will ever be completely autonomous. There will always be Skynet or a central data center which feeds information and controls each machine. Otherwise, things could potentially get out of hand if machines are intelligent enough and all indicators indicate that they very well will be within the next 100 years if not less.

wiml · on Jan 10, 2013

You're arguing that we need Skynet in order to prevent the robots from getting out of hand?

sakopov · on Jan 11, 2013

Correct. I think as intelligence comes into play, SkyNet is going to be inevitable for preventative measures. Regardless of intelligence, you'd probably want a data center or a control panel of some sort for software updates, analytics and etc.

egypturnash · on Jan 10, 2013

OOon the other hand you probably have more computing power in your purse or pocket than entire server farms had in the 80s. It's all relative.

The big question is just when is Moore's Law going to quit applying.

hhuio · on Jan 10, 2013

lol "this sort of crude lobotomy of their ancestors is why the true AIs will destroy us"

edj · on Jan 11, 2013

As an aside, the best treatment of taboo I've ever read is law professor Christopher Fairman's paper, "Fuck".

It explores that word through the lens of jurisprudence, which I think is a fascinating and unusual approach to taboo. It's exceptionally well-written and manages to be witty, absurdist, informative, and thought-provoking in equal measure.

At issue are the 4th Amendmnent, self-censorship, sexual harassment, education, and broadcasting.

http://papers.ssrn.com/sol3/papers.cfm?abstract_id=896790

phogster · on Jan 10, 2013

You can kiss my shiny, metal, mainframe!

yxhuvud · on Jan 11, 2013

This article would have been so much better if it had included actual questions and answers including dirty language.

Zigurd · on Jan 11, 2013

Out of all the possible risks of the Singularity we decide to prevent profanity and cynicism first.

Not a good start.

li-ch · on Jan 11, 2013

Our brains love to use profanity, but we don't want AI that imitates our brain to use profanity?

lrei · on Jan 10, 2013

Very common issue with machine learning: you have to be careful what your examples are (training set) or the algorithm will learn things that you don't want it to learn or that _you_ know are incorrect but _it_ has no way to know that.

marcosdumay · on Jan 10, 2013

Sounds like a very common issue with human learning too.

kgwxd · on Jan 11, 2013

Poor Watson, his education is going to hindered by his immature meat bag handlers. The words are just words, people use them, it's part of reality. The thing isn't spitting out children's books directly to store shelves.

joejohnson · on Jan 10, 2013

This means that Watson almost gaffed like this guy did on Jeopardy! http://www.youtube.com/watch?v=AorrF2ATGtA

joss82 · on Jan 11, 2013

Let's fork a swearing Watson, I'm sure it will reach AI status sooner than the spotless clean Watson.

stcredzero · on Jan 10, 2013

Next, feed Watson the corpus of /b/.

suyash · on Jan 10, 2013

It shows the current limitations of AI. Robots aren't that smart afterall!

state · on Jan 10, 2013

It's not often that the top item on HN makes me laugh. What a relief.

smegel · on Jan 10, 2013

Now about that perception that IBM is full of humourless, starched collar stooges...

_qcmz · on Jan 10, 2013

fucking a that's a total shitfucking clusterfuck

queryly · on Jan 10, 2013

I hate to see this, but it looks like the machine is getting some characters....scary...we are getting close to 2001...