Cuil responds to criticism of Cpedia (cuil.com)
92 points by boyter on April 14, 2010 | 72 comments



My first inclination was towards disliking the writer - he comes across as arrogant and dismissive of massive criticism, which seems foolish given his position. If he said something a little more humble, he'd be infinitely more sympathetic. Perhaps something like, "We're trying to do something meaningful, and still working to get it right" would go over well.

So my first inclination was dislike. But my second was more sympathy - this isn't a guy who is being an arrogant jerk, this is someone who is convinced that the hallmark work of his life thus far is quite good, criticisms be damned. I'm that stubborn sometimes. This is his biggest, and realistically, perhaps only shot at lasting greatness. He thought he was building the next great American technology company, and reality didn't measure up and he hasn't realized it.

So he points outwards - "they don't understand!" - but I don't dislike the guy. Who knows how long until he realizes what he's really got on his hands, and how he takes it. It'll surely be an unpleasant moment, punctuating a couple years of excitement followed by massive disappointment. I feel bad for him. I hope I never wind up there.


As an engineer, I sympathize with him completely. I would react the same if I had worked on a really really tough problem and made an incremental advance, which I thought was a breakthrough, but everybody dissed it because they didn't understand it. That's not to say Cuil is doing a great job. But machine learning and NLP are tough.


> I would react the same if I had worked on a really really tough problem and made an incremental advance, which I thought was a breakthrough, but everybody dissed it because they didn't understand it.

I was once part of a company that thought it had superior technology that everyone else "didn't understand." It was true, it had marvelous technology. It was also true that lots of people didn't understand it.

Here's what it also had: 1) a deployment system everyone agreed was a pain in the @@@, which they never paid much attention to, 2) user interfaces which didn't repaint themselves like native widgets and which looked terrible and had terrible APIs, 3) intermittent bugs that customers kept reporting, only to be disregarded and blamed for them, 4) a less than open, "not invented here" culture with a big dose of "us versus them."

It's cool to make an incremental advance. The hard part is getting people to understand it. The hard part is reaching your audience. For this, it's more productive to enlist their help, not to blame them.


But users don't care about technology, they care about getting something useful - and the few times I used it the results were almost laughably bad.


I think it may be the format the result is presented in. It acts as if it can give the world then gives very little. If they were upfront about the level it is at by making the design more about listing pieces of information rather than trying to assume they all mesh together nicely they may get better reviews.


How it's presented is a huge part of why people reacted so negatively. They present the articles as prose, so people expect it to flow like prose. This makes them very sensitive to disorganization.

I think they're deserving of the criticism they've received. Cpedia is an interesting idea, and I assume they've done a lot of really good work we're not grasping. But it's not ready for public consumption yet, and they shouldn't have pretended it was.

They were over-zealous. It happens. What's important is how they move ahead. I'm with wheels on this one. We shouldn't write them off yet.


No doubt it's a huge problem area. I've been researching areas that would be involved in this, like clustering and summarization.

Google Squared attempts to be less ambitious and still feels very incomplete, and they probably have some very smart people, the backing of the most complete index of the internet, and probably the most computational power afforded to any work in the area.


It might work better if they just presented a list of results, and users could up/down vote the results for relevancy. Over time they could use that feedback to float more relevant "snippets" to the top. Kinda like Wikipedia but without all the hassle of typing.
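
A minimal sketch of that voting idea, assuming each snippet simply carries up and down counters. The `Snippet` class, the `rank` function, and the smoothed scoring formula are all hypothetical choices for illustration, not anything Cpedia is known to do:

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class Snippet:
    """A hypothetical result snippet with user feedback counters."""
    text: str
    upvotes: int = 0
    downvotes: int = 0

    def score(self) -> float:
        # Smoothed fraction of positive votes: (up + 1) / (up + down + 2).
        # A snippet with no votes scores 0.5, so new snippets are not buried.
        return (self.upvotes + 1) / (self.upvotes + self.downvotes + 2)


def rank(snippets: list[Snippet]) -> list[Snippet]:
    """Return snippets ordered from highest to lowest vote score."""
    return sorted(snippets, key=lambda s: s.score(), reverse=True)


if __name__ == "__main__":
    results = [
        Snippet("snippet A", upvotes=8, downvotes=1),
        Snippet("snippet B", upvotes=1, downvotes=6),
        Snippet("snippet C"),  # no votes yet
    ]
    for s in rank(results):
        print(f"{s.score():.2f}  {s.text}")
```

The smoothing is one possible design choice: it keeps a single early downvote from sinking a snippet permanently, at the cost of needing a handful of votes before the ordering means much.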


What about allowing users to re-order the snippets by drag and drop - or at least move up, move down, and reject buttons?

Maybe even allow suggested snippets too - sort of like Wikipedia built from nothing but citations.


How about using mouse tracking like the page analytics guys do? What people are hovering their mouse over is what they're reading ... and that constitutes a "vote".


I appreciate that it's a hard project and I appreciate the work that's gone into it. NLP is hard, and any progress in it is always amazing and a worthwhile endeavor. However, the measure of a product is not how hard you worked on it, but how well it works.

Cuil/cpedia doesn't work. They shouldn't be releasing it as if it were a product. Progress towards the solution is great, but unless it reaches or almost reaches the solution, it should be published in an academic journal.

By releasing it in the wild, you set an expectation for how much progress has been made. Blaming the user for not understanding the problem is incorrect; the user is not supposed to see the problem in the first place. If your users need to understand the problems that you've had under the hood, then your product isn't ready for prime time.


Tom ends with a poem implying his users should shut up, which is worth citing in its entirety:

"A wise old owl sat in an oak \ The more he heard, the less he spoke \ The less he spoke, the more he heard \ Why aren’t we all like that wise old bird."

I imagine that working in the consumer web space - "bottom-feeding", as he put it - is not his ideal career.


I think it's interesting that he took it upon himself to write this unnecessary and unnecessarily long blog post then demonstrated his complete lack of self-awareness by putting this poem at the end.


It's plausible he is the owl here.


I wouldn't write Cuil off just yet. What they do have is a functional, if not stellar, web scale search engine and a web scale crawler. They've also probably got enough money to keep the lights on for another 2-3 years (perhaps with some cuts) while they figure out where to steer that.

That's why I was somewhat surprised with this play. I don't connect the dots. If they'd spun into say, domain specific searches (a la Kayak, Indeed) or some sort of data mining for business (a la Rapleaf) that'd have clicked more with me.


Don't do this Scott. Don't :(


Sorry, they deserve all the criticism they are getting. The problem is not that they have a few mistakes here and there, it is that every single article I have tried has been composed of a bunch of mostly incomprehensible unrelated sentences randomly ordered and placed next to each other with no connection.

If you evoke an encyclopaedia in the mind of your user, the user will expect that what they are getting is an at least somewhat encyclopaedia-like thing. I.e., that it is understandable, that sentences follow each other to make coherent paragraphs, etc. In this respect, Cpedia completely falls on its face.


What if they had a disclosure (maybe until things get "better") at the top in bright bold red that said:

"Cpedia will quite often be wrong or odd. The web is quirky and we're trying to make sense of it. Don't take anything below seriously just yet. Let us know what parts of this article you like, though."

Would that make people criticize them less or give them more slack?


That would still be incorrect. For all the searches I tried (including a number of people for whom "other people write more about them than they write themselves" is true) the results are laughably bad.

Maybe if they changed it to: "Cpedia will almost always be wrong and nonsensical."

But then, why release it?


Sometimes, big companies like the one I work for (Microsoft) or Google or Yahoo get beaten up for having 'obvious' bugs or failures. People say that given the scale of our resources, we should do better and perhaps they're right.

In this case, here's a small startup trying out something new and potentially cool. Sure, it doesn't work well in some (maybe many) cases but at least they're putting themselves out there. That takes way more guts than what a lot of people have. This probably represents several late nights and weekends feverishly pounding out code and trying to get things to work.

As an engineer, I'm with Cuil.

To quote Roosevelt: "It is not the critic who counts; not the man who points out how the strong man stumbles, or where the doer of deeds could have done them better. The credit belongs to the man who is actually in the arena, whose face is marred by dust and sweat and blood; who strives valiantly; who errs, who comes short again and again, because there is no effort without error and shortcoming; but who does actually strive to do the deeds; who knows great enthusiasms, the great devotions; who spends himself in a worthy cause; who at the best knows in the end the triumph of high achievement, and who at the worst, if he fails, at least fails while daring greatly, so that his place shall never be with those cold and timid souls who neither know victory nor defeat."


One unexpected effect of Cpedia aggregating articles about things to build pages is that the page for the software I write and sell contains both its description and links to where to get a cracked version :(

http://cpedia.com/wiki?q=GhostTrails&disambig=Andreas%20...


You seem to have done the impossible, and uncovered a Cpedia article that might actually contain exactly the information a search-engine user was looking for...


If your site is called -pedia, guess what - users expect a Wikipedia-like encyclopedia that is useful.

It doesn't matter how incredibly sophisticated your algorithms are or how smart you are.

It doesn't matter how hard the problems are or even if they are solvable.

It doesn't matter that you know much more about linguistics than anyone else.

If the end result shown to the user is utter rubbish for _their_ purposes, then you can expect the users to complain and not come back.

If this was supposed to be a tech demo called c-dredger or info-frankenstein then it certainly is an accomplishment. But as a useful tool, it needed more work before being shown.


I dunno about that. When I go to Uncyclopedia (http://uncyclopedia.wikia.com/wiki/Main_Page), I have no expectation that it will be useful, only that it will be like Wikipedia.


I've mentioned it before, but I'll repeat it - the length of this article:

http://cpedia.com/wiki?q=United+States+of+America

should be embarrassing. I mean, really?

(On the other hand, it's the only country for which I've so far been able to get Cpedia to spit out any sort of article. Japan? China? Germany? Russia? Brazil? France? It can't seem to see the forest for the trees -- it's got articles on historical leaders and regions, but nothing about the countries themselves.)


Cpedia seems to have trouble with terms that frequently appear in other phrases. For example, they don't have an article on Nokia, but do have Nokia 6300, Nokia N70 and so on.


Try the country's full name.

For instance there is some garbage about China under: http://cpedia.com/search?q=Peoples%20Republic%20of%20China

Of course it seems to be something about contract law for China... but that's another matter.

Actually try "of china": It's got some relevance, so does "peoples republic".


Because it seems to want you to be more specific.

http://cpedia.com/wiki?q=United%20States%20of%20America%20%2...

That shows all of the pages and seems to be more what you were looking for I would say :)


That doesn't seem much better, really. The second article ("United States of America (Mark Murray)") is the only one which appears to have any significant content, and it's pretty disjointed.


"Mostly harmless"


I've thought of a use for Cpedia: generating Mahalo pages.


At least sometimes the results would be marginally useful.


Worth a read, but if they just conceded that cpedia was an exercise in dadaism, their defense would be a lot more convincing ;)


I looked up cpedia for the first time and though dadaism did occur to me, it's not really fair. After all, it's not actually random.

Every paragraph in the resulting page was somehow related to my keyword. The only problem is that the paragraphs weren't very well connected. ;-) Of course, solving that would be a major AI breakthrough.


Your example is probably more likely to be an outlier.

I've done several searches. Some of the most disappointing were single-word searches for pretty much any animal you could think of, which returned nothing.

The search for "cow" is particularly bad, returning no entries for what you would expect (a cow), but 3/10 entries returned duplicate entries for cow dung.


Guys, please explain to me why so many of you are coming to the defense of the engineering effort? Perhaps it's different in this case, but most companies, particularly ones funded with 33 million dollars, make decisions based on business and what will eventually lead towards making actual moola. Theirs is not a charity case. Engineering efforts there are not acts of chivalry. What is their big master plan for coming out the end of this tunnel, or are they simply taking random shots in the dark hoping to hit something unexpected?


I will, though, acknowledge sympathy for the engineers for having such a seemingly impossible task laid upon them.


Note to self: after securing a couple of millions in funding do hire a PR person; then stay quiet.


The more I look at cpedia, the more I'm convinced it's just a Markov bot.
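
For anyone unfamiliar with the term, a "Markov bot" generates text by repeatedly picking a next word at random from the words that followed the current word in some source text. A minimal sketch of the idea is below; the function names and the sample corpus are made up for illustration, and this is not a claim about how Cpedia actually works:

```python
import random
from collections import defaultdict


def build_chain(text):
    """Map each word to the list of words that follow it in the text."""
    words = text.split()
    chain = defaultdict(list)
    for current, following in zip(words, words[1:]):
        chain[current].append(following)
    return chain


def generate(chain, start, length, rng):
    """Walk the chain from `start`, picking each next word at random."""
    word = start
    output = [word]
    for _ in range(length - 1):
        followers = chain.get(word)
        if not followers:  # dead end: this word was never followed by anything
            break
        word = rng.choice(followers)
        output.append(word)
    return " ".join(output)


if __name__ == "__main__":
    corpus = "the owl sat in an oak and the owl heard more than the owl spoke"
    chain = build_chain(corpus)
    print(generate(chain, "the", 12, random.Random(0)))
```

Every adjacent pair of words in the output did occur somewhere in the source, which is why such text looks locally plausible while going nowhere overall.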


If they had learnt from the Cuil episode, they would have ensured at least the result for "Cuil" or "Obama" made sense. The negative reaction they are getting stems from the amount of noise they made when they launched their search product.

I am almost certain even Steve Blank and Eric Ries would have advised them not to "go to market" so soon.

Cpedia and Cuil might be their minimum product, but it is not anywhere near viable. They are a great example of why CLOSED Alpha and Beta exist.

Edit: Their feedback section says "Feedback: Cpedia is the first of its kind on the web and we'd love to hear what you think. We promise we'll listen!"

That poem is more applicable to him. Maybe he did not realise his smart 6 year old was talking to him.


Looks like their ears are closed to criticism. ;-)

http://cpedia.com/search?q=Criticism+of+cpedia


Cpedia: Telling You Whether Your VC Over-Imbibes Since 2010

Seriously though, he's trying to construct an argument around the necessity of bottom-feeding for information?


Maybe there is something to this...

If I was trying to get funding for cpedia, I'd want to know which VCs get drunk easily...


It seems he is.

Let's grant that 'digging deep' means a certain number of gems -- like the fact your next meeting is with someone who drinks too much.

But as Tom Costello admits, this approach also "ensure[s] that we will have mistakes". If those mistakes are similar to the gems -- like erroneously associating a report of drinking too much with the wrong person -- I'm not sure that makes the case that the "bottom-feeding" was worth it.


I think it would be fair to forgive any amount of alcoholism among the ranks of Cuil's investors.


Well, you can tell he cares about his work. It may be somewhat funny now, but someday we'll say this was ahead of its time. I'm sure it wasn't easy to make this. I think there's something to take away, and maybe incremental improvements will eventually make it more useful. At least it's original.


> It may be somewhat funny now, but someday we'll say this was ahead of its time.

Things are usually only 'ahead of their time' if you ignore a lot of the implementation and operational details. For example, Vannevar Bush's famous Memex, always cited as the important precursor to hypertext, was flatly unworkable: A desk-sized machine built around microfilm? It makes as much sense to talk about Jules Verne as the father of manned spaceflight because he thought of firing men from cannons.

My point is that being the first to an idea is rarely the important thing. Being first to a good, solid implementation is what counts.


"Cpedia is not an attempt to build something that knows all current knowledge and can write a meaningful essay on any topic – that would be a stretch goal."


They are succeeding. They have no meaningful content on any topic at all.


Cpedia is the Lorem Ipsum of search engines! Hmmm, a lot of uses suddenly spring to mind...


It reads to me like he is defending his team. I would bet that the people behind the scenes are feeling pretty down and for the most part unfairly: there was so much hype built up about this company, if they didn't completely nail it the first time they were doomed to fail in the eyes of the community.

I agree with the others here that he would get more sympathy if he was less snarky, but I imagine he's dealing with a lot of emotion from his team. I couldn't imagine trying to build a company with so much pressure, such an intelligent group of people and all of the negative press at the same time; what a nightmare.


This blog clarified for me what Cpedia is about. As presented in the blog, it does come across as a powerful tool for mining obscure information from the web. Maybe panning gold is an apt metaphor. What you wind up with is going to be full of mud and silt, but if you slosh it around enough, you may get some gold dust to show itself. Most users are expecting fast access to less obscure data and don't have a deep appreciation for that sort of process, however.

In other words, it's a useful tool for an unsightly and messy process. In other words, it is the last thing you'd want to expose to the web at large.


Article seems well written, and I can forgive the defensiveness given how much bile was spat in their direction.

When I saw the cpedia article on here I also thought it was stupid. With the explanation it makes a lot more sense what it's trying to accomplish. A little blurb on each cpedia page with a summary of that explanation would probably do wonders.

He's right though, often when I google something, there might be 20000 results, but only 3 with unique info. Solving that problem is a reasonable goal, and it's a bit ridiculous to all point and say 'look, they didn't solve it perfectly, hahaha'.


You are right that Google's results can be repetitive. But I am not expecting Cpedia to be perfect. I am just expecting their solution to be better than the problem. And that's not the case right now.


Yeah, the thing that keeps me from giving it much kudos is that I suspect it really is just sentences from the search results strung together in random order, with some noun phrases turned into section headers. And that's not really particularly innovative, though it's audacious to do it and call the result an encyclopedia article. If there is any attempt at producing coherent structure out of the sentences as they're strung together, it hasn't shone through...


One has to wonder whether Mahalo primed the pump for reflexive hate of this sort of project. I'm not suggesting that there's actually anything ethically questionable about Cpedia btw, but just that people might subconsciously make the association and work themselves into a rabid frenzy.


Wow, what is all the fuss about? The product is perfect. Check out John Carmack on Cpedia: http://cpedia.com/search?q=John+Carmack

Well, I think the result is excellent.


I found the section in the search on Jeff Bezos particularly moving.


"Blas, for those of you not from the West of Ireland, is the polish a hurley gets from the sliothar when used by a player of unusual skill, a patina on the surface of the wood testifying to the depth of talent of the player that had used the stick."

I'm from Ireland, have played hurling for about 21 years, this is the first time I have ever heard of this! Cuil finally taught me something, it only took them 2 years! Granted I never use their service...and there are many things I don't know.


> Blas, for those of you not from the West of Ireland, is the polish a hurley gets from the sliothar when used by a player of unusual skill, a patina on the surface of the wood testifying to the depth of talent of the player that had used the stick.

Now, how about explaining this for those poor benighted souls who speak English?


I am Irish too :)

Hurling is a popular native ball sport in Ireland which involves two teams playing against each other with sticks called hurleys and a ball called a sliothar. Think of it as ice hockey without ice / skates and the sliothar instead of a puck. A wee bit of violence and tribalism are also as big a part of the game as in ice hockey...


Seamus? Who worked in Arvato?


Blas is the Irish word for 'taste'. A hurley stick develops a 'taste for the ball' in the hands of a good player. Supposedly the tough leather ball (sliothar) hitting the soft ash wood creates a sort of sheen over time when done right. The stick becomes tempered. Tom's allegory is that Cpedia has not yet had the time to develop 'a taste for the ball'.



They'd have done much better if they'd called the project CrandombitsOfTextFromOurAlreadyBadSearchEngine.info or OurCEOSuffersFromAbstractionDisorder.biz.


"We've raised too much money - no turning back!"


I found this article quite enlightening, but then, I guess that is the nature of the subject.

http://cpedia.com/wiki?q=rosicrucianism


I don't understand why people are putting down an effort to give us more power of choice in our searches. Is there a lot of room to improve? Sure. Let's support the improvements.


The Cpedia page on Cpedia doesn't yet have a list of criticisms. There's no Wikipedia page on Cpedia yet, but there is a list of criticisms of Cuil.


Haters gonna hate


Actually, this comment sums up what I thought was wrong with the article. Haters are gonna hate. If you ask for feedback, you have to accept that you will get a lot of negative feedback, especially when you're trying something new and unheard of, and label it with a well-known suffix (pedia). My main takeaway from the article was that they marketed it wrong. If you have to explain away the criticisms in a blog post (and a few of the arguments were valid), that means you promoted it badly. None of the criticisms were wrong per se, just that he thought they didn't get it, or get how hard what they're trying to do is. That is their fault.

Plus I do agree with one of the comments above that the poem should apply to him. One thing I have learned from being in business a long time is that negative feedback is gold, if you use it to turn a situation around to leave your customer with the feeling that they have been heard, been paid attention to and taken care of. That kind of customer will stick around more than a customer who used your product and it just worked fine. I love to deal with problem customers, as it is an opportunity, not a chore, if handled correctly.

Excuses are a dime a dozen, especially when it comes to cutting edge tech.


What's all this sudden love for the cluster-fsck that is Cuil? It and anything it touches is just horrific, Cpedia included.

Sorry to say it, but it's true. It will be a huge waste of money and engineer manhours that would likely be better spent if they were working on web-based clones of minesweeper, with server clusters dedicated to 2-play minesweeper MADNESS.

This, THIS is how idiotic Cuil is.



