Hacker News

We need a search engine that allows for deep search. It should be an open and cooperative project, and users could possibly run an instance of the spider/indexer as payment for executing searches, so it could be like a cross between BitTorrent and Tor.

A free search engine would enable API calls and also boost privacy and freedom from the likes of Google. We have accumulated a lot of experience with search engines since 2000, and we have access to scientific papers, cheap cloud servers, and a huge interest in freeing search, so I think the open source community can do it.
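The "run a spider as payment for searches" idea could be sketched as a simple credit ledger. Everything here is an assumption for illustration: the class name, and the exchange rate of 1 credit per crawled page and 10 credits per search, are invented, not part of any real protocol.

```python
# Toy "crawl-for-search" credit ledger. The reward and cost constants
# are invented for illustration; a real system would need to verify
# crawl work, not just count claimed pages.
class Ledger:
    CRAWL_REWARD = 1    # credits earned per page crawled (assumed)
    SEARCH_COST = 10    # credits spent per search (assumed)

    def __init__(self):
        self.credits = {}

    def record_crawl(self, user, pages):
        """Credit a user for pages their spider instance crawled."""
        self.credits[user] = self.credits.get(user, 0) + pages * self.CRAWL_REWARD

    def charge_search(self, user):
        """Deduct the cost of one search; refuse if the balance is too low."""
        if self.credits.get(user, 0) < self.SEARCH_COST:
            return False  # must crawl more before searching again
        self.credits[user] -= self.SEARCH_COST
        return True

ledger = Ledger()
ledger.record_crawl("alice", pages=25)
print(ledger.charge_search("alice"), ledger.credits["alice"])  # True 15
```

The hard part a sketch like this hides is verification: nothing stops a node from claiming crawl work it never did, which is the same incentive problem raised in the replies below.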



Writing an open, distributed web crawler / indexer is a nice programming exercise.
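The crawler half really is the easy part. A minimal sketch of the core loop, using an invented in-memory "web" (a dict of page → links) so it runs without network access:

```python
from collections import deque

# Hypothetical in-memory web so the sketch runs without network access.
# A real crawler would fetch URLs, parse HTML, and respect robots.txt.
PAGES = {
    "a": ["b", "c"],
    "b": ["c"],
    "c": ["a", "d"],
    "d": [],
}

def crawl(seed, fetch_links, max_pages=100):
    """Breadth-first crawl: returns pages in the order they were indexed."""
    seen, queue, order = {seed}, deque([seed]), []
    while queue and len(order) < max_pages:
        url = queue.popleft()
        order.append(url)
        for link in fetch_links(url):
            if link not in seen:
                seen.add(link)
                queue.append(link)
    return order

print(crawl("a", PAGES.get))  # → ['a', 'b', 'c', 'd']
```

Distributing this is mostly queue-partitioning and deduplication; as the rest of this comment argues, ranking is where the real difficulty lives.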

Writing an "objective" ranking function (for any value of "objective") in an open, distributed manner is structurally not favoured by humanity's current incentive structures. As in:

* A dev team has to agree on signals and weights: "SERP quality" has dedicated teams of people assigned to specific verticals at Google; replicating this in a distributed manner will be gamed politically.

* Assuming any significant usage, the second you push ranking code to a public GitHub repo, the algo will be gamed by a thousand SEO scammers to their advantage.

* Executing a custom ranking function on other people's computers not only introduces security risks, but will have scammers setting up honeypots to collect other people's ranking signals, and gaming them accordingly.
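To see why publishing the ranking code is so exploitable, consider a toy linear ranker. The signal names and weights below are made up for illustration; the point is that once they are public, a spammer can craft pages that maximize exactly these terms:

```python
# Toy linear ranking: score = sum of weight * signal.
# Signal names and weights are invented; real engines combine hundreds
# of signals, and publishing exact weights is what lets SEO spammers
# optimize against them.
WEIGHTS = {"term_match": 3.0, "inlinks": 1.5, "freshness": 0.5}

def score(doc_signals):
    return sum(WEIGHTS[name] * doc_signals.get(name, 0.0) for name in WEIGHTS)

def rank(docs):
    """docs: {doc_id: {signal: value}} -> doc ids, best first."""
    return sorted(docs, key=lambda d: score(docs[d]), reverse=True)

docs = {
    "honest_page": {"term_match": 0.9, "inlinks": 0.2},
    # A spam page engineered to max out every published signal:
    "spam_page":   {"term_match": 1.0, "inlinks": 0.9, "freshness": 1.0},
}
print(rank(docs))  # → ['spam_page', 'honest_page']
```

With the weights secret, the spammer has to guess; with the repo public, the optimization target is spelled out.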


You could embrace it in a weird way:

Only open-source the framework for the server and client.

Then companies / communities / etc. can make their own algos and buy their own servers. The reward for providing your servers / crawlers is that more people use your algo (higher chance of hitting your nodes).

Then allow the client to have configurable automatic node filtering, along with manual node filtering, so if a person feels a specific node set is just full of BS, they can filter it out (and also prefer certain node sets in turn, and donate to them if they're consistently happy with the results).

It's like: choose your own filter bubble.
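The client-side filtering described above could be sketched roughly like this. The class and field names (`blocklist`, `preferred`) and the idea of identifying node sets by a string id are assumptions for illustration:

```python
# Sketch of client-side node-set filtering: drop blocked node sets,
# move preferred ones to the front. Names and the node-id scheme are
# invented for this example.
class NodeFilter:
    def __init__(self, blocklist=(), preferred=()):
        self.blocklist = set(blocklist)   # node sets the user filtered out
        self.preferred = set(preferred)   # node sets the user trusts

    def select(self, nodes):
        """Return usable node sets, preferred ones first (stable order)."""
        kept = [n for n in nodes if n not in self.blocklist]
        # sorted() is stable; key False (preferred) sorts before True.
        return sorted(kept, key=lambda n: n not in self.preferred)

f = NodeFilter(blocklist={"spammy.example"}, preferred={"good.example"})
print(f.select(["random.example", "good.example", "spammy.example"]))
# → ['good.example', 'random.example']
```

The "automatic" part would plug in here as code that adds to `blocklist` based on result-quality heuristics instead of manual action.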


> submit ranking code to public github repo, the algo will be played by thousand SEO scammers to their advantage

Just a thought: the ranking code could itself learn and adapt to each individual user (the learned "weights" could be synced online across your devices). Weighted signals from users could be fed back into the mother ranking algorithm (the un-customized one). Basically millions of distributed deep minds[1], instead of a single one.

I can imagine there are a lot of holes in my theory, but we can't simply accept that open sourcing the algorithm implies that it can't be done.

[1] : https://deepmind.com/
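The per-user adaptation in the comment above could be as simple as an online update: when a user clicks a result, nudge their personal weights toward that result's signals. The signal names and learning rate here are invented for illustration, not a description of any real system:

```python
# Toy per-user weight adaptation: a click is treated as positive
# feedback, so the user's weights move toward the clicked result's
# signal values. Signal names and lr are assumptions.
def update_weights(weights, clicked_signals, lr=0.1):
    for name, value in clicked_signals.items():
        weights[name] = weights.get(name, 0.0) + lr * value
    return weights

user = {"term_match": 1.0, "freshness": 0.0}
# User clicks a fresh result containing code snippets:
user = update_weights(user, {"freshness": 1.0, "code_snippets": 1.0})
print(user)  # freshness and code_snippets each nudged up by 0.1
```

The catch, as the reply below points out, is that "clicks" are exactly the kind of signal fake users can mass-produce.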


> Ranking code could itself learn & adapt to each individual user

Cue blackhat SEOs creating millions of subverted "users" on AWS spot instances/Lambda


> Writing an "objective" ranking function (for any values of "objective") in an open, and distributed manner is structurally not favoured by humanity's current incentive structure.

Why not just take humanity out of that picture, then?

With the current AI/deep learning hype everywhere, why not start developing an AI-driven search system?

I think for it to produce the most relevant results, it will need access to your browser (or be a browser), or better yet work at the OS level, so it can get a better idea of the context you're currently working in and learn from your habits and preferences. Say I'm coding, with an IDE and a bunch of dev-related websites already open, so the AI gives more weight to development-related results. If I've been playing a certain game a lot, it should assume I'll be looking for stuff related to that game. And so on.

So the index would be globally accessible to all computers, but the ranking would be unique to each individual user.

Something like this could very well be the actual beginning of a true A.I. "butler," more so than Siri and whatnot.
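The "IDE open → boost dev results" idea above could be sketched as a context-aware rerank step. The app-to-topic mapping, topic tags, and boost factor are all made up for this example, not a real OS API:

```python
# Hedged sketch of context-aware reranking: boost results whose topic
# matches the user's currently open applications. The mapping and the
# boost factor are assumptions for illustration.
CONTEXT_TOPICS = {"ide": "dev", "game_launcher": "gaming"}

def rerank(results, open_apps, boost=2.0):
    """results: list of (url, base_score, topic) -> urls, best first."""
    active = {CONTEXT_TOPICS[a] for a in open_apps if a in CONTEXT_TOPICS}
    def score(r):
        url, base, topic = r
        return base * (boost if topic in active else 1.0)
    return [r[0] for r in sorted(results, key=score, reverse=True)]

results = [("python-docs", 0.6, "dev"), ("news-site", 0.8, "general")]
# With an IDE open, the dev result overtakes the higher base score:
print(rerank(results, open_apps=["ide"]))  # → ['python-docs', 'news-site']
```

This keeps the global index shared while the rerank (and the context it reads) stays entirely on the user's machine.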


Obligatory XKCD: https://xkcd.com/810/


Perhaps something built on top of http://commoncrawl.org/ ?

I hope "OpenSearch" becomes a thing, like OpenAI.


Something like http://commonsearch.org ?


The second paragraph seems to describe http://yacy.net/


Isn't this how https://www.majestic12.co.uk started?



