
Why isn’t there a distributed, decentralized, or open index that all of these startups can utilize? I understand that these startups are all focusing on different problem areas, but doesn’t it make sense to have something like OpenStreetMap, so that all of these companies can share their compute resources to maintain something competitive with the big guys? Even if it’s not fully decentralized, these startups teaming up to build a bigger index for themselves makes a lot of sense to me.

I have no knowledge of this field, but something like that would seem to make sense.



YaCy is still around. While I wouldn't want to disrupt its decentralized/p2p nature, I think there's a case to be made for a community-managed central aggregation server to help seed the index at various snapshots. I might even be interested in helping run such a thing.


A shared index would surely be nice (Common Crawl is perhaps an example of one that could be used), but say you had 10 search engines running from it. One decides a page is very important and updates constantly, so it should be fetched every 30 minutes. Another search engine decides the same page is spam and doesn't need to be recrawled. There are backend choices like these that affect the shape and crawl direction of the index.
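To make the conflict concrete, here's a minimal sketch of one way a shared crawler might reconcile those requests. Everything in it is hypothetical (the engine names, the `shared_recrawl_interval` helper, and the "shortest interval wins" rule are my assumptions, not how Common Crawl or any real index works):

```python
from typing import Dict, Optional

# Hypothetical per-engine recrawl requests for a single URL, in minutes.
# None means that engine considers the page spam and wants no recrawl.
requests: Dict[str, Optional[int]] = {
    "engine_a": 30,     # thinks the page is important and changes constantly
    "engine_b": None,   # thinks the page is spam
    "engine_c": 1440,   # happy with a daily refresh
}


def shared_recrawl_interval(reqs: Dict[str, Optional[int]]) -> Optional[int]:
    """Return the shortest interval any engine asked for, or None if no engine wants the page."""
    wanted = [minutes for minutes in reqs.values() if minutes is not None]
    return min(wanted) if wanted else None


interval = shared_recrawl_interval(requests)
```

Under this rule the most demanding engine sets the crawl cost for everyone, including the one that wanted the page dropped, which is the sort of tradeoff a shared index would have to settle.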

Then there are things like whether the crawler should render the page (using the final DOM content rather than the original source), and whether it does any tokenisation of the content or stores other metrics, or whether all of that needs to be done by the end search engines.
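As a rough illustration of those storage choices, a shared record could carry the raw source plus optional extras, leaving each engine to pick what it uses. The `CrawlRecord` fields and the `text_for_indexing` helper below are made up for this sketch, not taken from any existing crawler's schema:

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class CrawlRecord:
    """Hypothetical shape of one entry in a shared index."""
    url: str
    raw_html: str                        # original source as fetched
    rendered_dom: Optional[str] = None   # final DOM, only if the crawler rendered the page
    tokens: List[str] = field(default_factory=list)  # empty if tokenisation is left to engines


def text_for_indexing(record: CrawlRecord) -> str:
    """Prefer the rendered DOM when the crawler produced one, else fall back to the raw source."""
    return record.rendered_dom if record.rendered_dom is not None else record.raw_html


plain = CrawlRecord(url="https://example.com/a", raw_html="<p>hi</p>")
rendered = CrawlRecord(
    url="https://example.com/b",
    raw_html="<div id='app'></div>",
    rendered_dom="<div id='app'><p>hello</p></div>",
)
```

Storing both forms costs more space and crawl time, while storing only one forces that choice on every engine using the index.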

Also, there are issues with crawling Reddit, sites behind Cloudflare, etc. that others have gone into in more detail on this comment page.


Pretty much exactly what I have been thinking lately. Wrote about it recently here: https://nadh.in/blog/decentralised-open-indexes/



