Your English isn't the greatest, but the points are reasonably sound.
In particular, discovery is terrible - it's the worst it's ever been, much worse than the AltaVista days. People don't seem to realize that 99.9% of people see 0.0...1% of the content - always the same content, for everyone.
A browser that blocks all ads, trackers, etc. can indeed provide the user data for a fee, since the browser always has access to all the data. Not sure how it would access its own crawl index though, i.e. where does the index come from?
Right now, it's Google...
There are open crawl sets available. My English is poor because I am on mobile (which is also why I am not going to provide links). One is called, I think, Common Crawl.
However, like Googlebot, the browser itself will of course be the actual crawler. Page requests are cached at the databank level, then at the user's partition.
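A minimal sketch of that two-level caching, assuming in-memory dicts as stand-ins for the real stores (all the names here are mine, purely for illustration):

```python
import urllib.request

# Hypothetical stores: a shared "databank" cache and a per-user partition.
databank_cache: dict[str, str] = {}   # shared across all users
user_partition: dict[str, str] = {}   # local to this user

def fetch_page(url: str) -> str:
    """Serve a page from the user's partition, then the shared databank;
    only on a full miss does the browser itself act as the crawler."""
    if url in user_partition:
        return user_partition[url]
    if url in databank_cache:
        user_partition[url] = databank_cache[url]  # promote to the partition
        return user_partition[url]
    # Cache miss at both levels: the browser goes out and crawls.
    with urllib.request.urlopen(url, timeout=10) as resp:
        html = resp.read().decode("utf-8", errors="replace")
    databank_cache[url] = html   # cache at the databank level first...
    user_partition[url] = html   # ...then at the user's partition
    return html
```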
The problem is that the web is effectively one big RSS feed, but sites (the ones with valuable info) are blocking all crawlers except Google's. This creates informational asymmetry.
Since almost all search engines try to emulate PageRank, we don't have diversified results; meanwhile, all our search info is aggregated.
The browser won't "block" ads because it won't ever return websites. As I imagine v1, it will literally only return HTML snippets, which can be iterated over rapidly.
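A hedged sketch of how that v1 query path might look: nothing is rendered, the engine just yields raw HTML snippets that the caller iterates over. The `snippet_index` and its contents are invented for the example.

```python
from typing import Iterator

# Invented toy index: query term -> pre-extracted HTML snippets.
snippet_index: dict[str, list[str]] = {
    "async runtimes": [
        "<p>An <b>async</b> runtime schedules tasks onto worker threads.</p>",
        "<p><code>await</code> suspends a task until its future resolves.</p>",
    ],
}

def search(query: str) -> Iterator[str]:
    """Yield raw HTML snippets - no page loads, no ads, nothing to block,
    because no full website is ever returned."""
    yield from snippet_index.get(query, [])

# Snippets can be iterated over rapidly, filtered, re-ranked, and so on.
for snippet in search("async runtimes"):
    print(snippet)
```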
Tracking won't matter. I haven't worked out exactly how to do it, but I think you will own a piece of a corpus (essentially there is one corpus, but you have it sort of mirrored to your silo). You can make requests to the corpus to fetch data, or to go out onto the internet and get raw data. It is returned to your cache (and the global one), and then your processing is done locally.
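Here is how I read that flow, as a rough sketch (the `CorpusSilo` type and the `crawl_raw` stub are my inventions, not a spec): a request first tries the global corpus, falls back to crawling raw data, writes the result to both the global cache and your own, and then all processing stays local.

```python
from dataclasses import dataclass, field

def crawl_raw(key: str) -> str:
    """Stand-in for going out onto the internet for raw data."""
    return f"<html>raw data for {key}</html>"

@dataclass
class CorpusSilo:
    """Your mirrored piece of the single global corpus."""
    local_cache: dict[str, str] = field(default_factory=dict)

    def request(self, key: str, global_corpus: dict[str, str]) -> str:
        if key in global_corpus:       # fetch from the corpus if possible
            doc = global_corpus[key]
        else:
            doc = crawl_raw(key)       # otherwise get raw data...
            global_corpus[key] = doc   # ...returned to the global cache
        self.local_cache[key] = doc    # and to your own cache
        return doc

def process_locally(doc: str) -> int:
    """All analysis happens on the user's machine; a word count as a stand-in."""
    return len(doc.split())
```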
* the browser is the feed reader, network, and platform
* users sell bots and crawlers to users
* users sell sorted data sets to users
* users sell algorithms to users
* the browser is a market maker (see the sketch after this list)
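A loose sketch of that marketplace model, with the browser doing nothing but matching user listings to buyers; every name below is hypothetical:

```python
from dataclasses import dataclass
from enum import Enum

class Kind(Enum):
    BOT = "bot"              # bots and crawlers
    DATASET = "dataset"      # sorted data sets
    ALGORITHM = "algorithm"  # ranking/processing algorithms

@dataclass
class Listing:
    kind: Kind
    seller: str
    name: str
    price: float

class MarketMaker:
    """The browser as market maker: it only brokers user-to-user listings."""
    def __init__(self) -> None:
        self.listings: list[Listing] = []

    def post(self, listing: Listing) -> None:
        self.listings.append(listing)

    def browse(self, kind: Kind) -> list[Listing]:
        return [item for item in self.listings if item.kind == kind]

market = MarketMaker()
market.post(Listing(Kind.BOT, seller="alice", name="news-crawler", price=2.0))
print(market.browse(Kind.BOT))
```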
Storage is so cheap and processing power is so good that a 20 GB cache of data can sit locally. You can fetch newer data or swap it out for other stuff, and you can also store post-processed analytical data in the cloud.
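As a rough illustration of that local cache with swap-out, here is a size-capped store that evicts least-recently-used entries when newer data comes in (the capacity and the cloud dict are stand-ins, not part of any real design):

```python
from collections import OrderedDict

class LocalCache:
    """Size-capped local cache; evicts least-recently-used entries
    when newer data is fetched in."""
    def __init__(self, capacity_bytes: int) -> None:
        self.capacity = capacity_bytes
        self.used = 0
        self.entries: OrderedDict[str, bytes] = OrderedDict()

    def put(self, key: str, data: bytes) -> None:
        if key in self.entries:
            self.used -= len(self.entries.pop(key))
        # Swap old entries out to make room for newer stuff.
        while self.entries and self.used + len(data) > self.capacity:
            _, evicted = self.entries.popitem(last=False)
            self.used -= len(evicted)
        self.entries[key] = data
        self.used += len(data)

    def get(self, key: str) -> bytes | None:
        if key not in self.entries:
            return None
        self.entries.move_to_end(key)  # mark as recently used
        return self.entries[key]

cache = LocalCache(capacity_bytes=20 * 1024**3)   # the 20 GB local budget
cloud_store: dict[str, bytes] = {}                # post-processed analytics
```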