Yes: download the data, build indices on it yourself as you see fit, and run SQL queries.
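Here is a minimal sketch of the do-it-yourself option using Python's built-in sqlite3 module. The pages table, its columns, the example.org URLs and the helper names are hypothetical stand-ins for whatever data you actually download.

```python
import sqlite3

QUERY = "SELECT url FROM pages WHERE title = ? ORDER BY year"

def build_db():
    """Load a few hypothetical rows into an in-memory SQLite database."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE pages (url TEXT PRIMARY KEY, title TEXT, year INTEGER)")
    conn.executemany(
        "INSERT INTO pages VALUES (?, ?, ?)",
        [
            ("https://example.org/a", "inverted index", 2019),
            ("https://example.org/b", "b-tree", 2021),
            ("https://example.org/c", "inverted index", 2023),
        ],
    )
    # The index is your own choice: build it on the column you plan to filter by.
    conn.execute("CREATE INDEX idx_pages_title ON pages(title)")
    return conn

def urls_for_title(conn, title):
    return [url for (url,) in conn.execute(QUERY, (title,))]

def query_plan(conn, title):
    # Each EXPLAIN QUERY PLAN row ends with a human-readable "detail" column.
    return [row[-1] for row in conn.execute("EXPLAIN QUERY PLAN " + QUERY, (title,))]

conn = build_db()
print(urls_for_title(conn, "inverted index"))  # ['https://example.org/a', 'https://example.org/c']
print(query_plan(conn, "inverted index"))
```

EXPLAIN QUERY PLAN is a cheap way to confirm SQLite actually uses the index you built; the exact wording of its output varies between SQLite versions.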
If you don't have the resources to do this yourself, you'll have to trust something in order to share the burden.
If you trust money, gather enough interested people to share the cost of building the index; in the end, everyone who trusts you can enjoy the benefits of the whole index for themselves, and you are now a search engine service provider :)
Alternatively, if you can't get people to part with their money, you can make do with only their computation by building the index in a decentralized fashion. Anyone who believes that at least some fraction of the actors constructing the distributed index are honest can then trust it at a small computational cost.
For example, suppose you trust your own computation and you trust that a fraction x of the actors (say x = 0.1, i.e. 10%) are honest:
Gather 1000 actors, have each one compute the index of 1/1000th of the data, and publish the results.
Then have each actor redo the computation on the share of another actor picked at random; repeat as many times as necessary.
An honest actor will report any disagreement between the two computations; you can then tell which actor is lying by checking that small computation yourself, and never trust that actor again.
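The verification loop can be sketched as a small simulation. Everything concrete here is an assumption for illustration, not something the comment specifies: a toy word-to-doc-id inverted index stands in for "the index", each actor publishes a SHA-256 digest of its index rather than the index itself, liars inject a fake entry and stay silent when verifying (a worst case), and the helper names (build_index, digest, settle, run_protocol) are hypothetical.

```python
import hashlib
import json
import random

def build_index(docs):
    """Toy inverted index for one shard: word -> sorted list of doc ids."""
    index = {}
    for doc_id, text in docs:
        for word in set(text.lower().split()):
            index.setdefault(word, []).append(doc_id)
    return {word: sorted(ids) for word, ids in index.items()}

def digest(index):
    """Hash of a canonical JSON encoding, so equal indices hash identically."""
    return hashlib.sha256(json.dumps(index, sort_keys=True).encode()).hexdigest()

def settle(shard, published_digest, accused, reporter):
    """You recompute the disputed (small) shard yourself and return whoever lied."""
    return accused if digest(build_index(shard)) != published_digest else reporter

def run_protocol(shards, honest, rounds, rng):
    """Return the set of actors you end up distrusting.

    shards[i] is actor i's slice of the data; honest[i] says whether actor i
    behaves. Dishonest actors (assumed worst case) publish a corrupted index
    and stay silent when they verify someone else.
    """
    n = len(shards)
    published = []
    for i, shard in enumerate(shards):
        index = build_index(shard)
        if not honest[i]:
            index["__spam__"] = [0]  # the lie: an entry that isn't in the data
        published.append(digest(index))  # publishing a digest is an assumption

    distrusted = set()
    for _ in range(rounds):
        for verifier in range(n):
            # Pick another actor's shard uniformly at random.
            target = rng.randrange(n - 1)
            if target >= verifier:
                target += 1
            if not honest[verifier]:
                continue
            if digest(build_index(shards[target])) != published[target]:
                distrusted.add(settle(shards[target], published[target], target, verifier))
    return distrusted

docs = [(i, f"page {i} about topic{i % 7}") for i in range(200)]
n_actors = 20
shards = [docs[i::n_actors] for i in range(n_actors)]
honest = [i % 4 != 0 for i in range(n_actors)]  # actors 0, 4, 8, 12, 16 lie
caught = run_protocol(shards, honest, rounds=30, rng=random.Random(0))
print(sorted(caught))
```

Only digests travel between actors in this sketch; the full recomputation happens just for the one shard under dispute, which is what keeps your own verification cost small.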
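The probability that a lying actor is still undetected shrinks exponentially with n, the number of times you repeat the verification round. With N = 1000 actors of which a fraction x are honest, a given liar's share is missed by every honest actor in one round with probability (1 - 1/(N-1))^(x*N), so after n rounds it is still undetected with probability (1 - 1/(N-1))^(x*N*n) ≈ e^(-x*n); the chance that any liar remains is at most the number of liars times that. So it can be made as small as you like, even when x is small, by increasing n. (Unlike Byzantine fault-tolerant algorithms, there is no need for an honest majority or supermajority here, because you settle disputes yourself, which is feasible because 1/1000th of the data is small enough to recompute.)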
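A quick numeric check of how fast the escape probability shrinks, under an assumed model: in every round each honest actor independently re-checks the share of one other actor chosen uniformly at random, and dishonest actors never report anything. The function names are hypothetical; the sketch compares the closed form (1 - 1/(N-1))^(x*N*n) with a Monte Carlo estimate.

```python
import math
import random

def escape_probability(n_actors, honest_fraction, rounds):
    """Closed form: chance one given liar is never re-checked by an honest actor."""
    honest = min(round(honest_fraction * n_actors), n_actors - 1)
    # Each honest actor hits this liar's share with probability 1/(N-1) per round.
    per_round = (1 - 1 / (n_actors - 1)) ** honest
    return per_round ** rounds

def simulate(n_actors, honest_fraction, rounds, trials, rng):
    """Monte Carlo estimate of the same quantity; actor 0 is the liar."""
    honest = min(round(honest_fraction * n_actors), n_actors - 1)
    escaped = 0
    for _ in range(trials):
        caught = False
        for _ in range(rounds):
            # Honest actors are 1..honest; each picks a uniformly random other actor.
            for verifier in range(1, honest + 1):
                target = rng.randrange(n_actors - 1)
                if target >= verifier:
                    target += 1
                if target == 0:
                    caught = True
                    break
            if caught:
                break
        if not caught:
            escaped += 1
    return escaped / trials

print(escape_probability(100, 0.1, 10), simulate(100, 0.1, 10, 10_000, random.Random(1)))
print(escape_probability(1000, 0.1, 200), math.exp(-0.1 * 200))
```

With N = 1000 and x = 0.1, 200 rounds push a single liar's escape chance to roughly e^(-20), on the order of 1e-9, even though 90% of the actors are dishonest.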
Actors have no incentive to lie, because if they do, they will be provably exposed as liars forever.
Economically, with the decreasing cost of computation (and therefore of index construction), public collections of indices are inevitable. They will be quite hard to game, because as soon as enough interest is gathered, a new index can be built to fix whatever was gamed.