
Apparently, they have their own search index, which they say covers ~95% of queries; if the results aren't in the index, they fetch them from Google or Bing.

I'd love some more details on how this works. They probably aren't scraping the whole web. Are they just mirroring Bing's and Google's indexes? They seem to have their own page-ranking algorithm that they're hoping to train.
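The lookup order described above (own index first, Google or Bing only as a fallback) can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions: `OWN_INDEX`, `search`, `fake_fallback`, and the "independence" ratio are hypothetical names, not Brave's code or API.

```python
from typing import Callable, Dict, List, Tuple

# Hypothetical own index: normalized query -> ranked result URLs.
OWN_INDEX: Dict[str, List[str]] = {
    "gccgo": ["https://example.org/gccgo-docs", "https://example.org/gccgo-faq"],
}

def search(query: str, fallback: Callable[[str], List[str]]) -> Tuple[List[str], str]:
    """Answer from the own index if it has results, otherwise ask the fallback engine.

    Returns the results and a label recording where they came from.
    """
    results = OWN_INDEX.get(query.strip().lower())
    if results:
        return results, "own-index"
    return fallback(query), "fallback"

def independence(sources: List[str]) -> float:
    """Fraction of queries answered without the fallback engine."""
    if not sources:
        return 0.0
    return sources.count("own-index") / len(sources)

def fake_fallback(query: str) -> List[str]:
    # Stand-in for a third-party engine such as Google or Bing (hypothetical).
    return [f"https://third-party.example/search?q={query}"]
```

With this shape, a claim like "covers ~95% of queries" would correspond to `independence(...)` returning about 0.95 over a representative sample of queries.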




I found the announcement blog post. Brave Search is a rebranding of Tailcat's product, which Brave acquired in March.

https://brave.com/brave-search-beta/


@dang, this seems like a good candidate to replace the current link for this post.


In December 2019, the company that would end up being acquired by Brave published a series of blog posts [0] explaining the tech. The short answer is 'a lot of word2vec'.

[0] https://www.0x65.dev/
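For readers unfamiliar with the idea: a common word2vec-style trick is to represent a query as the average of its word vectors and match it against other queries by cosine similarity. A minimal sketch with made-up 3-dimensional vectors; `VECTORS`, `embed`, and `most_similar` are hypothetical names, and this is not Cliqz's actual pipeline.

```python
import math

# Toy 3-dimensional word vectors (made up; real word2vec vectors are learned
# from large text corpora and typically have hundreds of dimensions).
VECTORS = {
    "cheap": [0.9, 0.1, 0.0],
    "inexpensive": [0.85, 0.15, 0.05],
    "flights": [0.1, 0.9, 0.1],
    "airfare": [0.15, 0.85, 0.1],
    "pasta": [0.0, 0.1, 0.95],
    "recipes": [0.05, 0.05, 0.9],
}

def embed(text):
    """Represent a query as the average of the vectors of its known words."""
    vecs = [VECTORS[w] for w in text.lower().split() if w in VECTORS]
    if not vecs:
        return None
    return [sum(col) / len(vecs) for col in zip(*vecs)]

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def most_similar(query, candidates):
    """Return the candidate query whose embedding is closest to `query`."""
    q = embed(query)
    if q is None:
        return None
    scored = []
    for c in candidates:
        v = embed(c)
        if v is not None:
            scored.append((cosine(q, v), c))
    return max(scored)[1] if scored else None
```

Matching a new query to similar known queries like this is one way a search engine can handle phrasings it has never seen verbatim.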


Funny how their posts show such a different approach to their browser than Brave's. E.g., forking Firefox rather than Chromium, implementing functionality as extensions instead of in the browser where possible...


> If all browsers end up using Blink (Google), the Web will suffer as developers will only optimize and test for the Blink rendering engine.

Am I the only one who thinks this would be a good thing? Like the entire industry sharing the same core open source technology? Write a website once and it works perfectly across all platforms?


No monopoly or monoculture, even if open source, is good. It is not just about the features that you think make your life better; you also have to consider potentially catastrophic bugs that could be exploited and leave everyone without an alternative.

Evolution only happens when there is divergence and competition.


Well, we're already effectively at monoculture. The only other rendering engine that has any meaningful market share is Apple's fork of WebKit.

I disagree with the proposition that a single open source project with broad industry representation would hinder evolution.

Companies like Google and Microsoft don't compete on their ability to support the various web specs, but on the quality of the web applications they can deliver to customers. Competition in this space will continue to drive innovation even with a single agreed-upon rendering engine.

It would, however, limit the ability of a company like Apple to hobble their only-supported browser such that web apps can't compete with native ones.


> market share

If there is at least one alternative that is significant, then by definition it's not "effectively a monoculture".

In any case, market share is the least important factor here. As long as we have an intolerant minority (https://news.ycombinator.com/item?id=27262240) that does not want to be subject to Chromium as the only open source project, we will be fine.

Mozilla is screwing up badly, and I switched to Brave mostly because I believe that they are building stronger artillery to fight surveillance capitalism (as in, Mozilla gives you wishy-washy feel-good words, Brave gives you money), but this does not mean that Mozilla needs to go away. Quite the opposite: I still hope that we see a "Next acquires Apple for negative $400 million" story. If Firefox builds integration with Brave's network and also adopts BAT, I would go back to it in a heartbeat.

> I disagree that (...) would hinder evolution. Competition in this space will continue to drive innovation even with a single agreed upon rendering engine.

God, no! The worrying thing is not that the development of the web specs would stagnate. The problem is that the development would only happen in the direction that benefits them and that they would be completely unchecked.


> Evolution only happens when there is divergence and competition.

Not when we can have a logical, stable, and well-made standard, like the metric system. I am pretty sure I would have a problem with any alternative to the metric system. The more of us use it, the better it becomes.

Evolution can also thrive through cooperation and mutual aid. I would be fine with having one standard implementation of a browser engine if it were not ruled by a greedy corporation like Google, Apple or Microsoft.


The problem there is that metric basically equates to math at the end of the day; I don't think those are directly comparable. It would be very strange (and new) to have one and only one implementation of an entire class of software, instead of a technical standard with multiple different implementations.


Chromium is nominally open source - in practice it's controlled by Google employees in any way that matters. So you would literally be handing full control over the web-experience to Google.


Nothing would stop it from being forked. If, for example, Microsoft wanted something added, they could fork it, add their code, and use that in Edge.


Nothing stops them but considering they've gone the route of "If you can't beat 'em, join 'em" after fighting for so long, I doubt they're likely to diverge significantly at this stage.


Problem is, this means ceding what amounts to control of the browser, and so, the internet experience, to a privacy invasive megacorp.

Were it a nonprofit trust, I'd be right there with you. But not a for-profit company, and sure as HELL not Google.


You must be too young to remember the IE6 shithole monopoly we were all in, when MOZILLA ALONE saved us.

It's not that having a common rendering engine for everyone would be bad. What's bad is having one (or a few) for-profit companies controlling that rendering engine.

They don't give a shit about you!


> all platforms

Chrome doesn’t even support all platforms. It probably wouldn’t run on my car’s display for example. If the web followed an open standard that wouldn’t be a problem: the car manufacturer could make their own browser.

And not everyone wants to use Chrome/Blink, because development is in practice entirely run by Google, who do not have consumer interests in mind.


Depends mostly on the terms you choose to evaluate it. Certainly that is one upside, but the downsides are pretty clear as well. Chromium is an open-source base, but it is still very much spearheaded and dragged by the whims of Google, and if there is no other game in town they have even less of a reason to avoid decisions that might not be in the best interest of users.

Additionally, you lose even more meaningful competition that drives improvement. Obviously you don't lose it all as your different chromium flavors do implement different bells and whistles but those are much narrower in scope.

It continues to be ironic and concerning that Firefox exists at the whims of Google paying to be the default search, but in some ways that helps them with potential antitrust cases as well. I think we have a lot more to lose than gain if we fully collapse into a Blink/Chromium singularity.


This site works best in IE6 at 800x600.


Those were the days! I was just explaining to my children yesterday how computer games were played in 640x480 when I was their age. I remember designing websites for that resolution too; still impresses me what we were able to do with so few pixels, lol.


Sounds like they built their index by collecting Google search queries done by the users of their browser extension. Suddenly Brave's "completely independent index" doesn't sound quite as impressive.

https://www.0x65.dev/blog/2019-12-05/a-new-search-engine.htm...


Why is Brave calling them Tailcat? The company was Cliqz, not Tailcat.


Cliqz closed last year. The team went on to create a new product called Tailcat, which Brave acquired.


I think Cliqz's search engine was called Tailcat


Word2vec has its limitations... I assume by now they've trained their own GPT-3-like model on the data...


So, note that Brave brought in the Cliqz/Tailcat team to build this: while it's a "new search engine", I'm guessing the data and algorithms they were working on previously have all made it into this project at some point. Cliqz launched in 2015, so there are a number of years of work put in.


I would also highly recommend the blog post series [1] from Cliqz talking about the tech behind the search.

[1] - https://0x65.dev/


I was involved with the cliqz search engine and used their browser for a while. Great people with excellent integrity.


Very sad about what happened. Had very high hopes for you guys (more than from brave).

I know TailCat is open source... But what about Brave Search? Any clue?


Check this recent podcast with Brave‘s founder, where, among other things, he is talking about how the search is implemented: https://podcasts.apple.com/de/podcast/modern-finance/id13386...



Never heard of this show before, but it got a new subscriber out of me. Thanks for the link to this!


I still wouldn't use it since it falls back to Bing or Google.


You're referring to Fallback Mixing, which is off by default. You have to enable it in https://search.brave.com/settings. When enabled, this feature will (at times) pull in results from Google via an anonymous query, routed through the browser. Read more about it here: https://search.brave.com/help/google-fallback


> Note that choosing this option has no effect on your privacy. If you happen to have a Google account, Google will not be able to associate your query with this account.

I'm confused about "routed through the browser" -- is the browser talking to Google directly, but without sending the login cookies, and then hoping Google doesn't associate searches from your IP with your identity?


Correct, a query is issued from your browser but without any cookies. While it's true your IP address tags along for the ride, the IP address isn't typically how users are tracked on Google-scale properties. Due to NAT and more, your IP address is not exclusively yours. It can represent many people at once, and over time. That said, if you are not comfortable with the idea of Fallback Mixing, you do not need to enable the feature.
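At the HTTP level, "a query without cookies" just means the request carries no `Cookie` header, so nothing in it ties the query to a logged-in account, while the IP address still travels with the connection. A minimal Python sketch that builds (but does not send) such a request with the standard library; the endpoint, the `q` parameter, and the helper name `build_cookieless_query` are assumptions for illustration, not how Brave's browser implements the feature.

```python
from urllib.parse import urlencode
from urllib.request import Request

def build_cookieless_query(query: str) -> Request:
    """Build (but do not send) a plain GET for a search query.

    Assumption: the endpoint and the `q` parameter are illustrative only.
    """
    url = "https://www.google.com/search?" + urlencode({"q": query})
    # No Cookie header is set, so no login session is attached to the request.
    # The client's IP address would still be visible to the server if it were sent.
    return Request(url, method="GET")
```

Note that the sketch says nothing about other identifying signals (headers, timing, TLS fingerprints); it only illustrates the cookie point made above.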


At the very least, I suggest modifying the text on that page, as it is misleading:

> Note that choosing this option has no effect on your privacy.

An IP address is definitely considered private information, even at the court level.

https://www.whitecase.com/publications/alert/court-confirms-...


This has not been my experience. Comparing results with Google, Startpage, and a Searx instance with only Google enabled reveals that the results are almost always from Google. Sometimes they merge multiple results that share a domain.

I decided to add them to the "Semi-Independent" category of my collection of indexing search engines: https://seirdy.one/2021/03/10/search-engines-with-own-indexe...


Mixing with Google results can only happen after opt-in, and only in the Brave browser. You can see whether a single query has been mixed by clicking on `Info`, or check the independence metrics on the `Settings` tab.

The fact that you see results similar to Google for popular queries is a by-product of the fact that our ranking is trained using an anonymous query log. There are plenty of references to the methodology (https://0x65.dev/).

The fact that we are similar to Google on certain types of queries is good (at least from the perspective of human assessment). It's easy to find other types of queries for which we are not similar to Google. It would be rather stupid if we were to "use Google" on easy-to-solve queries but not on the complicated ones, don't you think? In any case, very nice article, besides a couple of misconceptions (like this one); will bookmark.

Disclaimer: work at Brave search, used to work at Cliqz
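The idea of training a ranking on an anonymous query log can be illustrated with a deliberately simplified sketch: aggregate (query, clicked URL) pairs that carry no user identifier, then order URLs per query by how often users chose them. The function name `build_ranking` and the click-count scoring are assumptions for illustration; the 0x65.dev posts describe the actual methodology in far more detail.

```python
from collections import Counter, defaultdict

def build_ranking(log):
    """log: iterable of (query, clicked_url) pairs with no user identifiers.

    Returns a dict mapping each normalized query to URLs ordered by click count.
    """
    counts = defaultdict(Counter)
    for query, url in log:
        counts[query.strip().lower()][url] += 1
    # most_common() sorts by count, most-clicked first (ties keep first-seen order).
    return {q: [url for url, _ in c.most_common()] for q, c in counts.items()}
```

Because the clicks in such a log come from users who were searching on an existing engine, a ranking learned this way will tend to reproduce that engine's top results for popular queries, which is consistent with the similarity described above.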


That makes a bit more sense; I just read the blog posts. I'm concerned about the effects of optimizing against Google (namely, the extremely similar results); I don't think I understand the point of an alternative if it tries to replicate a competitor to this degree. The whole idea I was going for in that article was a diversity of information sources: if one engine isn't giving the results you want, try another.

Right now, users who want Google results and privacy can use a Searx instance or Startpage.

I updated the article to fix the inaccuracy. Diff: https://git.sr.ht/~seirdy/seirdy.one/commit/ddeeb36248ce5318...

Any other fact-checks are welcome.


You raise a very good point about the diversity of information sources, which is something we plan to tackle in the near future with open ranking [0].

In my opinion having similar results to Google will facilitate adoption. After all, Google is pretty good for many types of queries (not all), and people in general have strong habits.

The fact that we achieve this similarity with our own index is great. It means that we have the power to deviate from it when needed, as we mature/evolve.

Allow me to repurpose your statement about why not use Startpage if you want Google-like results: if tomorrow Google disappears (or for some reason becomes unusable), Brave Search will continue to operate as normal (similar to old Google). What will happen to Searx or Startpage? What will happen to DDG or Swisscows if the provider that turns bad is Microsoft? IMHO, no matter how much reranking or how many nice features you put on top, unless you control the search results themselves, diversity can only be superficial.

Sorry for the "rant". Thanks a lot for the inputs and for updating the doc, appreciate it.

[0] https://brave.com/wp-content/uploads/2021/03/goggles.pdf


Brave Search doesn't fall-back to Google; not unless you have enabled Fallback Mixing in https://search.brave.com/settings/. Brave Search has its own index; the results may resemble those of other engines at times, but they aren't pulled from those engines (again, noting the exception of Fallback Mixing, an optional feature offered to the user via Settings).


I'm testing on Firefox and the Tor browser right now, JS disabled. I also disabled cookies in Firefox. Searches for "Seirdy", "Neovim", "gccgo", and others return results identical to Google, Startpage, and Searx instances with only Google enabled. None of the 25 other English-language independently-indexing engines I compared in the article has had this happen; on those engines, identical result pages are nearly impossible to find for advanced/uncommon queries.

90% of queries being identical to Google but different from the 25 other independent engines is one hell of a coincidence.

Archived example:

Brave results for "gccgo": https://web.archive.org/web/20210622172743/https://search.br...

Google results for "gccgo" (proxied through Startpage): https://web.archive.org/web/20210622172939/https://startpage...

If this is a bug, it's very serious and needs to be publicly disclosed.

Edit: more examples:

Brave results for "oppenheimer": https://web.archive.org/web/20210622173647/https://search.br...

Google results for "Oppenheimer" (proxied through Startpage): https://web.archive.org/web/20210622173658/https://startpage...
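Claims like "identical to Google" can be quantified by comparing the top-N result URLs from two engines, for example with Jaccard overlap (order-insensitive) plus a count of positions where the same URL sits at the same rank. A minimal sketch; the helper names are hypothetical, and any URL lists you feed it would need to be collected by hand (e.g. from archived result pages like the ones above).

```python
def jaccard(a, b):
    """Order-insensitive overlap of two result lists: |A ∩ B| / |A ∪ B|."""
    sa, sb = set(a), set(b)
    if not sa and not sb:
        return 1.0
    return len(sa & sb) / len(sa | sb)

def same_rank_count(a, b):
    """Number of positions where both engines return the same URL."""
    return sum(1 for x, y in zip(a, b) if x == y)
```

A Jaccard of 1.0 together with a full same-rank count over the top 10 would be the "identical page" situation described; merely overlapping sets with different orderings would point to independent ranking over similar candidates.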


As a counterexample, I searched for something very obscure (only three pages on startpage) expecting to see them pulling in results from startpage to cover the long tail. I was surprised to see different results, suggesting their index is much larger than I assumed.

The query was "retail snap incentive program"

Edit: All your queries are for relatively popular terms. I wouldn't be surprised if there's just a clearly right top set of pages.


> I wouldn't be surprised if there's just a clearly right top set of pages.

I would be astounded! Why would DDG, Bing, etc. not use it? Different search indices and engines should practically always have differences in results, as ranking results is very fuzzy and dependent on the available data.


Interesting. I couldn't reproduce those results. Certain queries did produce _very_ identical results, but others did not. In some of those cases Google and Startpage did better.


Even semi-independent seems generous. I probably would have just lumped them in with Google or Bing.


Some queries do actually return independent results, but the vast majority (in my experience) do not.


I don't see a fallback mixing option on that page. Is it called 'Fallback Mixing' on that settings page? Also, it seems these results are pulled from Google and Bing for every query I do. It seems like maybe some reranking is happening. And the query completions are from Bing. So you are sending everybody's queries to third parties. Not very private.


It does not appear that they are exposing all possible settings configs on mobile as fallback mixing is not shown as an option for me there. This seems like an oversight to me.


Fallback Mixing is only available to Brave on desktop and Android at this time. Apologies for any confusion.


Why is it only available on Brave? Doesn't make any sense.


Because you cannot issue a cross-site request to Google from the client due to CORS policies. This feature required work in the Brave browser itself, so that the application would serve as a pipeline for the request on behalf of the search page itself.
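Roughly, a browser only lets a page's script read a cross-origin response if the response opts in with an `Access-Control-Allow-Origin` header matching the page's origin (or `*`). A simplified Python sketch of that check, ignoring credentialed requests, preflights, and other details of the Fetch standard; `cors_allows_read` is a hypothetical helper, not a browser API.

```python
from typing import Mapping

def cors_allows_read(page_origin: str, response_headers: Mapping[str, str]) -> bool:
    """Simplified CORS check: may script on `page_origin` read this response?

    Ignores credentialed requests, preflights, and other Fetch-standard details.
    """
    # HTTP header names are case-insensitive.
    headers = {name.lower(): value.strip() for name, value in response_headers.items()}
    allowed = headers.get("access-control-allow-origin")
    if allowed is None:
        return False  # The server did not opt in, so the browser blocks the read.
    return allowed == "*" or allowed == page_origin
```

If the fallback engine's response carries no matching header, script on search.brave.com cannot read it, which is consistent with the explanation that the browser itself had to act as the pipeline.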


What incentive do Google and Bing have to share free SERP data to Brave in an anonymous channel?


They aren't sharing it with Brave directly, but rather with users. The query is issued via the participating user's Brave instance. This data then supplements what Brave Search has found, and assists Brave Search in presenting better results to that user, and others, in the future.


This sounds like a dishonest way of bypassing payment for Google search API by impersonating a request from a user.


It's still a request from the user; the user consents to issuing these requests on behalf of Brave Search when they opt-in to Fallback Mixing. Anybody can issue calls to Google's search engine.


Doesn't this put the user in direct violation of Google's TOS, which prohibits automated queries against it?


What do you think the Google custom search api is supposed to be used for if not serving searches that originated with some user?


I just tested an image search on Bing (https://www.bing.com/images/search?q=test) and Brave Search (https://search.brave.com/images?q=test) and it definitely appears that Brave is falling back to Bing, as the results are nearly identical, especially compared to Google (https://www.google.com/search?q=test).


They mention that image search is 100% bing. Not sure if this is planned to be replaced by their own implementation later.

"However for some features, like searching for images, Brave Search will fetch results from Microsoft Bing."


Indeed, we lean heavily on Bing for image search. With time and maturation, this will change I'm sure. That said, when Bing lacked "tank man" results recently, Brave Search still yielded results (although the quality wasn't what we'd like to see; still a beta product. Screenshots here: https://twitter.com/BraveSampson/status/1400926207416410113). Crawl, walk, run. We're just getting started.


Presumably the fallback happens server side, and presumably the google/bing queries are cloaked so your IP isn't making it to google/bing.

Curious why you wouldn't use bing/google even if your queries are "proxied" through Brave servers? (Assuming Brave isn't also sending your IP, etc, when they submit the query to google/bing)


Fallback can be turned off with an easy toggle in the settings: https://search.brave.com/settings


If that is the case, what search engine do you currently use?


What do you use? Doesn't DDG use Bing as well?


What search engine would you use then? This is what pretty much every alternative search engine does...


Try our new Google alternative!

* Powered by Google



