I always wonder if the days of search engines for specific topics could return. With LLMs providing less than accurate results in some areas, and Google, Bing, etc. being taken over by adverts or well-organised SEO, it feels like there is a place for accurate, specialised search.
Yeah, the (relative) rise of Kagi and Marginalia shows that from a technical perspective, this is within the grasp of a dedicated hobbyist.[1] If Google continues their current trajectory, and overwhelming numbers of AI crawlers don’t cause an insurmountable rise in CAPTCHA pages, I hope to see an upsurge of niche search engines that focus on some specialty small enough that one or a few people can curate the content and produce a much better experience than the current crop of general Web search engines.
Self-plug: I run such a search engine (for programmers) in my living room, at <https://search.feep.dev/>. I don’t spend a ton of time maintaining it, so I’m interested to see what someone really dedicated could do.
Please, Kagi doesn't even have 50,000 active members. It's definitely not "rising" to become a serious contender at any sort of market share; it's a micro-project. You just feel it's bigger than that because for some reason all of its 50,000 users post relentlessly about it on HN.
Just gotta build a search engine that properly contextualizes scams, bait & switch sites, SEO, and the rest, and you're back in business.
To do that, you probably still need humans to properly curate the dataset: essentially, hire 100 librarians and set up a workflow for them to continually prune results.
Right now, everything is batch processes. None of these LLMs use active feedback, since there are no real models using updates.
I know the answer is never distributed services, but if one could build a sufficiently complex SDK to make something like Bluesky but for niche search indexes, you could chain a bunch of vetted resources together.
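For what it's worth, the "chain a bunch of vetted resources together" part can be sketched in a few lines. This is only a rough illustration, not a real SDK: `federated_search`, the `(url, score)` result shape, and the two stand-in indexes are all made up for the example, and it assumes each index already returns scores on a comparable 0 to 1 scale.

```python
from typing import Callable, Dict, List, Tuple

# Hypothetical result shape: (url, relevance score between 0 and 1).
Result = Tuple[str, float]
SearchFn = Callable[[str], List[Result]]


def federated_search(query: str, indexes: Dict[str, SearchFn], limit: int = 10):
    """Query every vetted index, dedupe by URL, and keep the best score per URL."""
    best = {}  # url -> (score, name of the index that produced it)
    for name, search_fn in indexes.items():
        for url, score in search_fn(query):
            if url not in best or score > best[url][0]:
                best[url] = (score, name)
    # Highest score first; ties broken by URL so the order is deterministic.
    ranked = sorted(best.items(), key=lambda kv: (-kv[1][0], kv[0]))
    return [(url, score, name) for url, (score, name) in ranked[:limit]]


# Stand-in indexes; a real one would call out to a curated niche search engine.
def rust_index(query: str) -> List[Result]:
    return [("https://example.org/rust/ownership", 0.9),
            ("https://example.org/shared", 0.4)]


def law_index(query: str) -> List[Result]:
    return [("https://example.org/shared", 0.7)]


results = federated_search("ownership", {"rust": rust_index, "law": law_index})
```

The hard part in practice is probably not the merging but the vetting, plus the fact that scores from independently run indexes usually are not comparable without some normalisation.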
Westlaw and LexisNexis provide this for legal search, but quite frankly, these services are subpar. It's amazing that these two companies rake in hundreds of millions but are both slower than Google, Bing, Yandex, or any LLM service (ChatGPT, Claude, Gemini, etc.) while scouring a universe of text that is orders of magnitude smaller. The user experience is also terrible: you have to log in and specify a client each and every time you attempt to use the service, and both services log you out after what is, in my opinion, a short period of inactivity, creating friction and needless annoyance for the user. There's an opportunity there.
LN and Westlaw's real service is their ubiquity. Every law student has access to them and every firm expects proficiency. While they generally suck, the last time I used them (looong time ago), their boolean search was quite nice. That kind of text search has mostly been replaced by non-deterministic black boxes, which aren't great for legal research.
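To illustrate why boolean search is attractive for this kind of research, here is a minimal sketch of a deterministic boolean query over an inverted index. It is not how LN or Westlaw implement theirs; the function names, the `must` / `any_of` / `must_not` parameters, and the three sample documents are assumptions made up for the example.

```python
import re
from collections import defaultdict


def build_index(docs):
    """Map each lowercase word to the set of document ids that contain it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for word in re.findall(r"[a-z0-9]+", text.lower()):
            index[word].add(doc_id)
    return index


def search(index, all_ids, must=(), any_of=(), must_not=()):
    """Boolean query: all of `must` AND at least one of `any_of` AND NOT `must_not`."""
    result = set(all_ids)
    for term in must:
        result &= index.get(term, set())
    if any_of:
        result &= set().union(*(index.get(term, set()) for term in any_of))
    for term in must_not:
        result -= index.get(term, set())
    return sorted(result)


# Made-up sample documents, keyed by id.
docs = {
    1: "Negligence claim dismissed for lack of duty of care",
    2: "Contract breach and negligence damages awarded",
    3: "Contract dispute settled before trial",
}
index = build_index(docs)

# negligence AND NOT contract
hits = search(index, docs, must=["negligence"], must_not=["contract"])
print(hits)  # [1]
```

The same query always returns the same set of documents, and you can explain exactly why each one matched, which is the property that a ranked or LLM-generated answer does not give you.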
They've also got the Microsoft effect going on. Usually at least one of their products, like their personal information aggregator used for locating people (like when serving lawsuits), is mandatory for a firm, so it's just easier for them to bundle everything else in.
If you want it digitized, yes, odd as that seems. You can go find individual prints of it or perhaps digital copies of opinions elsewhere, but those are also technically copyrighted in a lot of cases too.
In some jurisdictions, like Ontario, there are secret agreements that only allow 3 organizations to have digital access to Case Law (https://www.cameronhuff.com/blog/ontario-case-law-private/). This says a lot about our society, and how much we still have to improve.
I haven't personally used the mentioned services as they aren't in my field. However, what is the accuracy of their results? Are they double-checked? I don't find LLMs particularly accurate in my field (that's being kind); if anything, I find they make up sources that simply don't exist.
I mean, poor UX has no excuse, but slow speed can be justified if it makes the quality of the service better.
Wikipedia is useful up to a point, for sure. I wonder whether it could be an expansion of Wikipedia in its current use case, but for emerging research and niche topics Wikipedia can sometimes be less useful.