Show HN: I built an LLM-powered Ask HN: like Perplexity, but for HN comments (hackersearch.net)
92 points by jnnnthnn on May 16, 2024 | 39 comments
Hi HN!

I'm Jonathan and I built Ask Hacker Search (https://hackersearch.net/ask), an LLM-powered version of Hacker News' Ask HN.

Unlike Ask HN, Ask Hacker Search doesn't solicit new contributions from HN readers. Instead, it leverages Hacker News' historical data to answer questions, and offers LLM-generated summaries of those. I've used it for questions like "Should I use Drizzle or Prisma?" or "What is a good screen capture that allows easy zooming effects on Mac?".

It is particularly useful when you're interested in understanding HN readers' sentiment about a topic, or when looking for expert insights on topics of interest to HN readers. I've been using it continually while building it, and have found it particularly useful to find software libraries recommended by HN or get quick vibe checks on hot topics.

This builds on my release of Hacker Search two weeks ago (https://news.ycombinator.com/item?id=40238509), which offered a semantic search engine over top HN submissions. It's not just a small upgrade: covering comments was the #1 requested feature after that launch, so I rebuilt nearly the entire product to support that.

Please try it out and let me know what you think of it! I have to limit the number of LLM summaries each person can get for free, as this is entirely self-funded. If you hit the limit, you can subscribe for more summaries generated by a better model ($8/month), or bring your own compute by running inference on Ollama on your machine!



Well done. Matching the HN style with Tailwind is a nice touch.

One thing I've been wanting from search in general, especially LLM-powered search, is some kind of date relevance, especially given the fast-moving world of technology.

For example I want to know how ProductX and ProductY compare. Last year ProductY didn't have FeatureZ, but they implemented and announced it last month. There might be several comments lamenting the lack of FeatureZ from 2 months ago, but they shouldn't be considered with the same weight now that it does exist.

I don't have any ideas for how this should be done but it's something I'd like to see tackled in RAG systems in general, and wanted to put it out there.


Yeah, that'd be a really nice thing to have! My current (and naive) approach is to just limit to the last 3 years of data. That said, it should be easy to add a date filter as a first step, and potentially have a more ingenious approach for letting the LLM reconcile contradictory statements based on the time at which they were posted.
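
A minimal sketch of the date filter plus a simple recency weighting, for illustration only. It assumes each retrieved comment already carries a relevance score; the `Comment` type, the `similarity` field, and the 180-day half-life are hypothetical choices, not how Hacker Search works today.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass
class Comment:
    text: str
    posted_at: datetime
    similarity: float  # hypothetical relevance score from retrieval, in [0, 1]


def rerank_by_recency(comments, now, max_age_days=3 * 365, half_life_days=180.0):
    """Drop comments older than max_age_days, then decay scores by age.

    A comment's score is halved for every half_life_days of age, so a
    recent comment can outrank an older one that matched slightly better.
    """
    cutoff = now - timedelta(days=max_age_days)
    scored = []
    for comment in comments:
        if comment.posted_at < cutoff:
            continue  # hard date filter (the "last 3 years" rule)
        age_days = (now - comment.posted_at).total_seconds() / 86400
        weight = 0.5 ** (age_days / half_life_days)
        scored.append((comment.similarity * weight, comment))
    # Sort on the score only, highest first.
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [comment for _, comment in scored]
```

The half-life is the knob to tune: a short one favors fresh comments on fast-moving topics, a long one behaves almost like the plain date filter.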


My UX feedback is that I had a natural inclination to click on one of the response comments to my query, and expected to be redirected to that HN post and, if possible, scrolled to the comment. I know why you might not want that, but as long as you’re using query params and not doing anything weird with browser history, I’d almost certainly toggle back.

I could definitely use this as an alternative to HN Algolia for some things, but it’s going to be hard for me to remember. I’d recommend doing this as a browser extension or something so it’s not 100% lost in the void.


Thanks for the feedback! That feature is there, just not discoverable enough!

Try clicking on the relative time (e.g. "over 1 year ago"). Should take you immediately to that comment on HN, and you can then hit back in your browser to return to Hacker Search.

Agreed on the extension. That's coming next!


I'm kind of late to this post, but really awesome initiative and well executed project! Thank you for bringing this to people!

I tested it a bit and it seems pretty decent, although for some really niche theoretical questions it didn't retrieve the answers I wanted, even if a lot of the results were really good in other respects. It could simply be because the answer is not available anywhere on Hacker News.

I'm wondering if someone were to build a similar project but for other sites, what would your advice be? For instance what technical difficulties did you stumble on that you think would be good to be aware of?

Thanks in advance and once again congratulations on the project!


Thanks for the kind words!

Yes, the underlying dataset very much conditions the quality of the responses. Additionally, the retrieval strategy is also a really important factor (and that is something which I haven't had time to extensively optimize).

I'm writing a blog post that will answer your questions! Will post it here when it's fully baked.


Awesome, looking forward to it!



I tried a few different searches and it has blown me away at how good the summaries are! Incredible execution!

Personally I like that it doesn’t look like a cookie-cutter 2024 template site and has a more raw feel, but I’m not sure that’s the best way to gain mass users these days.

10/10 for me

I’m impressed, and most of the time I’m disappointed in the Show HN section


I’m so glad you like it! Thank you for taking the time to write this kind note.


This is very cool. I wouldn't make buying decisions off of this, but it is a good starting point to get a pulse on the developer zeitgeist on any given topic.


I think you hit the nail on the head when you say, "get a pulse on the developer zeitgeist on a particular topic." I really enjoyed using it to get a sense of what developers feel about a particular topic. I also learned about new products I wasn't aware of.


Yay! Thank you for trying it out, so glad it's been useful!


Always cool seeing stuff in this space.

Regarding "zeitgeist", about a year ago I built something similar called https://zeitgaist.ai which also incorporates other sources like Mastodon, Bluesky, some subreddits etc.


Thank you for trying it out!


Very cool application of LLMs to supercharge the HN search experience. I’m bookmarking it!


Thank you!


Well done! I was also working on something similar and created a POC, but this is super nice.


Thank you! And... got a link to share? :)


I got some feedback that GPT-3.5 was letting people down so improved my caching strategy and defaulted everything to GPT-4o!


Great to hear! Would you be willing to share a bit on what changes you made to your caching strategy to optimize costs?


I made the caching window substantially longer (2 days instead of a few minutes) upon noticing most queries weren't about current events, and fixed a bug in the cache resolution logic that'd lead to a substantial number of cache misses.
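
A minimal sketch of what a longer-lived summary cache could look like, assuming an in-memory store keyed by a normalized query. The `SummaryCache` class and the normalization step are hypothetical; the comment above does not say how the real cache is stored or what the bug in its resolution logic was.

```python
import hashlib
import time

CACHE_TTL_SECONDS = 2 * 24 * 60 * 60  # 2 days, as described above


class SummaryCache:
    """In-memory cache of LLM summaries with a fixed time-to-live."""

    def __init__(self, ttl_seconds=CACHE_TTL_SECONDS, clock=time.time):
        self._ttl = ttl_seconds
        self._clock = clock  # injectable so expiry can be tested
        self._entries = {}

    @staticmethod
    def key_for(query):
        # Assumption: collapsing case and whitespace lets trivially
        # different spellings of the same question share one entry.
        normalized = " ".join(query.lower().split())
        return hashlib.sha256(normalized.encode("utf-8")).hexdigest()

    def get(self, query):
        key = self.key_for(query)
        entry = self._entries.get(key)
        if entry is None:
            return None
        stored_at, summary = entry
        if self._clock() - stored_at > self._ttl:
            del self._entries[key]  # expired: treat as a miss
            return None
        return summary

    def put(self, query, summary):
        self._entries[self.key_for(query)] = (self._clock(), summary)
```

A long TTL suits questions whose answers change slowly; queries about current events would still want a shorter window.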


Cool, thanks!


This is awesome! The comment filtering feature is a nice touch. Do I need to sign up to try local inferencing?


No. It will get offered to you once you exceed the monthly GPT-4o limit (currently 5 to avoid breaking the bank).

GPT-4o outperforms all local models so I figured using it as a fallback was the right approach :)
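
A rough sketch of that fallback, for illustration only. The quota check is a hypothetical helper; the request targets Ollama's local `/api/generate` endpoint on its default port, and the `llama3` model name is just an assumed example of a locally pulled model.

```python
import json
from urllib.request import Request

MONTHLY_GPT4O_LIMIT = 5  # free summaries per month, per the comment above
OLLAMA_GENERATE_URL = "http://localhost:11434/api/generate"


def choose_backend(summaries_used_this_month, limit=MONTHLY_GPT4O_LIMIT):
    """Use GPT-4o while the free quota lasts, then offer local inference."""
    if summaries_used_this_month < limit:
        return "gpt-4o"
    return "local"


def ollama_generate_request(prompt, model="llama3"):
    """Build (but do not send) a non-streaming request to a local Ollama server."""
    payload = {"model": model, "prompt": prompt, "stream": False}
    return Request(
        OLLAMA_GENERATE_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
```

Sending the request with `urllib.request.urlopen` would return a JSON body whose `response` field holds the generated text, provided an Ollama server is running locally.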


Hey, great work! Will definitely look forward to using it.

Can I ask what the tech stack is? Is it open source?


Hey! Not open source, largely because I don't have time to make it good enough for me to feel comfortable sharing what's otherwise pretty scrappy code, but I'm planning a blog post detailing how it was built.

Tech stack is https://news.ycombinator.com/item?id=40238913 + the addition of turbopuffer for the new functionality.



From the YC legal page

>Except as expressly authorized by Y Combinator, you agree not to modify, copy, frame, scrape, rent, lease, loan, sell, distribute or create derivative works based on the Site or the Site Content, in whole or in part,

>In connection with your use of the Site you will not engage in or use any data mining, robots, scraping or similar data gathering or extraction methods.

Who owns the posts that you are trying to profit from?


Hey! This leverages the official HN API (https://github.com/HackerNews/API), no scraping involved. I don't think it's my place to opine on "who owns the posts".
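
For anyone curious, a minimal sketch of reading an item through that API, using only the Python standard library. The helper names are made up for this example; the endpoint shape (`/v0/item/<id>.json`) and the `kids` field come from the API's README.

```python
import json
from urllib.request import urlopen

HN_API_BASE = "https://hacker-news.firebaseio.com/v0"


def item_url(item_id):
    """URL of a story or comment in the official HN API."""
    return f"{HN_API_BASE}/item/{item_id}.json"


def fetch_item(item_id, timeout=10.0):
    """Fetch one item (story, comment, ...) as a dict. Makes a network call."""
    with urlopen(item_url(item_id), timeout=timeout) as response:
        return json.load(response)


def comment_ids(item):
    """IDs of an item's direct replies; items without replies have no 'kids'."""
    return item.get("kids", [])
```

Walking a full thread means calling `fetch_item` on each ID from `comment_ids`, recursively, since every comment is its own item.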


Thanks, the site is blackholed here at the office.


this is cool; i like that you used it continuously while building it / built it for your own needs. are there any kinds of searches it does particularly well at? any fun hot takes come up in your summaries?


I feel that it performs really well on queries like those on the landing page, that generally have to do with understanding HN's sentiment about something or finding resources to learn about a topic.

As to your second question, someone looked up “What are some famous Google office pranks?” (https://hackersearch.net/ask?q=What%20are%20some%20famous%20...?) earlier today and I found some gold in there. Some of the hot takes on Gary Marcus have cracked me up too (https://hackersearch.net/ask?q=what%20do%20you%20think%20of%...?).


I dig! Something like this would be cool to see for reddit too


Perplexity offers a Reddit "focus" which does exactly that! https://www.perplexity.ai


This is cool. Which tech stack do you use? Is it open source?



Down


Thanks for flagging! Actively debugging, looks like one of my DB vendors is having some uptime issues. If you retry a couple times you should (eventually) get lucky.



