Hacker News | mkauffman23's comments

We spent a lot of time at ragie.ai getting agentic retrieval right. Like a lot of stuff these days, the demo-quality version came together quickly, but the real work was making it reliable across domains with messy input data, and getting it to refuse instead of hallucinate when the source data wasn't sufficient to answer a query.

Classic RAG (embed and fetch) breaks down on compositional or scoped questions. Our approach treats retrieval as reasoning with multiple subagents in a loop: Plan → Search → Answer → Evaluate → Cite. This agent loop decomposes queries, chooses search strategies dynamically, and inspects intermediate results before responding.
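The loop above can be sketched in a few lines. This is a minimal toy illustration of the Plan → Search → Answer → Evaluate → Cite pattern, including the refuse-instead-of-hallucinate behavior; every function name, the corpus, and the matching logic are hypothetical stand-ins, not Ragie's actual implementation.

```python
# Toy corpus standing in for an indexed document store (hypothetical).
TOY_CORPUS = {
    "doc1": "Ragie supports audio and video ingestion.",
    "doc2": "faster-whisper transcribes audio roughly 4x faster than vanilla Whisper.",
}

def plan(query):
    # Decompose the query into sub-queries (naive: pass it through whole).
    return [query]

def search(sub_query):
    # Choose a search strategy and fetch candidates (naive: keyword match).
    terms = sub_query.lower().split()
    return [(doc_id, text) for doc_id, text in TOY_CORPUS.items()
            if any(t in text.lower() for t in terms)]

def answer(query, evidence):
    # Draft an answer only from retrieved evidence; refuse if there is none.
    if not evidence:
        return None
    return " ".join(text for _, text in evidence)

def evaluate(query, draft):
    # Inspect the intermediate result before responding.
    return draft is not None

def cite(evidence):
    return [doc_id for doc_id, _ in evidence]

def agentic_retrieve(query):
    evidence = []
    for sub in plan(query):
        evidence.extend(search(sub))
    draft = answer(query, evidence)
    if not evaluate(query, draft):
        return {"answer": "Not enough source data to answer.", "citations": []}
    return {"answer": draft, "citations": cite(evidence)}
```

In a real system each step would be an LLM call (or subagent) rather than a keyword match, but the control flow — and the explicit refusal path when evidence is insufficient — is the same shape.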

I wrote up some details on how we implemented this and why. Hopefully some useful bits in there for anyone working on agents. Happy to answer any questions!


I have no idea how accurate the reconstruction would be, but it would make for a wild experiment!


In this blog post we detail the API design and technical decisions we made when adding audio/video support to Ragie's RAG service. We explore some of the approaches we tried and the rationale behind what we landed on. Worth a read if you're building similar systems.

Here's a TLDR:

- Built a full pipeline that processes audio/video → transcription + vision descriptions → chunking → indexing
- Audio: faster-whisper with large-v3-turbo (4x faster than vanilla Whisper)
- Video: chose Vision LLM descriptions over native multimodal embeddings (2x faster, 6x cheaper, better results); 15-second video chunks hit the sweet spot for detail vs. context
- Source attribution with direct links to exact timestamps
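To make the chunking and attribution steps concrete, here is a small sketch of bucketing timestamped transcript segments into 15-second chunks and building a deep link to each chunk's start time. The `?t=SECONDS` link format and all names here are illustrative assumptions, not Ragie's actual schema.

```python
def chunk_segments(segments, window=15.0):
    """Group (start, end, text) transcript segments into fixed-width
    time windows (default 15 seconds, per the sweet spot above)."""
    buckets = {}
    for start, end, text in segments:
        key = int(start // window)  # which window this segment starts in
        buckets.setdefault(key, []).append(text)
    return [{"start": key * window, "text": " ".join(texts)}
            for key, texts in sorted(buckets.items())]

def attribution_link(base_url, chunk):
    # Deep link to the exact timestamp where the chunk begins
    # (assumed ?t= query-param convention, as used by many video players).
    return f"{base_url}?t={int(chunk['start'])}"
```

A retrieved chunk can then be surfaced with a link that jumps the user straight to the moment in the recording that the answer came from.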

Happy to answer any further questions folks might have!


Source attribution with direct links to exact timestamps is truly unique when it comes to A/V RAG solutions.

