
You might want to look at https://typesense.org/ for that.


Noice!

Does anyone have a good recommendation for a local dev setup that does something similar with available tools? I.e. one that incorporates a bunch of PDFs (~10,000 pages of datasheets) and other docs, as well as a curl-style importer?

Trying to wean myself off the next tech molochs, ideally with local functionality similar to OpenAI's Search + Reason, and gave up on Langchain during my first attempt 6 months ago.


My company (actually our two amazing interns) was working on this over the summer. We abandoned it, but it's 85% of the way to doing what you want it to do: https://github.com/accretional/semantifly

We stopped working on it mostly because we had higher priorities and because I became pretty disillusioned with top-K RAG. We had to build out a better workflow system anyway, and with that we could instead just have models write and run specific queries (e.g. list all .ts files containing the word “DatabaseClient”), and otherwise have their context set by users explicitly.

The problem with RAG is that simplistic implementations distract and slow down models. You probably need an implementation that makes multiple passes to prune the context down to what you need to get good results, but that’s complicated enough that you might want to build something else that gives you more bang for your buck.
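
A rough sketch of the kind of multi-pass setup I mean (the `search` and `llm` calls are stand-ins for your own vector store and model client, not any particular library):

    # Sketch of a multi-pass setup: over-retrieve, let the model prune, then answer.
    # `search` and `llm` are placeholders for whatever retriever and model client you use.
    def answer_with_pruning(question, search, llm, k=30):
        candidates = search(question, top_k=k)               # pass 1: cast a wide net

        listing = "\n".join(f"[{i}] {c[:200]}" for i, c in enumerate(candidates))
        keep = llm("Question: " + question + "\n\nChunks:\n" + listing +
                   "\n\nReturn only the indices of the chunks needed, comma-separated.")
        kept = [candidates[int(i)] for i in keep.split(",")  # pass 2: prune the context
                if i.strip().isdigit() and int(i) < len(candidates)]

        return llm("Context:\n" + "\n\n".join(kept) +        # pass 3: answer with less noise
                   "\n\nQuestion: " + question)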


Thanks for the excellent comment & insight!


Honestly you're better off rolling your own (but avoid LangChain like the plague). The actual implementation is simple but the devil is in the details - specifically how you chunk your documents to generate vector embeddings. Every time I've tried to apply general purpose RAG tools to specific types of documents like medical records, internal knowledge base, case law, datasheets, and legislation, it's been a mess.

Best case scenario you can come up with a chunking strategy specific to your use case that will make it work: stuff like grouping all the paragraphs/tables about a register together or grouping tables of physical properties in a datasheet with the table title or grouping the paragraphs in a PCB layout guideline together into a single unit. You also have to figure out how much overlap to allow between the different types of chunks and how many dimensions you need in the output vectors. You then have to link chunks together so that when your RAG matches the register description, it knows to include the chunk with the actual documentation so that the LLM can actually use the documentation chunk instead of just the description chunk. I've had to train many a classifier to get this part even remotely usable in nontrivial use cases like caselaw.
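
To make the linking part concrete, this is roughly the shape of it for me (a hand-rolled sketch, nothing library-specific, and the field names are made up):

    # Sketch: chunks carry explicit links, so a hit on a register *description*
    # also pulls in the chunk with the actual register documentation/table.
    from dataclasses import dataclass, field

    @dataclass
    class Chunk:
        chunk_id: str
        text: str
        kind: str                          # e.g. "register_description", "register_table"
        linked_ids: list = field(default_factory=list)

    def expand_hits(hits, index):
        """Follow links so the LLM gets the documentation chunk, not just the match."""
        seen, out = set(), []
        for hit in hits:
            for c in [hit] + [index[i] for i in hit.linked_ids if i in index]:
                if c.chunk_id not in seen:
                    seen.add(c.chunk_id)
                    out.append(c)
        return out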

Worst case scenario you have to finetune your own embedding model, because the colloquialisms the general purpose ones are trained on have little overlap with how terms of art and jargon are used in the documents (this is especially bad for legal and highly technical texts IME). This generally requires thousands of examples created by an expert in the field.


Disclosure: I'm an engineer at LangChain, primarily focused on LangGraph. I'm new to the team, though - and I'd really like to understand your perspective a bit better. If we're gritting the wheels for you rather than greasing them, I _really_ want to know about it!

> Every time I've tried to apply general purpose RAG tools to specific types of documents like medical records, internal knowledge base, case law, datasheets, and legislation, it's been a mess.

Would it be fair to paraphrase you as saying that people should avoid using _any_ library's ready-made components for a RAG pipeline, or do you think there's something specific to LangChain that is making it harder for people to achieve their goals when they use it? Either way, is there more detail that you can share on this? Even if it's _any_ library - what are we all getting wrong?

Not trying to correct you here - rather stating my perspective in hopes that you'll correct it (pretty please) - but my take as someone who was a user before joining the company is that LangChain is a good starting point because of the _structure_ it provides, rather than the specific components.

I don't know what the specific design intent was (again, new to the team!) but just candidly as a user I tend to look at the components as stand-ins that'll help me get something up and running super quickly so I can start building out evals. I might be very unique in this, but I tend to think that until I have evals, I don't really have any idea if my changes are actually improvements or not. Once I have evals running against something that does _roughly_ what I want it to do, I can start optimizing the end-to-end workflow. I suspect in 99.9% of cases that'll involve replacing some (many?) of our prebuilt components with custom ones that are more tailored to your specific task.

Complete side note, but for anyone looking at LangChain to build out RAG stuff today, I'd advise using LangGraph for structuring your end-to-end process. You can still pull in components for individual process steps from LangChain (or any other library you prefer) as needed, and you can still use LangChain pipelines as individual workflow steps if you want to, but I think you'll find that LangGraph is a more flexible foundation to build upon when it comes to defining the structure of your overall workflow.
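
If it helps, the skeleton is roughly this (simplified; `my_retriever` and `my_llm` are placeholders you'd swap for your own retriever and generator):

    from typing import TypedDict
    from langgraph.graph import StateGraph, START, END

    class RAGState(TypedDict):
        question: str
        docs: list
        answer: str

    def retrieve(state: RAGState) -> dict:
        # plug in your own retriever here (vector store, keyword search, ...)
        return {"docs": my_retriever(state["question"])}

    def generate(state: RAGState) -> dict:
        # plug in your own LLM call here
        return {"answer": my_llm(state["question"], state["docs"])}

    builder = StateGraph(RAGState)
    builder.add_node("retrieve", retrieve)
    builder.add_node("generate", generate)
    builder.add_edge(START, "retrieve")
    builder.add_edge("retrieve", "generate")
    builder.add_edge("generate", END)

    graph = builder.compile()
    # graph.invoke({"question": "What does this register do?"})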


> This generally requires thousands of examples created by an expert in the field.

Or an AI model pretending to be an expert in the field... (works well in a few niche domains I have used this in)


Don't forget to finetune the reranker too if you end up doing the embedding model. That tends to have outsized effects on performance for out of distribution content.


I am looking up chunking techniques, but the resources are so scarce on this. What's your recommendation?


It’s the big unsolved problem and nobody’s talking about it. I’ve had some decent success asking an expensive model to generate the chunks and combining that with document location, and my next plan for an upcoming project is to do that hierarchically, but there’s no generally accepted solution yet.
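
Concretely, the "expensive model does the chunking" idea looks something like this for me (the prompt and the `llm` call are illustrative stand-ins, adapt to whatever model/API you use):

    # Sketch: let a strong model propose chunk boundaries, keep document location on each chunk.
    import json

    def llm_chunk(page_text, page_no, llm):
        prompt = ("Split this page into self-contained chunks. Return JSON like "
                  '[{"title": "...", "start": 0, "end": 1234}] with character offsets.\n\n'
                  + page_text)
        spans = json.loads(llm(prompt))
        return [{"title": s["title"], "page": page_no,
                 "text": page_text[s["start"]:s["end"]]} for s in spans]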

RAG’s big problem is turning PDFs into chunks - both the parsing problem and the chunking problem. I paid someone to do the parsing part into markdown for a project recently (including table data summaries) and it worked well. MathPix has a good API for this, but it only works sensibly for PDFs that don’t have insane layouts, and many do.


The data source I have is a filesystem with docs, PDFs, graphs, etc.

Will need to expand folder names and file abbreviations, do repetitive analysis to find footers and headers, locate titles on first pages, and dedupe a lot. It seems like some kind of content+hierarchy+keywords+subtitle will need to be vectorized, like a card catalog.


Not the person you asked, but it's dependent on what you're trying to chunk. I've written a standalone chunking library for an app I'm building: https://github.com/rmusser01/tldw/blob/main/App_Function_Lib...

It's set up so that you can perform whatever type of chunking you might prefer.


If there's a list of techniques and their optimal use cases, I haven't found it. I started writing one for the day job, but then graphRAG happened, and Gartner is saying all RAG will be graphRAG.

You can't fight Gartner, no matter how wrong they are, so the work stopped, now everything is a badly implemented graph.

That's a long way of saying: if there is a comparison, a link would be most appreciated.


Semantic chunking is where I would start now. Also check this out: https://github.com/chonkie-ai/chonkie
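
The core idea fits in a few lines if you want to try it without a library (a sketch using sentence-transformers; the model name and threshold are just what I'd reach for, tune for your data):

    # Sketch: split where adjacent sentences stop being semantically similar.
    import numpy as np
    from sentence_transformers import SentenceTransformer

    def semantic_chunks(sentences, threshold=0.6):
        model = SentenceTransformer("all-MiniLM-L6-v2")
        emb = model.encode(sentences, normalize_embeddings=True)
        chunks, current = [], [sentences[0]]
        for i in range(1, len(sentences)):
            if float(np.dot(emb[i - 1], emb[i])) < threshold:   # topic shift -> new chunk
                chunks.append(" ".join(current))
                current = []
            current.append(sentences[i])
        chunks.append(" ".join(current))
        return chunks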


> but avoid LangChain like the plague

Can you elaborate on this?

I have a proof-of-concept RAG system implemented with LangChain, but would like input before committing to this framework.


LangChain is considered complicated to get started with, despite offering probably the widest range of functionality. If you are already comfortable with LangChain, you are free to ignore that.


I've had great luck just base64'ing images and asking Qwen 2.5 VL to both parse them to markdown and generate a title, description and list of keywords (seems to work well on tables and charts). My plan is to split PDFs into PNGs first, then run those against Qwen async, then put them into a vector database (haven't gotten around to that quite yet).
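
For reference, the call itself is nothing fancy - roughly this, assuming you serve Qwen 2.5 VL behind an OpenAI-compatible endpoint (e.g. vLLM or similar; the URL, model name and file name below are just placeholders for whatever your setup exposes):

    import base64
    from openai import OpenAI

    client = OpenAI(base_url="http://localhost:8000/v1", api_key="not-needed")

    with open("datasheet_page_001.png", "rb") as f:
        b64 = base64.b64encode(f.read()).decode()

    resp = client.chat.completions.create(
        model="Qwen/Qwen2.5-VL-7B-Instruct",
        messages=[{
            "role": "user",
            "content": [
                {"type": "text", "text": "Convert this page to markdown, then give a title, "
                                         "a short description, and a list of keywords."},
                {"type": "image_url", "image_url": {"url": f"data:image/png;base64,{b64}"}},
            ],
        }],
    )
    print(resp.choices[0].message.content)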


How does the base64 output become useful / usable information to an LLM?


No idea but Qwen 2.5 VL seems to understand it all quite well.


Why avoid Langchain?


Continue and Cline work with local models (e.g. via Ollama) and have good UX for including different kinds of context. Cursor uses remote models, but provides similar functionality.


Appreciated! Didn’t know Cline already does RAG handling, thought I’d have to wire that up beforehand.


Sorry, just trying to clarify - why would you use Cline (which is a coding assistant) for RAG?


I may have misunderstood, but it seems the OP's intent was to get the benefits of RAG, which Cline enables, since it performs what I would consider RAG under the hood.


I’ve been working on something that provides document search for agents to call if they need the documents. Let me know if you are interested. It’s Open Source. For this many documents it will need some bucketing with semantic relationships, which I’ve been noodling on this last year. Still needs some tweaking for what you are doing, probably. Might get you further along if you are considering rolling your own…


Could I take a look at the repo? Thanks!


https://github.com/MittaAI/webwright

Let me know if you want to go over the code or want to discuss what works and what doesn’t. We had a loop on the action/function call “pipeline” but I changed it to just test if there was a function call or not and then just keep calling.
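
The shape of that loop is basically this (paraphrased, not the exact code in the repo; the `reply` object and its fields are stand-ins):

    # Paraphrased shape of the loop: keep calling while the model asks for tools.
    def run(prompt, llm, tools):
        messages = [{"role": "user", "content": prompt}]
        while True:
            reply = llm(messages)                    # model returns text and/or a tool call
            if reply.tool_call is None:
                return reply.text                    # no function call -> we're done
            result = tools[reply.tool_call.name](**reply.tool_call.args)
            messages.append({"role": "tool", "content": str(result)})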


> gave up on Langchain during my first attempt 6 months ago

Why? If it's not a secret. I'm just looking for something, not sure actually what... :-\


Still surprised that the $3,000 NVIDIA DIGITS doesn't come up more often, both here and in the gung-ho market cap discussions.

I was an AI sceptic until 6 months ago, but that’s probably going to be my dev setup from spring onwards - running DeepSeek on it locally, with a nice RAG to pull in local documentation and datasheets, plus a curl plugin.

https://www.nvidia.com/en-us/project-digits/


It'll probably be more relevant when you can actually buy the things.

It's just vaporware until then.


Call me naive, but I somehow trust them to deliver on time/specs?

It’s also a more general comment around „AI desktop appliance“ vs homebuilts. I’d rather give NVIDIA/AMD $3k for a well-adjusted local box than tinker too much or feed the next tech moloch, and I have a hunch I’m not the only one feeling that way. Once it’s possible, of course.


Oh, if it's anything close to what they claim, I'll probably buy one as well, but I certainly do not expect them to deliver on time.


DIGITS isn't that impressive... It is an RTX 5070 Ti laptop GPU (992 TOPS, clocked less than 1% higher to reach 1000 TOPS / 1 PFLOP; as a reference, the RTX 5090 desktop has 3352 TOPS, more than 3x), with 128 GB of unified memory.

Just because Jensen calls it a super computer and gives it a DGX-1 design, doesn't make it one.

In the Cleo Abram interview [1], Jensen said that DIGITS is 6 times more powerful than the first DGX-1.

According to this PDF [2], DGX-1 had 170 TFLOPS of FP16 (half precision). 170 x 6 = 1020 TFLOPS (~1 PFLOP). Yes, DIGITS is supposed to have 1 PFLOP, but according to the presentation, it should be in FP4...

He also said that it will draw 10k times less power. But DGX-1 had a TDP of 3.5kW [3] and I highly doubt DIGITS will draw 3500/10000=0.35W... the GPU alone will have a peak TDP that is more like 200 times higher than that.

I mean, we all know that NVIDIA does fudge the numbers in charts, like comparing FP8 from the last generation to FP4 on this one. But this is extreme.

Having said that, do I believe that they can deliver a laptop (in another form factor) that will perform 1 PFLOP of FP4? Of course! Like I said, it is nothing special. Both Apple and AMD have unified memory in relatively cheap systems.

1. https://youtu.be/7ARBJQn6QkM

2. http://images.nvidia.com/content/technologies/deep-learning/...

3. https://images.nvidia.com/content/pdf/dgx1-v100-system-archi...


Also, LPDDR memory, and no published bandwidth numbers.


Seeing as it is going to deliver 1 PFLOP, it will need to have similar speed to the "native" (GDDR) counterpart; otherwise it will only be able to hit that performance as long as all the data is in the cache...

My guess is that they will use the RTX 5070 Ti laptop version (992 TFLOPS, slightly higher clocked to reach 1000 TFLOPS/ 1 PFLOP).

Their big GB200 chips have 546 GB/s to their LPDDR memory; they could use the same memory controller on the GB10 - they don't need to design a new one. It would still be slower than what they are currently using on the RTX 5070 Ti laptop GPU, but any slower than that and there is no chance that they could argue it would hit anywhere near 1 PFLOP of FP4. It would only be possible in extreme edge case scenarios where all the data fits in its 40 MB L2 cache.


I think you have the reasoning backwards, there's no "must" here. Historically there are lots and lots of systems which have struggled to approach their peak FLOPS in real-world apps due to off-chip bottlenecks.


And people are missing the "Starting at" price. I suspect the advertised specs will end up at more than $3k. If it comes out at that price, I'm in for 2. But I'm not holding my breath, given Nvidia and all.


CPU (20 ARM cores), GPU (1 PFLOP of FP4) and memory (128 GB) seem fixed, so the only configurable parts would be storage (up to 4 TB) and cabling (if you want to connect two DIGITS).

We kind of know what storage costs in a store, and we know that Apple (Mac computers) and every phone manufacturer add a ton of cost for a small increase. NVIDIA will probably do the same.

I have no idea what the cost for their cabling would be, but they exist in 100G, 200G, 400G and 800G speeds and you seem to need two of them.

If you are only going to use one DIGITS, and you can make do with whatever the smallest storage option is, then it is $3000. Many people might have another computer (set up FTP/SMB or a similar solution), NAS or USB thumbdrive/external hard drive where they can store extra data, and in that case you can have more storage without paying for more.


I'm not sure you can fit a decent quant of R1 in DIGITS; 128 GB of memory is not enough for 8-bit, and I'm not sure about 4-bit but I have my doubts. So you might have to go for something around 1-bit, which has a significant quality loss.
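
Back-of-envelope for the weights alone, assuming R1's ~671B parameters (and ignoring KV cache, activations and runtime overhead):

    params = 671e9                      # DeepSeek-R1 total parameter count (approx.)
    for bits in (8, 4, 2, 1.58):
        print(f"{bits}-bit quant: ~{params * bits / 8 / 1e9:.0f} GB")
    # 8-bit ~671 GB, 4-bit ~336 GB, 2-bit ~168 GB, 1.58-bit ~133 GB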


You can connect two and get 256 GB. But it will still not be enough to run it in native format. You will still need to use a lower quant.


The webpage does not say $3000 but starting at $3000. I am not so optimistic that the base model will actually be capable of this.


They won't have different models, other than more storage if you want it (up to 4 TB; we don't know the lowest they will sell) and the cabling necessary for connecting two DIGITS (it won't be included in the box).

We already know that it is going to be one single CPU and GPU and fixed memory. The GPU is most likely the RTX 5070 Ti laptop model (992 TFLOPS, clocked 1% higher to get 1 PFLOP).


probably because nvidia digits is just a concept rn


Not a religious person, but there's something deeply spiritual about Katalin Karikó type characters, who throughout the ages stuck with their curiosities and craft for the craft's sake.


A note on "craft" here is that Heaviside's other thing was inventing a kind of magical math that seems like it shouldn't work (even though it did) and ignoring everyone who complained about it.

https://deadreckonings.com/2007/12/07/heavisides-operator-ca...

https://en.wikipedia.org/wiki/Operational_calculus


Tried skimming the page but couldn't find the answer: do we know if the neural connection impedance is perfectly matched? It looks quite organic in shape, with teardrop connections and so on, but curious how nature did that job?


It is not always perfectly matched, because the mismatches can actually have a "computational" purpose, but e.g. the typical branching pattern of dendrites is pretty close to being matched. There is a chapter on this in the Dayan and Abbott textbook.



Yes exactly. Chapter 6.3, though it is actually less detailed than I remembered.


Much appreciated! Maybe the impedance added some colorful garnish to your memory... :-)


If this is your thing, you might also want to check out Christof Koch's Biophysics of Computation. Cable theory is introduced in one of the first few chapters.


That's an interesting question. One follow-up question I would have is whether impedance matching is a relevant concept here, given that the model has no inductance (I'm guessing that's because the flow of charge is in the form of ions moving radially through the membrane. If neurons were more like transmission lines, would we be susceptible to interference from distant lightning?)

I also skimmed the page and saw that equation 20 is not a wave equation (as the article says, it is a diffusion equation.) Again, I am not sufficiently knowledgeable to say whether that renders the question of impedance matching moot.

Update: I see from the sibling thread and its excellent reference that the refractory period, where the sodium and potassium ions are being pumped back to their starting positions, suppresses reflection.


Sibling thread?


Sorry if that's not clear - I was referring to phreeza's reply to your question and the link you had posted below it, which is as far as the discussion had gone at that time. The refractory period, and its role in suppressing reflections, is mentioned in the reference you provided a link to.


Got it - haven't read it yet but, also thanks to your pointer, very much looking forward to it!


Excellent post!

Controlled impedance took me a long time to wrap my head around when starting PCB design. The moment it finally clicked was watching this excellent AlphaPhoenix video https://www.youtube.com/watch?v=2AXv49dDQJw asking and practically demonstrating the simple question:

When you flip on a switch - to turn on anything, send data, morse something, etc. - how does the circuit know how much current the load at the other end needs?

Spoiler: Given that information can't travel faster than light, the simple answer is: it doesn't. So it just guesses and adjusts, which you don't want, as it gives you exactly the ringing etc. that Heaviside identified. The video is a nice complement, as it perfectly visualizes the issues at play.
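
If you want to put rough numbers on the "guess and adjust" part: with an ideal source, the switch initially drives V/Z0 (it only "sees" the line's characteristic impedance), and each returning reflection nudges the current toward the V/R_load it "should" have been. A toy bounce-diagram calculation (values picked arbitrarily, ideal zero-ohm source assumed) shows the ringing:

    V, Z0, RL = 5.0, 50.0, 200.0             # source volts, line impedance, load resistance

    print(f"instant the switch closes (the guess): {V / Z0 * 1e3:.0f} mA")

    gL = (RL - Z0) / (RL + Z0)               # reflection coefficient at the load
    gS = -1.0                                # ideal 0-ohm source: full, inverted reflection

    i, vf = 0.0, V                           # load current; first forward voltage wave
    for trip in range(1, 7):
        i += vf * (1 - gL) / Z0              # wave arrives at the load, part reflects back
        print(f"load current after arrival {trip}: {i * 1e3:.1f} mA")
        vf *= gL * gS                        # wave bounces off the source and comes back
    print(f"DC answer it settles on: {V / RL * 1e3:.0f} mA")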

It's a pretty wild bit of understanding to have, even in simple situations like flipping on a light switch.


That video is good for the water-like wave explanation, which is a very useful lens. If you want a more in-depth explanation, particularly how the field is in the dielectric and the wires/traces are simply the waveguide, this long presentation by Rick Hartley will help you move to the next level.

https://www.youtube.com/live/ySuUZEjARPY

The dramatic shift in behavior above the audio frequency range is where the water wave lens starts to fall down IMHO.

I was looking at my brother's memory card from a Cray-1A the other day and that video popped into my head. They had the timing traces snaking through several flat-pack chips' legs. No wonder they had to move from parity to Hamming code even with exclusively using differential twisted pairs between modules.


That one's gold too, but for me it was the other way round - needed Hartley to fully grasp Alpha Phoenix.

Understanding waves feels a bit like the bell curve meme for me: you start with the mental water model, and eventually end up with it again.

Or Feynman: you hear him helpfully talk about bouncy rubber balls, then learn a bunch of stuff over the next decade, and randomly listen to the same lecture again, and suddenly all sorts of „aaaah, that’s what he meant“ lightbulbs go off.


I should also clarify something above: Oliver Heaviside did discover that the energy flows through the dielectric; most explanations, like that video, use other lenses to communicate the very real need to consider voltage and current.

The original link sidestepped that because, to be honest, it is too complicated for the intended use case.

All models are wrong, some are useful, and the water wave model is very useful for very real needs.

I personally wasted a lot of time confusing the map for the territory, but yes, everyone's path will be different. I treated the "electron flow" and water wave models as absolute ground truth for way longer than I would like to admit.


Watching the Alpha Phoenix video I had a sort of realization that waves (as phenomena in general) are basically nature’s calculator/probes. If nature doesn’t “know” what will happen there’s probably some wave involved to figure it out.


And bias, especially in science and politics travels as a wave through time like a pendulum. Solving these systems, with the resistance measured in human lives, can take many generations.


I remember reading about the first cable that was laid across the Atlantic.

https://en.wikipedia.org/wiki/Transatlantic_telegraph_cable

They didn't know much about transmission line theory, and even burned out the cable at one point. Heaviside was only about 8 years old when the first cable was laid down.


> Controlled impedance took me a long time to wrap my head around (...) When you flip on a switch, to turn on anything, send data, morse something etc, how does the circuit know how much current the load at the other end needs?

This kind of thing has always been my hurdle when learning electronics. Whether through tutorials or high-school level physics, all the explanations I read tend to simplify and omit things in exactly the way that breaks down when you start asking questions like this. My pet peeve is various equations that people flip around seemingly arbitrarily to calculate just the thing they need on the spot, from the two "knowns" that happen to be unknown at the same spot a moment later. Everything is obviously affecting everything else, but no one thought to mention feedback loops and how to correctly deal with them (even if by simplifying them away).

Or maybe I'm just a naturally imperative thinker, and I don't feel comfortable with declarative explanations which I can't "step through" mentally to understand the underlying process. Which, in case of electronics, involves voltages propagating around the circuit at finite speeds.

In programming, this hit me wrt. non-deterministic programming in Prolog. Usually explained as magic. "You can assume program will compute X, because it's structured so that if it wouldn't, it would hit this 'can never happen' statement, and because that - literally - can never happen, it magically must take the correct path".

Became immediately obvious to me once I realized that the runtime is just hiding a big fat loop that takes every path for you, and the magic instruction just tells it to silently discard the current path and try another one. Overall, the moment I felt I finally understood Prolog was when I realized the runtime is doing depth-first search in the background.
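
In code terms, that hidden "big fat loop" is just backtracking search. A toy Python version of the idea (not real Prolog semantics; `expand` stands in for rule matching/unification):

    # Toy version of the hidden loop: depth-first search over every choice,
    # where "can never happen" just means "drop this path and try the next one".
    def solve(goals, bindings, expand):
        if not goals:
            yield bindings                  # a consistent path all the way down: a solution
            return
        first, rest = goals[0], goals[1:]
        for subgoals, new_bindings in expand(first, bindings):
            # each way of satisfying `first` is one branch; recurse into it
            yield from solve(subgoals + rest, new_bindings, expand)
        # no branch worked: fall through, i.e. backtrack to the caller's next choice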


That is a great visualisation, yes!

It's probably more helpful to not use the idea of 'guessing' though - the transition from off to on is a signal with a certain frequency content and that signal is reacting to the capacitance and inductance etc. of the wires and propagating through them basically the only way it can. Once the signal has propagated through and the ringing has all been absorbed it settles on the DC condition.


That video (and channel, seemingly) is incredible, thank you for posting! I've never seen anything like the visualizations starting at ~10m.


Wow, that is indeed quite strange - that’s a very high speed for having scraped along a 9000 foot runway.

They either landed extremely long, or it rhymes a bit with the Pakistan International Airlines accident a couple of years back, where an attempted gear-up landing turned into a go-around - both not implausible in the context of already dealing with a bird strike. There are also edge cases where the plane won't yell Landing Gear at you, and it's really, really hard to get a 737 to a point where you can't lower the gear anymore (multiple hydraulic systems failing, gravity pins and pulleys as well; Stig Aviation did a great video on that.)

Pretty sure there was no EMAS, as the plane dips down into the dirt at the end of the runway right away, ie not that much lift, and EMAS would do orders of magnitude more arresting.


I’m leaning toward no EMAS. I found a facility directory that lists the size of the runway-end safety area where EMAS would be but it has no description of any EMAS system.

https://aim.koca.go.kr/eaipPub/Package/2022-09-07-AIRAC/html...

I’m not sure EMAS would have helped though. I believe EMAS relies on the weight of the aircraft bearing on the relatively small contact area of the tires/wheels to punch thru the unreinforced concrete. The weight of the aircraft distributed across the area of the belly may not be sufficient to break through the surface.


Plane was likely going too fast for EMAS to be effective.


And apparently EMAS requires the gear to be down to be effective.


That’s what got me excited about the movie in the first place - but leaving the theatre I felt like Graeber would be spinning in his grave.

A Randian California Ideology hero is the exact opposite of what he spent his life arguing and fighting for.

And it reminded me of the Adam Curtis line: “we now all live in the mind of a dying hippie”. It seems like Coppola fits that bill perfectly.


I agree he appears to misunderstand Graeber/Wengrow, given his remarks on Instagram, but I still think his enthusiasm for their work is an important angle that is completely absent from the discourse. I'm also not certain the architect portrayal is meant to be idealistic or unproblematized.


Stephen Colbert and Anderson Cooper once had a wonderful, deeply human conversation about grief and loss, which also touches briefly upon Colbert's mom losing two sons and her husband in a plane crash:

https://www.youtube.com/watch?v=YB46h1koicQ

May you find similar comfort, one day.


~ weekly, globally.

