
It is actually pretty straightforward why those models "reason" or, more precisely, can operate on complex concepts. By processing huge amounts of text they build an internal representation in which those concepts are represented as simple nodes (neurons or groups of them). So they really do distill knowledge. Alternatively, you can think of it as a very good principal component analysis that extracts many important aspects, or as a semantic graph built automatically.

Once the knowledge is distilled, you can easily build on top of it, for example by merging concepts.
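
A toy sketch of what "merging concepts" can look like in embedding space (the vectors below are made up for illustration, not taken from any real model):

    import numpy as np

    # Made-up 4-d "concept" embeddings; a real model's are learned and much wider.
    king  = np.array([0.9, 0.8, 0.1, 0.2])
    man   = np.array([0.1, 0.8, 0.1, 0.1])
    woman = np.array([0.1, 0.1, 0.9, 0.1])
    queen = np.array([0.9, 0.1, 0.9, 0.2])

    def cosine(a, b):
        return a @ b / (np.linalg.norm(a) * np.linalg.norm(b))

    # "Merging" concepts via vector arithmetic: king - man + woman lands near queen.
    merged = king - man + woman
    print(cosine(merged, queen))  # high similarity in this toy example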

So no secret here.



Do they distill knowledge, or the relationships between words (that describe knowledge)?

I know it seems like dancing on the head of a pin, but …


Well, the internal representation is tokens, not words, so… the pin is even smaller?

They distill relationships between tokens. Multiple tokens together make up a word, and multiple words together make up a label for something we recognize as a "concept".
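
For example (a rough sketch using the Hugging Face GPT-2 tokenizer; the exact split depends on the tokenizer):

    from transformers import AutoTokenizer

    tok = AutoTokenizer.from_pretrained("gpt2")

    # A less common word is usually broken into several sub-word tokens,
    # while frequent words often map to a single token.
    print(tok.tokenize("thunderstorm"))
    print(tok.tokenize("cat"))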

These "concepts" are not just a label though - they are an area in the latent space inside the neural network which happens to contains those words in the sequence (along with other labels that mean similar things).

A simple demonstration of this is how easily multi-modal neural networks build cross-modal representations of the same thing: "cats" end up in the same place in both image and word form, and even more complex concepts ("beautiful country fields with a foreboding thunderstorm forming") align well between the words and the images.
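
A rough way to see that alignment (a sketch using a public CLIP checkpoint from Hugging Face; the model name and the image file path are illustrative assumptions):

    from PIL import Image
    from transformers import CLIPModel, CLIPProcessor

    model = CLIPModel.from_pretrained("openai/clip-vit-base-patch32")
    processor = CLIPProcessor.from_pretrained("openai/clip-vit-base-patch32")

    image = Image.open("cat.jpg")  # hypothetical local photo of a cat
    texts = ["a photo of a cat",
             "country fields with a foreboding thunderstorm forming"]

    inputs = processor(text=texts, images=image, return_tensors="pt", padding=True)
    outputs = model(**inputs)

    # Similarity scores between the image and each caption in the shared
    # embedding space; the matching caption should score highest.
    print(outputs.logits_per_image.softmax(dim=-1))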


> Do they distill knowledge, or the relationships between words (that describe knowledge)?

Do we know that there's a difference between the two? Maybe this distinction is just a god of the gaps.



