jdspiral's comments

jdspiral · 2025-04-23T13:55:39 1745416539

So I've taken the feedback on board and realized that the name and title were misleading. I'm updating the project accordingly.

https://tokenizer-machine.streamlit.app/

fransjorden · 2025-04-25T18:56:25 1745607385

Don't forget to update the link of the post itself, as that one is broken now

jdspiral · 2025-04-23T02:15:09 1745374509

Thanks! Yes — that’s on the roadmap, along with some other cool visualizations I’m working on. Machine translation is definitely something I want to work on: showing how models align meaning across languages using shared embeddings and attention patterns. I’d love to make that interactive too.

sherdil2022 · 2025-04-23T02:33:09 1745375589

I would love to get involved with that (I speak a handful of human languages). Let me know if you are looking for collaborators.

jdspiral · 2025-04-23T02:04:21 1745373861

Yes, tokenization and embeddings are exactly how LLMs process input—they break text into tokens and map them to vectors. POS tags and SVOs aren't part of the model pipeline but help visualize structures the models learn implicitly.
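
To make the "break text into tokens and map them to vectors" step concrete, here is a rough toy sketch. It is not the code from the project and not BERT's actual tokenizer: the vocabulary, the 4-dimensional random vectors, and the function names (`wordpiece`, `tokenize`, `embed`) are all made up for illustration. It only mimics the greedy longest-match subword splitting that WordPiece-style tokenizers use, followed by a table lookup.

```python
import random

# Toy vocabulary (hypothetical). Real WordPiece vocabularies have tens of
# thousands of entries; "##" marks a piece that continues a word.
VOCAB = ["[UNK]", "token", "##ize", "##r", "##s", "read", "##ing", "words", "model"]
TOKEN_TO_ID = {tok: i for i, tok in enumerate(VOCAB)}

# Toy embedding table: one small random vector per vocabulary entry.
# In a trained model these vectors are learned, not random.
EMBED_DIM = 4
_rng = random.Random(0)
EMBEDDINGS = [[_rng.uniform(-1, 1) for _ in range(EMBED_DIM)] for _ in VOCAB]


def wordpiece(word):
    """Split one word into subword pieces, greedy longest match first."""
    pieces, start = [], 0
    while start < len(word):
        end = len(word)
        match = None
        # Shrink the candidate from the right until it is in the vocabulary.
        while end > start:
            candidate = word[start:end]
            if start > 0:
                candidate = "##" + candidate
            if candidate in TOKEN_TO_ID:
                match = candidate
                break
            end -= 1
        if match is None:
            # No piece fits: the whole word becomes the unknown token.
            return ["[UNK]"]
        pieces.append(match)
        start = end
    return pieces


def tokenize(text):
    """Lowercase, split on whitespace, then split each word into pieces."""
    return [piece for word in text.lower().split() for piece in wordpiece(word)]


def embed(tokens):
    """Map each token to its vector via the token's id."""
    return [EMBEDDINGS[TOKEN_TO_ID[tok]] for tok in tokens]


if __name__ == "__main__":
    tokens = tokenize("Tokenizers read words")
    print(tokens)
    for tok, vec in zip(tokens, embed(tokens)):
        print(tok, [round(x, 2) for x in vec])
```

With this toy vocabulary, "Tokenizers" comes out as four pieces (token, ##ize, ##r, ##s), which is the kind of fragmentation the tool visualizes. Nothing in this step knows about POS tags or subjects and objects; that is why those views are analysis aids rather than part of the model pipeline.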

jdspiral · 2025-04-22T22:55:09 1745362509

I built a tool called Meaning Machine to let you see how language models "read" your words.

It walks through the core stages — tokenization, POS tagging, dependency parsing, embeddings — and visualizes how meaning gets fragmented and simulated along the way.

Built with Streamlit, spaCy, BERT, and Plotly. It’s fast, interactive, and aimed at anyone curious about how LLMs turn your sentence into structured data.

Would love thoughts and feedback from the HN crowd — especially devs, linguists, or anyone working with or thinking about NLP systems.

GitHub: https://github.com/jdspiral/meaning-machine

Live Demo: https://meaning-machine.streamlit.app

macleginn · 2025-04-23T08:45:06 1745397906

The presentation is nice! The main point, however, is a bit misleading. From the title, one would assume that we will see something about how LMs do all these things implicitly (as was famously shown for syntax in this paper: https://arxiv.org/pdf/2005.04511, for example). Instead, the input is simply given to a bunch of pretrained task-specific models, which may not have much in common with each other and definitely do not have much in common with what today's LLMs are doing under the hood.

toxik · 2025-04-23T08:52:06 1745398326

You shouldn’t link directly to the PDF; here is the abs page:

https://arxiv.org/abs/2005.04511

selfhoster11 · 2025-04-24T10:15:13 1745489713

I'm getting an error message with Streamlit: "You do not have access to this app or it does not exist"

jdspiral · 2025-04-24T11:45:13 1745495113

I moved the app; it’s now at tokenizer-machine.streamlit.app.