Hacker News
Show HN: Natural Language Processing Demystified
286 points by mothcamp on Dec 1, 2022 | hide | past | favorite | 37 comments
Link: https://www.nlpdemystified.org/

Hi HN:

After a year of work, I've published my free NLP course. The course helps anyone who knows Python and a bit of math go from the basics to today's mainstream models and frameworks.

I strive to balance theory and practice, so every module consists of detailed explanations and slides along with a Colab notebook putting the ideas into practice (in most modules).

The notebooks cover how to accomplish everyday NLP tasks including extracting key information, document search, text similarity, text classification, finding topics in documents, summarization, translation, generating text, and question answering.

The course is divided into two parts. In part one, we cover text preprocessing, how to turn text into numbers, and multiple ways to classify and search text using "classical" approaches. And along the way, we'll pick up valuable bits on how to use tools such as spaCy and scikit-learn.
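As a tiny taste of the "turning text into numbers" step from part one, here's a bag-of-words count vector sketched in pure Python (illustrative only; in practice scikit-learn's CountVectorizer or TfidfVectorizer handles this):

```python
from collections import Counter

# Minimal bag-of-words sketch: map a document to a count vector over a fixed
# vocabulary. Real pipelines add tokenization, lowercasing rules, tf-idf
# weighting, etc., but the core idea is just counting.
def bag_of_words(doc, vocabulary):
    counts = Counter(doc.lower().split())
    return [counts[word] for word in vocabulary]

vocab = ["nlp", "is", "fun"]
print(bag_of_words("NLP is fun and NLP is useful", vocab))  # [2, 2, 1]
```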

In part two, we dive into deep learning for NLP. We start with neural network fundamentals and go through embeddings and sequence models until we arrive at transformers and the mainstream models of today.
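And a taste of the embeddings material: similarity between embedding vectors is conventionally measured with cosine similarity, which fits in a few lines (the vectors below are made up for illustration; real embeddings have hundreds of dimensions):

```python
import math

# Cosine similarity: the standard way to compare embedding vectors.
def cosine_similarity(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

print(cosine_similarity([1.0, 0.0], [1.0, 0.0]))  # 1.0 (identical direction)
print(cosine_similarity([1.0, 0.0], [0.0, 1.0]))  # 0.0 (orthogonal)
```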

No registration required: https://www.nlpdemystified.org/



Thank you for sharing this! I am currently studying NLP.

Along the way, I've been struggling with a question and I hope someone can help me understand how to go about this: how would you build a model that does more than one NLP task? For a simple classifier like input: text (a tweet) and output: text (an emotion), you can fine-tune an existing classifier on such a data set. But, how would you build a model that does NER and sentiment analysis? E.g. input: text (a Yelp review of a restaurant) and output: list of (entity, sentiment) tuples (e.g. [("tacos", "good"), ("margaritas", "good"), ("salsa", "bad")]). If you have a data set structured this way, and want to fine-tune a model, how does that model know how to make use of a Python list of tuples?


If you have the dataset, you can try to train a model like T5 [1], notebook [2].

You just need to create [(input, output)] examples in the format you want.

For example:

[("a Yelp review of a restaurant", [("tacos", "good"), ("margaritas", "good"), ("salsa", "bad")])]

With enough data, the model should be able to learn to generate the output in the right format.
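Concretely, one way to build those examples (a sketch only; the "extract aspects:" prefix and the "aspect: sentiment" target format are arbitrary choices on my part, not anything T5 requires):

```python
# Hypothetical helper: flatten (aspect, sentiment) pairs into a plain target
# string that a seq2seq model like T5 can learn to emit as text.
def make_example(review, pairs):
    target = "; ".join(f"{aspect}: {sentiment}" for aspect, sentiment in pairs)
    return {"input": f"extract aspects: {review}", "target": target}

ex = make_example(
    "The tacos and margaritas were great, but the salsa was bland.",
    [("tacos", "good"), ("margaritas", "good"), ("salsa", "bad")],
)
print(ex["target"])  # tacos: good; margaritas: good; salsa: bad
```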

> Python list of tuples

Things get interesting if you want to generate actual Python code. You can use a large language model with just a few examples of the task to generate such code. For example, see https://reasonwithpal.com/.

Happy to answer more questions!

[1] https://huggingface.co/docs/transformers/model_doc/t5

[2] https://colab.research.google.com/github/huggingface/noteboo...


You could start by looking into either multitask transformers or really general seq2seq models like T5. With T5, for example, it just learns to transform one text sequence into another. So you could fine-tune T5 to produce your target sequence, but rather than outputting an explicit Python list of tuples, it would output a string that looks like a sequence of tuples.
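Since the output is just a string, you'd then parse it back into structured data yourself. A sketch, assuming you fine-tuned the model to emit an "aspect: sentiment; aspect: sentiment" format (the format itself is whatever you chose at training time):

```python
# Parse a generated "aspect: sentiment; ..." string back into Python tuples.
def parse_pairs(generated):
    pairs = []
    for chunk in generated.split(";"):
        if ":" in chunk:
            aspect, sentiment = chunk.split(":", 1)
            pairs.append((aspect.strip(), sentiment.strip()))
    return pairs

print(parse_pairs("tacos: good; margaritas: good; salsa: bad"))
# [('tacos', 'good'), ('margaritas', 'good'), ('salsa', 'bad')]
```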

Or maybe skip all that and outsource it to GPT: https://imgur.com/a/BQv6C3K


Ah, so if the model is just converting input text into output text, it can really learn to do just about anything? But there may be certain aspects of model design that make it better at some types of conversions ("tasks") than others? And there may be certain data sets you want to train a base model on first, to give it general language comprehension, and then build on top of that for your specific use case?


Yeah, I can see that being the case for specialized domains. With state-of-the-art models widely available to the public, knowledge of the domain and its workflows, plus fine-tuning models to suit the domain, will probably be your edge.


It is kind of like a very opaque but trainable Turing machine.


Yours is an example of aspect-based sentiment analysis. Typically it has been tackled in two steps: first extract the aspects, then classify each as positive/negative. GPT or T5 are possible options for doing both in one go, but splitting the task still seems to be a good option [1].

[1] http://essay.utwente.nl/91778/1/Middelraad_BA_EEMCS.pdf
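The two-step pipeline can be caricatured in a few lines. The keyword lists and the word-window heuristic below are toy stand-ins for what would really be a trained aspect extractor (step 1) and an aspect-level sentiment classifier (step 2):

```python
# Toy two-step aspect-based sentiment analysis (illustration only).
POSITIVE = {"great", "good", "tasty"}
NEGATIVE = {"bland", "bad", "cold"}
KNOWN_ASPECTS = {"tacos", "margaritas", "salsa"}

def extract_aspects(review):
    # Step 1 stand-in: a real system would use a sequence tagger or NER model.
    words = [w.strip(".,!?") for w in review.lower().split()]
    return [w for w in words if w in KNOWN_ASPECTS]

def classify_sentiment(review, aspect):
    # Step 2 stand-in: look for sentiment words near the aspect mention.
    words = [w.strip(".,!?") for w in review.lower().split()]
    i = words.index(aspect)
    window = words[max(0, i - 2): i + 3]
    if any(w in POSITIVE for w in window):
        return "positive"
    if any(w in NEGATIVE for w in window):
        return "negative"
    return "neutral"

review = "The tacos were great, but the salsa was bland."
print([(a, classify_sentiment(review, a)) for a in extract_aspects(review)])
# [('tacos', 'positive'), ('salsa', 'negative')]
```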


I love your course for being very comprehensive and technical while not getting lost in mundane details. Like the opposite of the following quote:

“I didn't have time to write a short letter, so I wrote a long one instead.” [1]

[1] https://www.goodreads.com/quotes/21422-i-didn-t-have-time-to...


"Je n’ai fait celle-ci plus longue que parce que je n’ai pas eu le loisir de la faire plus courte."

Blaise Pascal, 1656

FYI


Ah, thanks! I didn’t know that and shouldn’t have used the first result that came up in a Google search.


Really appreciate that. Finding that balance was one of the hardest parts of building this course.


Yes, it's easy to see you put a lot of thought into that. I hope your course receives much more exposure. When I first found your videos a few weeks ago, I was surprised how few views they have given to the quality of the course.

Do you record the voice track of your videos yourself?

Glad to see you published the final lesson about transformers. Was looking forward to that!


I did record all voice tracks, yeah. If I do this again, I'll probably use a lot of generative tools now. :-D

Hope you find the transformers module useful!


That's impressive. The audio track of your videos is so clean and easy to understand that I was wondering whether you used a studio setup or voice-synthesis software. Well done!


I like that your site runs properly with only first party scripts enabled in ublock, very rare these days.

Secondly kudos for not requiring a sign up and for making it free!

Looks like a brilliant resource, thank you.



This is awesome! I just finished watching the Unit 10 video ("Neural Networks I") and it filled in quite a few gaps in my understanding.

I really love that you build a complete working example, all the way down to the matrix multiplications, so that we can see how everything works, at every layer of abstraction.
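In that spirit, the core of a single dense layer really is just a matrix multiplication plus a nonlinearity (toy numbers below, not anything from the course itself):

```python
# Toy forward pass for one layer: y = relu(W @ x), written out by hand.
def matvec(W, x):
    return [sum(w * xi for w, xi in zip(row, x)) for row in W]

def relu(v):
    return [max(0.0, a) for a in v]

W = [[1.0, -1.0], [0.5, 0.5]]  # made-up weights
x = [2.0, 1.0]                 # made-up input
print(relu(matvec(W, x)))      # [1.0, 1.5]
```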

I'm looking forward to the next unit, and I can already tell this is going to be an indispensable reference I'll come back to review again and again.

Thank you!!


Great content! And thank you for making it open and free. I recommend adding a License to your Github repo.



great work! just a note regarding tf-idf, where you mention log10: i think the reason for the log, and most importantly base 10, deserves more emphasis. using log10 ties a term's weight to the number of digits in its term/document frequency: if term "A" occurs 23 times and term "B" occurs 50 times, they end up with very similar weights (because both counts are two-digit numbers).
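For example (the counts are made up):

```python
import math

# log10 compresses raw counts: frequencies with the same number of digits get
# similar weights. 23 and 50 map to roughly 1.4 and 1.7, while 500 (one more
# digit) jumps to roughly 2.7.
for count in (23, 50, 500):
    print(count, round(math.log10(count), 2))
```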

anyway, thanks for the submission


Who is the intended audience for this course? Is it application developers looking to use NLP in their apps? Or machine/deep learning devs?


It's for anyone who wants to learn NLP such that they get (a) an understanding of what's going on under the hood and (b) knowledge of how to get stuff done.

So the ideal outcome is someone who gets an end-to-end view from theory/concept to implementation.

If someone just wants to learn how to use tools/frameworks, I'd stick to the Colab notebooks. If someone's already experienced in ML and wants to learn something NLP-specific, I'd skip around to see what's interesting.


Thanks. Excellent course.


Thank you for this. Did you use any tool to publish the site and UI, or all static and vanilla? It's snappy, nice and clean.


Thanks. It's all statically-generated pages with NextJS and Tailwind.


Thanks, excellent course; watched about an hour or two. Very well made. Deserves more exposure.


Oh! That interests me a lot :) I wanted to learn this in the coming months. Thank you very much! :)


OMG! Can't thank you enough!


10/10 for making this freely available


off topic: When did the default semantics for "NLP" change from Nonlinear programming [0] to Natural Language Processing?

[0] https://en.wikipedia.org/wiki/Nonlinear_programming


thank you!! as a CS student interested in ML I will 100% be taking a look at this when I get some free time


thank you. Free YouTube videos, with links to Google Colab notebooks... this is incredible...


I wish there were written notes to study. Anyway, great videos.


This looks awesome. No signup, that's a dream :)


What is the cost?


Your time. That's it.


[flagged]


Could you please stop posting unsubstantive and/or flamebait comments? We have to ban accounts that do this. It's not what this site is for, and destroys what it is for.

https://news.ycombinator.com/newsguidelines.html



