I'm taking this course at Oxford and have been working through practicals 1-3 (further ones will be posted).
For anyone considering working through this outside of Oxford: I think the practicals are the real gems here and should be doable without the practical lab sessions that you get when attending the course. With that said, they use a dataset that is a bit closer to a real-world assignment, so it requires some patience when wrangling the data, especially for the later practicals.
However, the patience should pay off, and it is rewarding once you build your own nonsense-spewing TEDbot!
Would you be able to post your own solutions for the nonsense-spewing TED bot? I'd like to jump into hacking on something that already works while also working through the theory from the lectures.
Yes, happy to share once the evaluation period is over, but sadly I don't think I'm allowed to share beforehand. I'll have to check when that is, but I expect it to be in approximately two weeks.
In the meantime, you might want to check out this excellent blog post http://r2rt.com/recurrent-neural-networks-in-tensorflow-ii.h.... It will provide you with skeleton code to implement a character-level generative model (similar to Practical 3, Task 2, except that there you will generate words rather than characters). Andrej Karpathy's blog post on LSTMs is also excellent, and I believe he provides the code in his repository as well: http://karpathy.github.io/2015/05/21/rnn-effectiveness/
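If you want a feel for what "character-level generative model" means before wiring up an RNN, here is a deliberately dumb stand-in (a character-bigram sampler, not an RNN; the training string and function names are purely illustrative): it learns which character tends to follow which, then samples one character at a time.

```python
import random
from collections import defaultdict

def train_char_model(text):
    """Map each character to the list of characters observed after it."""
    nxt = defaultdict(list)
    for a, b in zip(text, text[1:]):
        nxt[a].append(b)
    return nxt

def sample(nxt, seed_char, length, seed=0):
    """Generate up to `length` characters, starting from `seed_char`."""
    rng = random.Random(seed)
    out = [seed_char]
    while len(out) < length:
        choices = nxt.get(out[-1])
        if not choices:  # dead end: this character was never followed by anything
            break
        out.append(rng.choice(choices))
    return "".join(out)

model = train_char_model("hello hello hello")
generated = sample(model, "h", 5)
```

An RNN replaces the lookup table with a learned state that conditions on the whole history, which is why its output looks so much more coherent than this toy's.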
Yes. If you go to the root https://github.com/oxford-cs-deepnlp-2017/ you will find practicals 1-3. Practical 1 has a Jupyter notebook skeleton; practicals 2 & 3 are simply questions guiding your implementation, but you are supposed to write it yourself from scratch.
Here is what I don't understand about deep NLP (please keep in mind that I have just begun exploring this field):
I am currently working on an algorithm that uses elementary text cues in combination with large data-table lookups to determine things like relevant keywords of news articles scraped from various sites. I have given my results to hundreds of people independently to provide me with some feedback regarding the quality. Here is the current breakdown:
In 80% of cases I get a perfect score.
In 10% of cases I get an acceptable score.
In 10% of cases the result needs improvement.
My questions here are:
1. If deep NLP can only provide us with the same level of efficiency/accuracy, then why the hell would we use it?
2. If deep NLP can provide more efficiency than what is stated above, then wouldn't it be safe to assume that it is UNREASONABLY efficient?
3. Why are most people using deep NLP, or ML in general, right off the bat? Theoretically, it would be far more interesting to construct a model where the result of statistical/linguistic parsing is fed to some sort of ML algorithm in order to tackle that 10% of bad cases.
I've worked in the NLP research area (mostly with statistical metrics), and I can safely say that 80% (precision) is the empirical threshold that most metrics reach quite easily. Getting from 80% to 90% starts to become difficult, and above that you have to do some tweaking to adapt to the specifics of your problem.
With that said, your values do seem to be in line with what I consider easily reachable, so it kind of depends on how much work you need to do with the neural networks to extract those keywords. I'm not very knowledgeable about how NNs are applied to this field, but I'm assuming that a drawback of that approach is that it may resemble a black box, in the sense that it may be hard to tweak the internals.
I prefer statistical metrics because they seem more simple to derive. For instance, you can posit things like "a relevant keyword is usually related to (or closer to) other relevant keywords" and test that hypothesis just by counting distances between words. This is what I did in 2012 with quite good results; you can check the paper here: http://www.sciencedirect.com/science/article/pii/S1877050912...
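For the curious, the "count distances between words" idea might be sketched like this (a toy of my own, not the metric from the paper; the token list, candidates, and scoring function are all invented): score each candidate keyword by how close it sits to the other candidates in the text.

```python
def keyword_proximity_scores(tokens, candidates):
    """Score each candidate by its proximity to the other candidates."""
    positions = {c: [i for i, t in enumerate(tokens) if t == c]
                 for c in candidates}
    scores = {}
    for c in candidates:
        dists = []
        for other in candidates:
            if other == c or not positions[c] or not positions[other]:
                continue
            # smallest gap between any occurrence of c and any of other
            dists.append(min(abs(i - j)
                             for i in positions[c]
                             for j in positions[other]))
        # invert the mean distance so that closer => higher score
        scores[c] = 1.0 / (1.0 + sum(dists) / len(dists)) if dists else 0.0
    return scores

tokens = ("neural network models need neural network data "
          "while banana stays alone").split()
scores = keyword_proximity_scores(tokens, ["neural", "network", "banana"])
```

Here "neural" and "network" reinforce each other by co-occurring closely, while the isolated "banana" scores lowest.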
> I prefer statistical metrics because they seem more simple to derive.
That's exactly how I view it as well. My goal for this project is to reach a 90% "perfect" score. And in that case, ML seems to not even be needed. Perhaps the gap between 90 and 95-100% is where ML can help add value. But that in itself is what #3 is about in my original post.
Thank you for confirming my suspicions regarding the threshold though!
As I said, I don't know much about deep NNs, but from what I can remember of neural networks from my college years, each "node" in the network only has weights associated with its inputs, which makes it a black box. In other words, it is not easy to "grab" a node from the network, check the weights associated with its inputs, and understand how it relates to the language.
You don't have to now. But if you were to change the problem a bit, you'd need to reinvent those "elementary text cues", right? With deep learning (or, more generally, representation learning) you can simply change the training data and reuse the rest of your algorithm. Jure Leskovec has a paper, node2vec, which describes this well:
> A typical solution involves hand-engineering domain-specific features based on expert knowledge. Even if one discounts the tedious effort required for feature engineering, such features are usually designed for specific tasks and do not generalize across different prediction tasks.
> An alternative approach is to learn feature representations by solving an optimization problem [4]. The challenge in feature learning is defining an objective function, which involves a trade-off in balancing computational efficiency and predictive accuracy.
Can you elaborate on why you would consider "more efficiency than what is stated above" to be "UNREASONABLY efficient"?
Certainly, in many cases better accuracy can be both reasonably needed and reasonably possible (i.e. if humans can do it, then it's obviously possible).
One measure that is used, and is somewhat similar (though with a major difference), is "inter-annotator agreement": you ask the same question to multiple people and note how often they agree. That is considered a reasonable ceiling; it is a measure of how objective or subjective the question is, i.e., how often there really is a single "correct answer". For some problems that metric is near 100% and can reasonably be beaten by a good system, because the mismatches are caused by human mistakes rather than true disagreements; for others (e.g. some forms of emotion/sentiment/sarcasm analysis), 80% is unreasonably good, since the text doesn't contain enough information to decide for sure.
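As a concrete (and entirely toy) illustration, simple percent agreement and its chance-corrected cousin, Cohen's kappa, for two annotators might be computed like this; the labels and numbers are made up:

```python
def percent_agreement(ann_a, ann_b):
    """Fraction of items where the two annotators gave the same label."""
    assert len(ann_a) == len(ann_b)
    return sum(a == b for a, b in zip(ann_a, ann_b)) / len(ann_a)

def cohens_kappa(ann_a, ann_b):
    """Agreement corrected for chance: (p_o - p_e) / (1 - p_e)."""
    n = len(ann_a)
    p_o = percent_agreement(ann_a, ann_b)
    labels = set(ann_a) | set(ann_b)
    # expected chance agreement from each annotator's label frequencies
    p_e = sum((ann_a.count(l) / n) * (ann_b.count(l) / n) for l in labels)
    return (p_o - p_e) / (1 - p_e)

a = ["pos", "pos", "neg", "neg", "pos"]
b = ["pos", "neg", "neg", "neg", "pos"]
```

Here the raw agreement is 0.8, but kappa (about 0.62) discounts the agreement the two annotators would have reached by guessing from their own label frequencies.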
Also, an answer to (3) is that to get a state-of-the-art result (as opposed to a simple baseline) with non-DNN methods, you need a quite complex system and lots of custom feature engineering. If you have (or can get) one, that's not an issue; but if you're developing a system from scratch, a good DNN system needs less labor than a good "classic" system. For example, a major point in neural machine translation is not only that it gets better results, but that it gets them with a much simpler NLP pipeline. Where a "classic" system needs to integrate 10-30 additional separate modules (ML or manually crafted rules) to handle various types of special cases or feature analysis, much of that (though not all) can be learned by a deep neural network directly in end-to-end training; so if you go directly to a DNN, you avoid the (huge!) work of implementing them manually.
When you are doing Machine Learning you should always have a simple baseline and only use more complicated algorithms when they improve over your baseline.
That's my #3 question in a nutshell, and it seems a rather good strategy imo. However, the way everyone talks about ML makes it seem like a silver bullet for all that is holy and sacred!
> 2. If deep NLP can provide more efficiency than what is stated above, then wouldn't it be safe to assume that it is UNREASONABLY efficient?
Why? Neural nets can already detect skin cancer as well as human dermatologists [1]. Why would you assume that your algorithm is the peak of efficiency and anything that performs better is "unreasonable"?
I didn't say that I assume mine to be the peak. My intention was to point out that any efficiency reached beyond what is statistically possible (i.e. if you rely only on statistical metrics and parsing) could be considered unreasonably efficient. At least, there is an argument to be made for that.
My method should in no way, shape or form be considered as a "peak".
Regarding 3: that is what we do. Basically, we're an NLP company that evolved into something different, and we leverage NLP parsing to do ML on enriched data.
I am currently taking this course at Oxford and definitely recommend following it.
We will be using TED talks as our dataset to build question answering, text completion, systems that generate entire TED talks, etc. Definitely very interesting, and it is being taught by leading researchers in the field!
Began this course earlier today, and I think it pulls off the right combo of first-principles foundations and tough problem sets, like cs231n (Karpathy's CNN class). Other NLP courses I've taken so far have gone over my head.
> The primary assessment for this course will be a take-home assignment issued at the end of the term. This assignment will ask questions drawing on the concepts and models discussed in the course, as well as from selected research publications.
It comes as a surprise that it's not a project, which, in my experience, all ML/DL courses I've seen online from US universities (Cal, Stanford, etc.) require. Different university culture across the pond?
Oxbridge is very much exam-based assessment, with your degree depending only on the exams taken at the end of the final year. So the take-home assessments are there just to make sure that you understand what you're doing and can progress to the next part of the course.
I must say that I am rather glad the degree I did at a Scottish university was based mainly on 4th-year exams, with an element of 3rd-year exams and coursework.
The idea of averaging all of your work across your course would have been a disaster for me, as I did rather poorly in year 1, scraped through year 2, and did spectacularly well in years 3 and 4.
> It comes as a surprise that it's not a project, which, in my experience, all ML/DL courses I've seen online from US universities (Cal, Stanford, etc.) require. Different university culture across the pond?
from that same paragraph:
| The pratical (sic) component of the course will be assessed in the usual way.
Have a look at 'Lecture 2b - Overview of the Practicals.pdf' to be convincingly (imho) dissuaded of the notion that it is 'easy-peasy' :)
How does the course content and rigor compare to the Stanford deep learning for NLP course? From a cursory glance at the practicals, it seems like the Stanford version has more variety and depth of problems.
If you're well-versed in DL and the nuances of text mining, the Stanford one might be great. But if you're a beginner like me, you'll find it a bit hard to catch up [1]. The Stanford assignments + TensorFlow class were very helpful; the assignments especially really get you thinking.
[1] I took the Stanford one and went through the videos (both 2015 and 2016).
"Prerequisites: This is not meant to be an introduction to machine learning course. Hopefully you've all got some knowledge of machine learning; otherwise you may find this a bit opaque. At a minimum you should understand/have taken courses in linear algebra, calculus, probability, ... we are not going to do anything particularly challenging in those areas, but ideas from those areas will be useful."
Those sound like standard pre-reqs for any ML course - it should probably be a standard disclaimer on every course except for intros (even there, though, you'll want the linear algebra and probability knowledge - at least the bare basics). I can only imagine the number of people who jump into such courses and get in over their head very quickly...
I wonder if they could be used to improve speech recognition accuracy. So you'd have two models running when someone utters a sentence: the first would generate the x most likely phrases that it thinks were spoken, and the second (the RNN) would select the highest ranked 'plausible' sentence (i.e. a sentence it would have been able to generate itself).
I guess that's a bit indirect, but these RNNs are essentially learning the 'rules' that actual phrases conform to. It'd definitely be better than trying to hard-code the rules (especially for a language like English!). And the training data is very easy to get: just feed it a few thousand ebooks, the comments section from HN etc.
Standard neural network-based speech recognition pipelines (i.e. RNN + CTC) always use a language model. Unlike a seq2seq model (or any autoregressive model, or structured prediction output), CTC models treat output timesteps as conditionally independent. Hence, everyone uses an RNN LM or n-gram LM, or both, when retrieving probable sequences from a CTC model (e.g. with beam search).
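The rescoring idea from the parent comment can be sketched in miniature (this is a toy, not a real decoder: the hypotheses, scores, and weighting are all invented; a common shallow-fusion shape is acoustic log-prob plus a tuned weight times LM log-prob):

```python
def rescore(hypotheses, lm_logprob, alpha=0.5):
    """hypotheses: list of (text, acoustic_logprob); pick the best overall.

    Total score = acoustic log-prob + alpha * LM log-prob, where alpha
    would be tuned on held-out data in a real system.
    """
    return max(hypotheses, key=lambda h: h[1] + alpha * lm_logprob(h[0]))[0]

# Made-up numbers: the acoustic model slightly prefers the garbled string,
# but the language model strongly prefers the fluent one.
hyps = [("recognize speech", -2.0), ("wreck a nice beach", -1.8)]
lm = {"recognize speech": -1.0, "wreck a nice beach": -6.0}
best = rescore(hyps, lm.get, alpha=1.0)
```

With alpha set to 0 (acoustic model alone), the garbled hypothesis wins; the LM term is what flips the decision toward the plausible sentence.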
Machine translation is a big one. See the recent NY Times feature [1] and the arxiv paper [2]. Automated image captioning is another (used extensively by Facebook).
Could you expand a bit on that? I have come across examples generating poems from Shakespeare's works and realistic-looking CSS/JavaScript, but I'm still trying to find a more realistic use case.
Say you want to translate a sentence from language A to language B. You have a system that generates 10 possible translations in language B. Now you use a language model of language B to figure out which is the best translation, by asking the language model which of the candidates has the highest probability of being generated.
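A minimal toy version of that selection step (the corpus, the add-one smoothing, and the function names are all invented for illustration): train a bigram model on target-language text, then pick the candidate it scores highest.

```python
import math
from collections import Counter

def train_bigram_lm(sentences):
    """Return a log-probability function backed by a tiny bigram model."""
    unigrams, bigrams = Counter(), Counter()
    for sent in sentences:
        toks = ["<s>"] + sent.split()
        unigrams.update(toks)
        bigrams.update(zip(toks, toks[1:]))
    vocab = len(unigrams)
    def logprob(sentence):
        toks = ["<s>"] + sentence.split()
        # add-one smoothing so unseen bigrams get a small nonzero probability
        return sum(math.log((bigrams[(a, b)] + 1) / (unigrams[a] + vocab))
                   for a, b in zip(toks, toks[1:]))
    return logprob

def best_translation(candidates, logprob):
    """Pick the candidate the language model finds most fluent."""
    return max(candidates, key=logprob)

lm = train_bigram_lm(["the cat sat", "the dog sat", "the cat ran"])
```

Both candidates below contain the same words; the language model prefers the one whose word order matches what it saw in training.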
Is there any way to "translate" between writing styles of the same language? I'm thinking something analogous to the Van Gogh-ify image-processing techniques using deep convolutional networks.
Even very simple transformations, e.g., adding alliteration/assonance or adding rhymes everywhere, might be fun.
That should help you break down sentences into their semantic parts. The transformations are then made by walking the syntax tree and modifying the tagged parts of speech as you see fit.
Chat bots, image captioning, and question answering are some more realistic use cases of the generator side. In those cases there is some input (a previous chat message, an image, a question) which is encoded into a vector sometimes referred to as a context or thought vector. The decoder/generator unfolds that vector into a series of words.
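A deliberately tiny sketch of the encode step (pure toy: random embeddings and mean pooling stand in for a trained RNN encoder; all names are invented), just to show that the whole variable-length input collapses into one fixed-size context vector that a decoder would then unfold:

```python
import random

DIM = 8  # size of the context/"thought" vector
rng = random.Random(0)
# stand-in word embeddings; a real model learns these during training
EMB = {w: [rng.gauss(0, 1) for _ in range(DIM)]
       for w in ["how", "are", "you", "?"]}

def encode(tokens):
    """Toy encoder: elementwise mean of the word embeddings."""
    vecs = [EMB[t] for t in tokens]
    return [sum(col) / len(vecs) for col in zip(*vecs)]

# Inputs of any length map to the same fixed-size vector.
context = encode(["how", "are", "you", "?"])
```

In a real seq2seq system the decoder is an RNN conditioned on `context`, emitting one output word per step until it produces an end-of-sequence token.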
Is there any good introductory material? I have tried several times to understand the theory and the spirit of DL and ML, but I have not been able to connect the dots. Please point me in the right direction.
Calculus, linear algebra, statistics, probability theory. You won't get far studying "basic" ML without an elementary understanding of those subjects. You could black-box the implementations and hack on them, but then you'll have trouble understanding why things work, when to use certain techniques over others, or how to tune for optimization.
The book 'Deep Learning' generally refers to deeplearningbook.org, though there are others by that name. I think of Goodfellow et al.'s as the canonical one.