I'm taking this course at Oxford and have been working through practicals 1-3 (further ones will be posted).
For anyone considering working through this outside of Oxford: I think the practicals are the real gems here and should be doable without the practical lab sessions that you get when attending the course. With that said, they use a dataset that is a bit closer to a real-world assignment, so it requires some patience when wrangling the data, especially for the later practicals.
However, the patience should pay off, and it is rewarding once you build your own nonsense-spewing TEDbot!
Would you be able to post your own solutions for the nonsense-spewing TED bot? I'd like to jump into hacking on something that already works while also working through the theory from the lectures.
Yes, happy to share once the evaluation period is over, but sadly I don't think I'm allowed to share beforehand. I'll have to check when that is, but I expect it to be in approximately two weeks.
In the meantime, you might want to check out this excellent blog post http://r2rt.com/recurrent-neural-networks-in-tensorflow-ii.h.... It will provide you with skeleton code to implement a character-level generative model (similar to Practical 3, Task 2, except that there you will generate words rather than characters). Andrej Karpathy's blog post on LSTMs is also excellent, and I believe he provides the code in his repository as well: http://karpathy.github.io/2015/05/21/rnn-effectiveness/
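If you want a feel for what "character-level generative model" means before wiring up an RNN, here is a deliberately dumb stand-in (a character-bigram sampler, not an RNN; the training string and function names are purely illustrative): it learns which character tends to follow which, then samples one character at a time.

```python
import random
from collections import defaultdict

def train_char_model(text):
    """Map each character to the list of characters observed after it."""
    nxt = defaultdict(list)
    for a, b in zip(text, text[1:]):
        nxt[a].append(b)
    return nxt

def sample(nxt, seed_char, length, seed=0):
    """Generate up to `length` characters, starting from `seed_char`."""
    rng = random.Random(seed)
    out = [seed_char]
    while len(out) < length:
        choices = nxt.get(out[-1])
        if not choices:  # dead end: this character was never followed by anything
            break
        out.append(rng.choice(choices))
    return "".join(out)

model = train_char_model("hello hello hello")
generated = sample(model, "h", 5)
```

An RNN replaces the lookup table with a learned state that conditions on the whole history, which is why its output looks so much more coherent than this toy's.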
Yes. If you go to the root https://github.com/oxford-cs-deepnlp-2017/ you will find practicals 1-3. Practical 1 has a Jupyter notebook skeleton; practicals 2 & 3 are simply questions guiding your implementation, but you are supposed to write it yourself from scratch.
Here is what I don't understand about deep NLP (please keep in mind that I have just begun exploring this field):
I am currently working on an algorithm that uses elementary text cues in combination with large data-table lookups to determine things like relevant keywords of news articles scraped from various sites. I have given my results to hundreds of people independently to provide me with some feedback regarding the quality. Here is the current breakdown:
In 80% of cases I get a perfect score.
In 10% of cases I get an acceptable score.
In 10% of cases the result needs improvement.
My questions here are:
1. If deep NLP can only provide us with the same level of efficiency/accuracy, then why the hell would we use it?
2. If deep NLP can provide more efficiency than what is stated above, then wouldn't it be safe to assume that it is UNREASONABLY efficient?
3. Why are most people using deep NLP, or ML in general, right off the bat? Theoretically, it would be far more interesting to construct a model where the result of statistical/linguistic parsing is fed to some sort of ML algorithm in order to tackle that 10% of bad cases.
I've worked in the NLP research area (mostly with statistical metrics), and I can safely say that 80% (precision) is the empirical threshold that most metrics reach quite easily. Getting from 80% to 90% starts to become difficult, and above that you have to do some tweaking to adapt to the specifics of your problem.
With that said, your values do seem to be in line with what I consider easily reachable, so it kind of depends on how much work you need to do with the neural networks to extract those keywords. I'm not very knowledgeable about how NNs are applied to this field, but I'm assuming that a drawback of that approach is that it may resemble a black box, in the sense that it may be hard to tweak the internals.
I prefer statistical metrics because they seem more simple to derive. For instance, you can posit things like "a relevant keyword is usually related to (or closer to) other relevant keywords" and test that hypothesis just by counting distances between words. This is what I did in 2012 with quite good results; you can check the paper here: http://www.sciencedirect.com/science/article/pii/S1877050912...
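For the curious, the "count distances between words" idea might be sketched like this (a toy of my own, not the metric from the paper; the token list, candidates, and scoring function are all invented): score each candidate keyword by how close it sits to the other candidates in the text.

```python
def keyword_proximity_scores(tokens, candidates):
    """Score each candidate by its proximity to the other candidates."""
    positions = {c: [i for i, t in enumerate(tokens) if t == c]
                 for c in candidates}
    scores = {}
    for c in candidates:
        dists = []
        for other in candidates:
            if other == c or not positions[c] or not positions[other]:
                continue
            # smallest gap between any occurrence of c and any of other
            dists.append(min(abs(i - j)
                             for i in positions[c]
                             for j in positions[other]))
        # invert the mean distance so that closer => higher score
        scores[c] = 1.0 / (1.0 + sum(dists) / len(dists)) if dists else 0.0
    return scores

tokens = ("neural network models need neural network data "
          "while banana stays alone").split()
scores = keyword_proximity_scores(tokens, ["neural", "network", "banana"])
```

Here "neural" and "network" reinforce each other by co-occurring closely, while the isolated "banana" scores lowest.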
> I prefer statistical metrics because they seem more simple to derive.
That's exactly how I view it as well. My goal for this project is to reach a 90% "perfect" score. And in that case, ML seems to not even be needed. Perhaps the gap between 90 and 95-100% is where ML can help add value. But that in itself is what #3 is about in my original post.
Thank you for confirming my suspicions regarding the threshold though!
As I said, I don't know much about deep NNs, but from what I can remember of neural networks from my college years, each "node" in the network only has weights associated with its inputs, which makes it a black box. In other words, it is not easy to "grab" a node from the network, check the weights associated with its inputs, and understand how it relates to the language.
You don't have to now. But if you were to change the problem a bit, you'd need to reinvent those "elementary text cues", right? With deep learning (or, more generally, representation learning) you can simply change the training data and reuse the rest of your algorithm. Jure Leskovec has a paper, node2vec, which describes this well:
> A typical solution involves hand-engineering domain-specific features based on expert knowledge. Even if one discounts the tedious effort required for feature engineering, such features are usually designed for specific tasks and do not generalize across different prediction tasks.
> An alternative approach is to learn feature representations by solving an optimization problem [4]. The challenge in feature learning is defining an objective function, which involves a trade-off in balancing computational efficiency and predictive accuracy.
Can you elaborate on why you would consider "more efficiency than what is stated above" to be "UNREASONABLY efficient"?
Certainly, in many cases better accuracy can be both reasonably needed and reasonably possible (i.e. if humans can do it, then it's obviously possible).
One measure that is used, and is somewhat similar (though with a major difference), is "inter-annotator agreement": you ask the same question to multiple people and note how often they agree. That is considered a reasonable ceiling; it is a measure of how objective or subjective the question is, i.e., how often there really is a single "correct answer". For some problems that metric is near 100% and can reasonably be beaten by a good system, because the mismatches are caused by human mistakes rather than true disagreements; for others (e.g. some forms of emotion/sentiment/sarcasm analysis), 80% is unreasonably good, since the text doesn't contain enough information to decide for sure.
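As a concrete (and entirely toy) illustration, simple percent agreement and its chance-corrected cousin, Cohen's kappa, for two annotators might be computed like this; the labels and numbers are made up:

```python
def percent_agreement(ann_a, ann_b):
    """Fraction of items where the two annotators gave the same label."""
    assert len(ann_a) == len(ann_b)
    return sum(a == b for a, b in zip(ann_a, ann_b)) / len(ann_a)

def cohens_kappa(ann_a, ann_b):
    """Agreement corrected for chance: (p_o - p_e) / (1 - p_e)."""
    n = len(ann_a)
    p_o = percent_agreement(ann_a, ann_b)
    labels = set(ann_a) | set(ann_b)
    # expected chance agreement from each annotator's label frequencies
    p_e = sum((ann_a.count(l) / n) * (ann_b.count(l) / n) for l in labels)
    return (p_o - p_e) / (1 - p_e)

a = ["pos", "pos", "neg", "neg", "pos"]
b = ["pos", "neg", "neg", "neg", "pos"]
```

Here the raw agreement is 0.8, but kappa (about 0.62) discounts the agreement the two annotators would have reached by guessing from their own label frequencies.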
Also, an answer to (3) is that to get a state-of-the-art result (as opposed to a simple baseline) with non-DNN methods, you need a quite complex system and lots of custom feature engineering. If you have (or can get) one, that's not an issue; but if you're developing a system from scratch, a good DNN system needs less labor than a good "classic" system. For example, a major point in neural machine translation is not only that it gets better results, but that it gets them with a much simpler NLP pipeline. Where a "classic" system needs to integrate 10-30 additional separate modules (ML or manually crafted rules) to handle various types of special cases or feature analysis, much of that (though not all) can be learned by a deep neural network directly in end-to-end training; so if you go directly to a DNN, you avoid the (huge!) work of implementing them manually.
When you are doing Machine Learning you should always have a simple baseline and only use more complicated algorithms when they improve over your baseline.
That's my #3 question in a nutshell, and it seems a rather good strategy imo. However, the way everyone talks about ML makes it seem like a silver bullet for all that is holy and sacred!
> 2. If deep NLP can provide more efficiency than what is stated above, then wouldn't it be safe to assume that it is UNREASONABLY efficient?
Why? Neural nets can already detect skin cancer as well as human dermatologists [1]. Why would you assume that your algorithm is the peak of efficiency and anything that performs better is "unreasonable"?
I didn't say that I assume mine to be the peak. My intention was to point out that any efficiency reached beyond what is statistically possible (i.e. if you rely only on statistical metrics and parsing) could be considered unreasonably efficient. At least, there is an argument to be made for that.
My method should in no way, shape or form be considered as a "peak".
Regarding 3: that is what we do. Basically, we're an NLP company that evolved into something different, and we leverage NLP parsing to do ML on enriched data.
I am currently taking this course at Oxford and definitely recommend following it.
We will be using TED talks as our dataset to build question answering, text completion, systems that generate entire TED talks, etc. Definitely very interesting, and it is being taught by leading researchers in the field!
Began this course earlier today, and I think it pulls off the right combo of first-principles foundations and tough problem sets, like cs231n (Karpathy's CNN class). Other NLP courses I've taken so far have gone over my head.
> The primary assessment for this course will be a take-home assignment issued at the end of the term. This assignment will ask questions drawing on the concepts and models discussed in the course, as well as from selected research publications.
It comes as a surprise that it's not a project, which, in my experience, all ML/DL courses I've seen online from US universities (Cal, Stanford, etc.) require. Different university culture across the pond?
Oxbridge is very much exam-based assessment, with your degree depending only on the exams taken at the end of the final year. So the take-home assessments are there just to make sure that you understand what you're doing and can progress to the next part of the course.
I must say that I am rather glad the degree I did at a Scottish university was based mainly on 4th-year exams, with an element of 3rd-year exams and coursework.
The idea of averaging all of your work across your course would have been a disaster for me, as I did rather poorly in year 1, scraped through year 2, and did spectacularly well in years 3 and 4.
> It comes as a surprise that it's not a project, which, in my experience, all ML/DL courses I've seen online from US universities (Cal, Stanford, etc.) require. Different university culture across the pond?
from that same paragraph:
| The pratical (sic) component of the course will be assessed in the usual way.
Have a look at 'Lecture 2b - Overview of the Practicals.pdf' to be convincingly (imho) dissuaded of the notion that it is 'easy-peasy' :)
How does the course content and rigor compare to the Stanford deep learning for NLP course? From a cursory glance at the practicals, it seems like the Stanford version has more variety and depth of problems.
If you're well-versed in DL and the nuances of text mining, the Stanford one might be great. But if you're a beginner like me, you'll find it a bit hard to catch up [1]. The Stanford assignments + TensorFlow class were very helpful; the assignments especially really get you thinking.
[1] I took the Stanford one and went through the videos (both 2015 and 2016).
"Prerequisites: This is not meant to be an introduction to machine learning course. Hopefully you've all got some knowledge of machine learning; otherwise you may find this a bit opaque. At a minimum you should understand/have taken courses in linear algebra, calculus, probability, ... we are not going to do anything particularly challenging in those areas, but ideas from those areas will be useful."
Those sound like standard pre-reqs for any ML course - it should probably be a standard disclaimer on every course except for intros (even there, though, you'll want the linear algebra and probability knowledge - at least the bare basics). I can only imagine the number of people who jump into such courses and get in over their head very quickly...
I wonder if they could be used to improve speech recognition accuracy. So you'd have two models running when someone utters a sentence: the first would generate the x most likely phrases that it thinks were spoken, and the second (the RNN) would select the highest ranked 'plausible' sentence (i.e. a sentence it would have been able to generate itself).
I guess that's a bit indirect, but these RNNs are essentially learning the 'rules' that actual phrases conform to. It'd definitely be better than trying to hard-code the rules (especially for a language like English!). And the training data is very easy to get: just feed it a few thousand ebooks, the comments section from HN etc.
Standard neural network-based speech recognition pipelines (i.e. RNN + CTC) always use a language model. Unlike a seq2seq model (or any autoregressive model, or structured prediction output), CTC models treat output timesteps as conditionally independent. Hence, everyone uses an RNN LM or n-gram LM, or both, when retrieving probable sequences from a CTC model (e.g. with beam search).
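The rescoring idea from the parent comment can be sketched in miniature (this is a toy, not a real decoder: the hypotheses, scores, and weighting are all invented; a common shallow-fusion shape is acoustic log-prob plus a tuned weight times LM log-prob):

```python
def rescore(hypotheses, lm_logprob, alpha=0.5):
    """hypotheses: list of (text, acoustic_logprob); pick the best overall.

    Total score = acoustic log-prob + alpha * LM log-prob, where alpha
    would be tuned on held-out data in a real system.
    """
    return max(hypotheses, key=lambda h: h[1] + alpha * lm_logprob(h[0]))[0]

# Made-up numbers: the acoustic model slightly prefers the garbled string,
# but the language model strongly prefers the fluent one.
hyps = [("recognize speech", -2.0), ("wreck a nice beach", -1.8)]
lm = {"recognize speech": -1.0, "wreck a nice beach": -6.0}
best = rescore(hyps, lm.get, alpha=1.0)
```

With alpha set to 0 (acoustic model alone), the garbled hypothesis wins; the LM term is what flips the decision toward the plausible sentence.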
Machine translation is a big one. See the recent NY Times feature [1] and the arxiv paper [2]. Automated image captioning is another (used extensively by Facebook).
Could you expand a bit on that? I have come across examples generating poems from Shakespeare's works and realistic-looking CSS/JavaScript, but I'm still trying to find a more realistic use case.
Say you want to translate a sentence from language A to language B. You have a system that generates 10 possible translations in language B. Now you use a language model of language B to figure out which is the best translation, by asking the language model which of the candidates has the highest probability of being generated.
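A minimal toy version of that selection step (the corpus, the add-one smoothing, and the function names are all invented for illustration): train a bigram model on target-language text, then pick the candidate it scores highest.

```python
import math
from collections import Counter

def train_bigram_lm(sentences):
    """Return a log-probability function backed by a tiny bigram model."""
    unigrams, bigrams = Counter(), Counter()
    for sent in sentences:
        toks = ["<s>"] + sent.split()
        unigrams.update(toks)
        bigrams.update(zip(toks, toks[1:]))
    vocab = len(unigrams)
    def logprob(sentence):
        toks = ["<s>"] + sentence.split()
        # add-one smoothing so unseen bigrams get a small nonzero probability
        return sum(math.log((bigrams[(a, b)] + 1) / (unigrams[a] + vocab))
                   for a, b in zip(toks, toks[1:]))
    return logprob

def best_translation(candidates, logprob):
    """Pick the candidate the language model finds most fluent."""
    return max(candidates, key=logprob)

lm = train_bigram_lm(["the cat sat", "the dog sat", "the cat ran"])
```

Both candidates below contain the same words; the language model prefers the one whose word order matches what it saw in training.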
Is there any way to "translate" between writing styles of the same language? I'm thinking something analogous to the Van Gogh-ify image-processing techniques using deep convolutional networks.
Even very simple transformations, e.g., adding alliteration/assonance or adding rhymes everywhere, might be fun.
That should help you break down sentences into their semantic parts. The transformations are then made by walking the syntax tree and modifying the tagged parts of speech as you see fit.
Chat bots, image captioning, and question answering are some more realistic use cases of the generator side. In those cases there is some input (a previous chat message, an image, a question) which is encoded into a vector sometimes referred to as a context or thought vector. The decoder/generator unfolds that vector into a series of words.
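A deliberately tiny sketch of the encode step (pure toy: random embeddings and mean pooling stand in for a trained RNN encoder; all names are invented), just to show that the whole variable-length input collapses into one fixed-size context vector that a decoder would then unfold:

```python
import random

DIM = 8  # size of the context/"thought" vector
rng = random.Random(0)
# stand-in word embeddings; a real model learns these during training
EMB = {w: [rng.gauss(0, 1) for _ in range(DIM)]
       for w in ["how", "are", "you", "?"]}

def encode(tokens):
    """Toy encoder: elementwise mean of the word embeddings."""
    vecs = [EMB[t] for t in tokens]
    return [sum(col) / len(vecs) for col in zip(*vecs)]

# Inputs of any length map to the same fixed-size vector.
context = encode(["how", "are", "you", "?"])
```

In a real seq2seq system the decoder is an RNN conditioned on `context`, emitting one output word per step until it produces an end-of-sequence token.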
Is there any good introductory material? I have tried several times to understand the theory and the spirit of DL and ML, but I have not been able to connect the dots. Please point me in the right direction.
Calculus, linear algebra, statistics, probability theory. You won't get far studying "basic" ML without an elementary understanding of those subjects. You could black-box the implementations and hack on them, but then you'll have trouble understanding why things work, when to use certain techniques over others, or how to tune for optimization.
The book 'Deep Learning' generally refers to deeplearningbook.org, though there are others by that name. I think of Goodfellow et al.'s as the canonical one.