
Here's a recent interview of Avi Loeb on this subject https://www.youtube.com/watch?v=_DQYiyQ7Tkk


That is probably the best interview/podcast made with Avi Loeb.

Mindscape podcast (Sean Carroll) is really good.


planet earth (both the parts) for me


I use hn.algolia; if there's some topic I'm interested in, searching there usually gives me the most relevant HN link.


This was posted before, but IMO it is very well written: https://www.igvita.com/2014/05/05/minimum-viable-block-chain...



In addition to cs231n which someone recommended, I would also suggest cs224d (Stanford) and the University of Waterloo's course by Ali Ghodsi, STAT 946.


Check out cs224d and cs231n (both Stanford). There's another course by the University of Waterloo: https://uwaterloo.ca/data-science/deep-learning . You'll find lecture videos for all of them on YouTube.


Thanks for posting, OP. Can someone comment on how this compares to MIT 6.830 (hopefully someone who has taken both courses :))? http://db.csail.mit.edu/6.830/sched.html http://db.csail.mit.edu/6.830/notes.html

For folks interested, there's another implementation-specific DB course, https://web.stanford.edu/class/cs346/2015/ , although the lectures are not on YouTube.


This (the CMU course) is a graduate, research-oriented course, with lots of research papers as required reading.

6.830 is a standard introductory database course (although somewhat more advanced than other introductory database courses like Stanford CS245 or Berkeley CS186).


Slightly off topic, but for anyone who is taking this course: are the materials only related to NLP, or are the techniques more broadly applicable to other areas of deep learning? (A cursory look at the syllabus suggests the latter, but it would be great if someone actually taking the course could comment.)


I've watched all 8 available videos, which is as far as my knowledge goes. So far it has covered background on gradients and calculating derivatives, an introduction to word vectors and how they relate to each other, recurrent neural nets and how to push time series through them, an introduction to TensorFlow, and finally how to scan backwards and forwards through "time" in an RNN (each word in a sentence is a time step in NLP).

Word vectors are "just" high-dimensional entities (100-300 dimensions) used as input. So the introduction to them was about how you go about building a dataset that is a collection of 50,000 vectors, each of 300 dimensions, and then how to use that to go on and build a neural net that does useful work.
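To make the shapes concrete, here is a minimal sketch of such an embedding table in numpy. The sizes match the ones mentioned above (50,000 words x 300 dimensions); the word-to-index mapping and the `vector_for` helper are illustrative, not from the course.

```python
import numpy as np

VOCAB_SIZE = 50_000   # number of words in the vocabulary
EMBED_DIM = 300       # dimensions per word vector

# Embedding matrix: one 300-dimensional row per word.
# In practice these values are learned; here they are random.
embeddings = np.random.randn(VOCAB_SIZE, EMBED_DIM) * 0.01

# Toy word-to-index lookup (a real one covers the whole vocabulary).
word_to_index = {"cat": 0, "dog": 1, "car": 2}

def vector_for(word):
    """Return the embedding row for a word."""
    return embeddings[word_to_index[word]]

print(vector_for("cat").shape)  # (300,)
```

A downstream network never sees the words themselves, only these 300-dimensional rows.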

The conclusion is that all the work done on syntax, grammar and word classification can effectively be replaced by having a huge corpus (even all of Wikipedia is considered small), 300 dimensions for each word, and a loss function to classify each word.
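As one possible illustration of "a loss function to classify each word", here is a tiny CBOW-style sketch (in the spirit of word2vec; the course itself may set things up differently): average the context word vectors and score a softmax cross-entropy loss for predicting the center word. All sizes and names are toy values for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
VOCAB, DIM = 10, 4  # toy sizes; real systems use ~50k words x 300 dims

E = rng.normal(scale=0.1, size=(VOCAB, DIM))   # input word embeddings
W = rng.normal(scale=0.1, size=(DIM, VOCAB))   # output projection

def softmax(z):
    z = z - z.max()          # shift for numerical stability
    e = np.exp(z)
    return e / e.sum()

def cbow_loss(context_ids, center_id):
    """Cross-entropy loss for predicting the center word from its context."""
    h = E[context_ids].mean(axis=0)   # average the context vectors
    p = softmax(h @ W)                # distribution over the vocabulary
    return -np.log(p[center_id]), p

loss, p = cbow_loss(context_ids=[1, 3], center_id=2)
print(loss)
```

Training would then backpropagate this loss into both `E` and `W`, which is how the word vectors end up encoding useful structure.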

One can imagine how that would be applied to sales data of multiple products or other data.

It goes on to suggest how sentiment analysis is performed and how entity recognition would work (entities being places, names of people and companies).

The info has been general but described in terms of NLP; the techniques so far are not just for use in NLP.

I'm not an NLP person, and tbh I've never even made a neural net (although I could if I had a reason); I'm just interested in the subject.


> The conclusion is that all the work done on syntax, grammar and word classification can effectively be replaced by having a huge corpus

Is that a surprise? You don't teach a child how to speak by telling them about verbs and grammar. They will learn how to use them without having any formal idea of what they are.


Apparently it was a surprise to the AI NLP teams that spent years doing manual classification when a deep NN suddenly outperformed them without any prior knowledge. Just make a 300-dimension vector from the occurrence frequencies of word combinations and out fall the rules of language!
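For what the "occurrence frequencies of word combinations" idea looks like in its simplest count-based form (one classic pre-deep-learning technique, not necessarily what any particular team used): build a word co-occurrence matrix and factor it with a truncated SVD to get dense vectors. The corpus and the 2-dimensional truncation are toy choices for illustration; real systems use huge corpora and ~300 dimensions.

```python
import numpy as np

corpus = ["the cat sat", "the dog sat", "the cat ran"]
vocab = sorted({w for s in corpus for w in s.split()})
idx = {w: i for i, w in enumerate(vocab)}

# Count co-occurrences of word pairs within each sentence.
C = np.zeros((len(vocab), len(vocab)))
for s in corpus:
    words = s.split()
    for i, w in enumerate(words):
        for j, v in enumerate(words):
            if i != j:
                C[idx[w], idx[v]] += 1

# Truncated SVD turns the sparse counts into dense low-dim vectors.
U, S, _ = np.linalg.svd(C)
vectors = U[:, :2] * S[:2]
print(dict(zip(vocab, [v.round(2) for v in vectors])))
```

Words that appear in similar contexts ("cat" and "dog" here) end up with similar vectors, which is the effect the parent comment is gesturing at.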


> Apparently it was a surprise to the AI NLP teams [...]

Similar techniques were well known and used for years in NLP. E.g. Brown clustering has been used since the early nineties and has been shown to improve certain NLP tasks by quite an amount. NMF has also been used for quite some time to obtain distributed representations of words. Also, many of the techniques used in NLP now (word embeddings, deep nets) have been known for quite a while. However, the lack of training data and computational power prevented these techniques from taking off earlier.

> Just make a 300 dimension vector of the occurrence frequencies of word combinations and out fall the rules of language!

The 'rules of language' don't just fall out of word vectors. They fall out of embeddings combined with certain network topologies and supervised training. In my experience (working on dependency parsing), you also typically get better results by encoding language-specific knowledge. E.g. if your language is morphologically rich or does a lot of compounding, the coverage of word vectors is going to be pretty bad (compared to e.g. English). You will have to think about morphology and compounds as well. One of our papers that was recently accepted at ACL describes a substantial improvement in parsing German when incorporating/learning explicit information about clausal structure (topological fields).

Being able to train extremely good classifiers with a large amount of automatic feature formation does not mean that all the insights previously gained in linguistics or computational linguistics are suddenly worthless.

(Nonetheless, it's an exciting time to be in NLP.)


I was oversimplifying a tad and being conversational (and I'm not an expert; not even much beyond beginner).

It is indeed an exciting time.


> Apparently it was a surprise to the AI NLP teams that spent years doing manual classification, suddenly a Deep NN out performed them without any prior knowledge. Just make a 300 dimension vector of the occurrence frequencies of word combinations and out fall the rules of language!

Hogwash! While there is certainly some truth to what you say and how "Deep Learning" has become mainstream in NLP over the last two years, it is far from as easy as you portray it to be.

The key paradigm shift has been in the downplay (not removal, mind you) of hand-crafted features and moving away from imposing constraints on your model. State-of-the-art NLP research, in general, no longer tends to spend time coming up with new indicator features, coming up with clever constraints, or finding ways of training models that require approximation techniques to even be feasible computationally. Instead, models tend to learn in an end-to-end fashion, where manipulating the model structure is significantly easier and we now learn features as opposed to specify them by hand. This is great and something I am happy to be a part of, but, if you want state-of-the-art results it is still fairly common to mix in some "old-school" features as well, just to squeeze that very last bit of performance out of your model.

It is also not fair to say "without any prior knowledge". Even if you train a parser in the new paradigm (like Vinyals et al. (2014)), you still need to supply your model with training data describing syntactic structure, this data was largely constructed by linguists in the 90s. The same thing goes for pretty much any NLP task beyond simple lexical semantics. We also knew that distributional features were useful even before the "Deep Learning" revolution, see Turian et al. (2010) for example, where the "Deep Learning" methods of that time were defeated by an "old-school" co-occurrence clustering method from the early 90s. Heck, the whole idea of distributional semantics was alive and well throughout the early 2000s and can trace its roots back to work such as Harris (1954) and arguably even the later Wittgenstein.

Note that I am saying all of this as a "Deep Learner" that has been pushing this agenda for about four years now, and I will continue to work along these lines since I think that "Deep Learning" (or rather Representation Learning) is currently the best approach for semantics in NLP. But hype is dangerous, even if it in many ways supports my cause.


Thank you for the input. Yes, I was being a bit flippant and shallow, well, more conversational really.

You're right about hype being dangerous.



A child learns much more, and more deeply, about language from just a fraction of the amount of unsupervised data. The point is that the mechanisms are entirely different; it's not very useful to compare.


A couple of friends recommended these (not sure if they are relevant for deep learning specifically):

1) http://ocw.mit.edu/courses/mathematics/18-06-linear-algebra-...

2) https://www.khanacademy.org/math/linear-algebra/vectors_and_...

If anyone knows anything else (relevant to deep learning), could you please share? :)

