Implementing a CNN for Text Classification in Tensorflow (wildml.com)
92 points by dennybritz on Dec 12, 2015 | 8 comments



Great tutorial - well written, with good patterns for TensorFlow usage (e.g. checkpointing, name scopes for cleaning up the graph visualization, and using summaries/TensorBoard), and nice explanations of the concepts.

Though I'm curious why you used VALID padding rather than SAME for the conv layers? It seems like SAME would be simpler.

Also, minor nit: TensorFlow and TensorBoard should both have two letters capitalized.


Thanks! I think VALID and SAME would probably give the same results. The only reason I used VALID is that the original paper seems to do the same.
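
For reference, the conv layer looks roughly like this (a minimal sketch with illustrative sizes, not the exact code from the post):

    import tensorflow as tf

    sequence_length, embedding_size = 56, 128  # illustrative sizes
    filter_size, num_filters = 3, 100

    # Input: [batch, sequence_length, embedding_size, 1]
    x = tf.placeholder(tf.float32, [None, sequence_length, embedding_size, 1])
    W = tf.Variable(tf.truncated_normal(
        [filter_size, embedding_size, 1, num_filters], stddev=0.1))

    # VALID = no zero-padding, so the output height shrinks to
    # sequence_length - filter_size + 1, as in the paper.
    conv = tf.nn.conv2d(x, W, strides=[1, 1, 1, 1], padding="VALID")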

I will fix the capitalization!


Looks neat. Why did you bother using <PAD> words to pad sentences to the same length, when you're using a bag-of-words (document-term matrix) model anyway?

Each sentence vector ends up being the length of the vocabulary, so they're already the same length. You can probably drop step #3 in this case.


Hi! It is not using a BoW model. Each input sentence is a vector of size [sentence_length] (or, in theory, a matrix of size [vocab_size, sentence_length] with one-hot vectors), so the padding is required.
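
Roughly, the input looks like this (a sketch with assumed sizes, not the exact code from the post):

    import tensorflow as tf

    vocab_size, sequence_length, embedding_size = 20000, 56, 128  # assumed

    # Each sentence is a padded vector of word indices, length sequence_length
    input_x = tf.placeholder(tf.int32, [None, sequence_length])

    # The lookup turns indices into dense vectors:
    # [batch, sequence_length, embedding_size]
    embedding = tf.Variable(
        tf.random_uniform([vocab_size, embedding_size], -1.0, 1.0))
    embedded = tf.nn.embedding_lookup(embedding, input_x)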

There is a way to do it without padding, but it's less efficient from a training point of view: you could instantiate a new network for each possible sentence length, share the parameters between them, and then batch by sentence length.
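
A rough sketch of that alternative, sharing one set of filter weights across per-length graphs (all names and sizes here are illustrative):

    import tensorflow as tf

    def conv_for_length(sentence_length, reuse):
        # The filter weights live in a shared variable scope, so every
        # per-length graph reuses the same parameters.
        with tf.variable_scope("conv", reuse=reuse):
            W = tf.get_variable("W", [3, 128, 1, 100])
        x = tf.placeholder(tf.float32, [None, sentence_length, 128, 1])
        return x, tf.nn.conv2d(x, W, strides=[1, 1, 1, 1], padding="VALID")

    graphs = {n: conv_for_length(n, reuse=(i > 0))
              for i, n in enumerate([10, 20, 30])}
    # At train time, group examples by sentence length and feed the
    # graph that matches.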

Also, the padding isn't strictly necessary in theory. The feature vector will always end up being the same length, regardless of sentence length, due to the pooling layer. However, TensorFlow forces you to specify the exact size of the pooling operation (you can't just say "pool over the full input"), so you need the padding if you're using TF.
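
Concretely, something like this (a sketch; sizes assumed):

    import tensorflow as tf

    sequence_length, filter_size, num_filters = 56, 3, 100  # assumed

    # Output of a VALID convolution: [batch, seq - filter + 1, 1, filters]
    conv = tf.placeholder(
        tf.float32, [None, sequence_length - filter_size + 1, 1, num_filters])

    # "Pool over the full input" by passing the exact window size explicitly
    pooled = tf.nn.max_pool(
        conv,
        ksize=[1, sequence_length - filter_size + 1, 1, 1],
        strides=[1, 1, 1, 1],
        padding="VALID")  # -> [batch, 1, 1, num_filters]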


This looks pretty nice. It's worth pointing to the seq2seq TensorFlow example which covers a lot of similar topics.

Is there an example anywhere of how to initialize from the word2vec embeddings?


I'm not sure if there is an example somewhere in the TF docs, but initializing a variable is pretty easy. All you need to do is:

    session.run(W.assign(numpy_word2vec_matrix))

W would be the embedding matrix created in the first layer of the code. [1]

Of course you'd first need to load word2vec and filter its vocabulary to match your own vocabulary. That's most of the code and not specific to TensorFlow. You could use gensim [2] for that.
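
A hypothetical end-to-end sketch (the file name, vocabulary dict, vocab_size, session, and W are all placeholders you'd substitute with your own):

    import numpy as np
    from gensim.models import Word2Vec

    # Load pretrained vectors (binary word2vec format; path is a placeholder)
    w2v = Word2Vec.load_word2vec_format(
        "GoogleNews-vectors-negative300.bin", binary=True)

    # Start from small random values for words missing from word2vec
    init = np.random.uniform(
        -0.25, 0.25, (vocab_size, 300)).astype(np.float32)
    for word, idx in vocabulary.items():  # your own word -> index mapping
        if word in w2v.vocab:
            init[idx] = w2v[word]

    session.run(W.assign(init))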

[1] https://www.tensorflow.org/versions/master/api_docs/python/s...

[2] https://radimrehurek.com/gensim/index.html


BTW, your CNN for NLP post is interesting too. You might also like "LSTM-based Deep Learning Models for Non-Factoid Answer Selection" [1] from the IBM Watson team.

They combine a CNN with an LSTM for question answering on complex, non-factoid questions. Their LSTM+attention model performs slightly better, but it's a pretty interesting approach.

[1] http://arxiv.org/abs/1511.04108


Thanks.



