I wonder if they could be used to improve speech recognition accuracy. So you'd have two models running when someone utters a sentence: the first would generate the x most likely phrases that it thinks were spoken, and the second (the RNN) would select the highest ranked 'plausible' sentence (i.e. a sentence it would have been able to generate itself).
I guess that's a bit indirect, but these RNNs are essentially learning the 'rules' that actual phrases conform to. It'd definitely be better than trying to hard-code the rules (especially for a language like English!). And the training data is very easy to get: just feed it a few thousand ebooks, the comment sections from HN, etc.
Standard neural-network-based speech recognition pipelines (i.e. RNN + CTC) always use a language model. Unlike a seq2seq model (or any autoregressive or structured-prediction model), a CTC model treats its output timesteps as conditionally independent. Hence everyone uses an RNN LM, an n-gram LM, or both when retrieving probable sequences from a CTC model (e.g. with beam search).
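For anyone unfamiliar with CTC: the network emits one symbol per timestep (including a special blank), and the output path is collapsed afterwards by merging repeats and dropping blanks. A minimal sketch of that collapse rule, using `-` as the blank:

```python
def ctc_collapse(path, blank="-"):
    """Collapse a per-timestep CTC path: merge repeated symbols, drop blanks."""
    out = []
    prev = None
    for sym in path:
        # A symbol is kept only if it differs from the previous timestep
        # (repeats are merged) and is not the blank.
        if sym != prev and sym != blank:
            out.append(sym)
        prev = sym
    return "".join(out)

print(ctc_collapse("hh-e-ll-lo"))  # -> "hello"
print(ctc_collapse("a-a"))        # -> "aa" (blank separates genuine repeats)
```

Because each timestep's distribution is independent, scoring a *whole* collapsed sequence well is exactly where the external LM comes in.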
Machine translation is a big one. See the recent NY Times feature [1] and the arxiv paper [2]. Automated image captioning is another (used extensively by Facebook).
Could you expand a bit on that? I have come across examples generating poems from Shakespeare's works and realistic-looking CSS/JavaScript, but I'm still trying to find a more realistic use case.
Say you want to translate a sentence from language A to language B. You have a system that generates 10 candidate translations in language B. Now you use a language model of language B to pick the best one: ask the language model which of the candidates has the highest probability of being generated by the model.
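A toy illustration of that rescoring step. The "language model" here is an add-one-smoothed bigram model over a tiny made-up corpus, and the candidate translations are invented; a real system would use an RNN LM or a large n-gram LM trained on huge text collections:

```python
import math
from collections import Counter

# Tiny stand-in corpus for language B.
corpus = "the cat sat on the mat . the dog sat on the rug .".split()
unigrams = Counter(corpus)
bigrams = Counter(zip(corpus, corpus[1:]))
V = len(unigrams)  # vocabulary size, for add-one smoothing

def logprob(sentence):
    """Add-one-smoothed bigram log-probability of a sentence."""
    words = sentence.split()
    return sum(math.log((bigrams[(a, b)] + 1) / (unigrams[a] + V))
               for a, b in zip(words, words[1:]))

# Hypothetical candidates produced by the A->B translation system.
candidates = ["the cat sat on the mat", "the cat sat mat the on"]
best = max(candidates, key=logprob)
print(best)  # -> "the cat sat on the mat"
```

The well-formed candidate wins because its bigrams actually occur in the training text, which is exactly the "rules of the language" knowledge the LM contributes.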
Is there any way to "translate" between writing styles of the same language? I'm thinking something analogous to the Van Gogh-ify image-processing techniques using deep convolutional networks.
Even very simple transformations, e.g., adding alliteration/assonance or adding rhymes everywhere, might be fun.
Should help you break down sentences into their semantic parts. The transformations are then made by walking the syntax tree and modifying the tagged parts of speech as you see fit.
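As a toy version of that tree-walking idea (and the alliteration suggestion above): walk POS-tagged tokens and rewrite the tagged parts you care about. The tags and the synonym table below are hand-made assumptions; a real pipeline would get tags from a parser (e.g. `nltk.pos_tag`) and synonyms from something like WordNet:

```python
# Hypothetical synonym table; a real system would query a thesaurus.
synonyms = {"quick": ["fast", "fleet", "brisk"],
            "big": ["large", "bulky", "vast"]}

def alliterate(tagged):
    """Swap each adjective (JJ) for a synonym that alliterates
    with the first letter of the following word."""
    out = []
    for i, (word, tag) in enumerate(tagged):
        if tag == "JJ" and i + 1 < len(tagged):
            head = tagged[i + 1][0][0]  # first letter of the next word
            for alt in synonyms.get(word, []):
                if alt[0] == head:
                    word = alt
                    break
        out.append(word)
    return " ".join(out)

print(alliterate([("the", "DT"), ("quick", "JJ"), ("fox", "NN")]))
# -> "the fast fox"
```

Rhyme insertion would work the same way, just matching word endings instead of initial letters.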
Chatbots, image captioning, and question answering are some more realistic use cases of the generator side. In those cases there is some input (the previous chat message, an image, a question) which is encoded into a vector, sometimes referred to as a context or thought vector. The decoder/generator unfolds that vector into a series of words.
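A numeric toy of that "unfolding" step. Everything here is an assumption for illustration: the sizes, vocabulary, and random (untrained) weight matrices stand in for a learned decoder, which would come out of seq2seq training:

```python
import math
import random

random.seed(0)
vocab = ["<eos>", "a", "cat", "sits"]
H = 8  # hidden-state size

def mat(rows, cols):
    return [[random.gauss(0, 0.5) for _ in range(cols)] for _ in range(rows)]

def matvec(M, v):
    return [sum(m * x for m, x in zip(row, v)) for row in M]

W_h = mat(H, H)           # hidden -> hidden recurrence
W_o = mat(len(vocab), H)  # hidden -> vocabulary logits

def decode(context, max_steps=5):
    """Greedily unfold a context vector into words until <eos>."""
    h = [math.tanh(x) for x in context]
    words = []
    for _ in range(max_steps):
        logits = matvec(W_o, h)
        word = vocab[logits.index(max(logits))]  # greedy argmax
        if word == "<eos>":
            break
        words.append(word)
        h = [math.tanh(x) for x in matvec(W_h, h)]  # advance recurrent state
    return words

# The context stands in for the encoded chat message / image / question.
context = [random.gauss(0, 1) for _ in range(H)]
out = decode(context)
```

With random weights the words are arbitrary, but the shape of the computation is the point: one vector in, a word sequence out.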