
Turing team member here. You can also try the demo at the following link, which operates at the pixel level for images (no metadata, which is how similar systems such as search engines work). Another cool thing is that the model does OCR inherently, without any explicit training or inference set-up for OCR. https://turing.microsoft.com/bletchley

Happy to answer any questions.


(Team member of this project.) Just a clarification: both Microsoft and Nvidia have ownership of this model. Here is the Microsoft version of the same announcement.

https://www.microsoft.com/en-us/research/blog/using-deepspee...


A similar soft Ctrl+F feature is available in the web version of Microsoft Word.

https://mspoweruser.com/find-feature-in-microsoft-word-whole...

Disclaimer: Microsoft employee


Yes, these things keep us up at night as well :-).


If you want access, please send an email to [turing_ AT _microsoft _DOT_ com]. Remove underscores and spaces.


We don't have an exact date, but we plan to share more details in a later submission. If you want access, please send an email to [turing_ AT _microsoft _DOT_ com]. Remove underscores and spaces.


One of the team members from Project Turing. Happy to answer any questions.


“We are releasing a private demo of T-NLG, including its freeform generation, question answering, and summarization capabilities, to a small set of users within the academic community for initial testing and feedback.”

What’s the deal with these private demos? (GPT-2 was also essentially private). More importantly, why even announce the existence of a private demo to people who were not invited?


I'm honestly not trolling with this question, but can you explain what the practical applications of text generation are? From what I've seen of GPT-2, it's a cool toy, but I have never seen it create anything that seems like it would be useful for solving a problem (e.g., a human-computer interaction problem).

The only applications I can think of for text generation are malevolent ones: I'm sure it would be great at generating spam sites which can fool Google's PageRank algorithms, and it seems like you could easily use it in an information warfare / astroturf setting where you could generate the illusion of consensus by arming a lot of bots with short, somewhat convincing opinions about a certain topic.

Is there something obvious I'm missing? It seems too imprecise to actually deliver meaningful information to an end-user, so I'm frankly baffled as to what its purpose is.


Why the lack of numbers on the more popular SQuAD and GLUE benchmarks?


SQuAD and GLUE are tasks for language representation models -- i.e., BERT-like. This is a language generation model -- GPT-like. Hence, the SQuAD/GLUE test sets are not really applicable. We are reporting on the WikiText and LAMBADA sets that OpenAI also uses for similar models (numbers are in the blog post).
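For anyone curious how those numbers are typically computed, here is a minimal, purely illustrative sketch of measuring perplexity with a GPT-like model. It uses the public GPT-2 checkpoint via Hugging Face as a stand-in (Turing-NLG itself is not publicly available), and the sample text is just a placeholder.

    # Illustrative only: perplexity of a GPT-like model on some text,
    # using the public GPT-2 checkpoint as a stand-in for Turing-NLG.
    import torch
    from transformers import GPT2LMHeadModel, GPT2TokenizerFast

    tokenizer = GPT2TokenizerFast.from_pretrained("gpt2")
    model = GPT2LMHeadModel.from_pretrained("gpt2")
    model.eval()

    text = "The quick brown fox jumps over the lazy dog."  # placeholder text
    inputs = tokenizer(text, return_tensors="pt")

    with torch.no_grad():
        # Passing labels == input_ids makes the model return the mean
        # next-token cross-entropy loss; exp(loss) is the perplexity.
        outputs = model(**inputs, labels=inputs["input_ids"])

    print(f"perplexity: {torch.exp(outputs.loss).item():.2f}")

The reported benchmark numbers are of course computed over the full WikiText/LAMBADA evaluation sets rather than a single sentence.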


What's the difference between the two models?


* BERT & language representation models: they basically turn a sentence into a compact vector that represents it, so you can then do some downstream task on it, such as sentiment detection or matching the similarity between two sentences.

* GPT & language generation models: given some context (say, a sentence), they can generate text to complete it, summarize it, etc. The task here is to actually write something. (A rough sketch contrasting the two follows below.)
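Here is that sketch: a purely illustrative contrast using the public BERT and GPT-2 checkpoints from Hugging Face (not the Turing models).

    # Illustrative only (public checkpoints, not the Turing models).
    from transformers import pipeline

    # Representation: sentence -> vectors, to be used for downstream
    # tasks such as sentiment detection or sentence similarity.
    embed = pipeline("feature-extraction", model="bert-base-uncased")
    token_vectors = embed("The movie was surprisingly good.")[0]  # one vector per token

    # Generation: prompt -> continuation; the model actually writes text.
    generate = pipeline("text-generation", model="gpt2")
    print(generate("The movie was surprisingly", max_length=20)[0]["generated_text"])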


Both are language representation models; text generation is just a way of training the model. BERT is also trained on a text generation task: it is asked to fill gaps in the text (15% of the tokens are masked during training).
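A quick illustration of that fill-in-the-gaps (masked language modeling) objective, using the public Hugging Face fill-mask pipeline; the example sentence is arbitrary.

    # Illustrative sketch of BERT's masked-LM objective: predict the
    # blanked-out token from its bidirectional context.
    from transformers import pipeline

    fill = pipeline("fill-mask", model="bert-base-uncased")
    for candidate in fill("The capital of France is [MASK]."):
        print(candidate["token_str"], round(candidate["score"], 3))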


Maybe I am not understanding your point.

Out of the box, given a sequence of n tokens, BERT returns a tensor of dimension (n_tokens, hidden_size) [1], where hidden_size has no relationship with the vocabulary. You can then fine-tune a model on this representation to do various tasks, e.g. sentiment classification. Thus BERT is said to be a language representation model.

Out of the box, given a sequence, GPT-2 returns a distribution over the vocabulary [2], from which you can draw the most likely next word. Thus GPT-2 is said to be a language generation model.

You could of course play with BERT's masking token and call it recursively to force BERT to generate something, and you could chop off some layers of GPT-2 to get a representation of your input sequence, but I think that is a little past the original question.
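For concreteness, here is a small sanity check of the two output shapes described above, using the Hugging Face wrappers rather than the original repos in [1] and [2]; the model names and prompt are just placeholders.

    # Check the two output shapes: BERT has no vocabulary axis, GPT-2 does.
    import torch
    from transformers import BertTokenizer, BertModel, GPT2Tokenizer, GPT2LMHeadModel

    text = "Language models are"

    bert_tok = BertTokenizer.from_pretrained("bert-base-uncased")
    bert = BertModel.from_pretrained("bert-base-uncased")
    with torch.no_grad():
        bert_out = bert(**bert_tok(text, return_tensors="pt"))
    print(bert_out.last_hidden_state.shape)  # (1, n_tokens, hidden_size)

    gpt_tok = GPT2Tokenizer.from_pretrained("gpt2")
    gpt = GPT2LMHeadModel.from_pretrained("gpt2")
    with torch.no_grad():
        gpt_out = gpt(**gpt_tok(text, return_tensors="pt"))
    print(gpt_out.logits.shape)  # (1, n_tokens, vocab_size)
    next_id = gpt_out.logits[0, -1].argmax().item()
    print(gpt_tok.decode([next_id]))  # most likely next token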

[1] https://github.com/google-research/bert/blob/master/modeling...

[2] https://github.com/openai/gpt-2/blob/master/src/model.py#L17...


> BERT returns a tensor of dimension (n_tokens, hidden_size) [1], where hidden_size has no relationship with the vocabulary

"BERT returns" is ambiguous here. During pretraining last layer is loggits for one hot vocab vector, the same as in GPT: https://github.com/google-research/bert/blob/master/run_pret...


One is a language generation model, the other is a fill-in-the-blank model. It sounds like they might be similar, but in practice they are different enough objectives (and in particular the "bi-directional" aspect of BERT-type models) that the models learn different things.


Have you evaluated against the AI2 Leaderboard benchmarks? https://leaderboard.allenai.org/


Not yet. We will try to run against those benchmarks soon.


How does it compare to Google’s BERT and do you have an online demo?

Here’s a demo of BERT: https://www.pragnakalp.com/demos/BERT-NLP-QnA-Demo/


(Similar to the response to another question.) BERT is a language representation model, while Turing-NLG is a language generation model (similar to GPT). They are not directly comparable (each can potentially be massaged to mimic the other, but that is not something we have done yet).


Google's T5 paper pretty convincingly combines the two, doesn't it?


Any plans on training other (non-NLP) huge models using ZeRO?

Specifically for Transformers - any plans to train a big model with a bigger context window?

Not that this one isn't very impressive, of course.


Thanks for your kind words. Yes, we would like to train a language representation model next. Our hunch is that something that mixes language representation and language generation would get the best of both worlds.


How close do you think the technology is to answering -this- question?


1) How close do you think the technology is to answering -this- question?

Four days!

2) How long in years?

Three years!

