Turing team member here. You can also try the demo at the link below, which operates at the pixel level for images (no metadata, which is how similar systems like search engines work). Another cool thing is that the model does OCR inherently, without any explicit training or inference set-up for OCR.
https://turing.microsoft.com/bletchley
(Team member of this project)
Just a clarification: both Microsoft and Nvidia have ownership of this model. Here is the Microsoft version of the same announcement.
We don't have an exact date, but we plan to share more details in a later submission. If you want access, please send an email to [turing_ AT _microsoft _DOT_ com] (remove the underscores and spaces).
“We are releasing a private demo of T-NLG, including its freeform generation, question answering, and summarization capabilities, to a small set of users within the academic community for initial testing and feedback.”
What’s the deal with these private demos? (GPT-2 was also essentially private). More importantly, why even announce the existence of a private demo to people who were not invited?
I'm honestly not trolling with this question, but can you explain what the practical applications of text generation are? From what I've seen of GPT-2, it's a cool toy, but I have never seen it create anything that seems like it would be useful to solve a problem (e.g., a human-computer interaction problem).
The only applications I can think of for text generation are malevolent ones: I'm sure it would be great at generating spam sites which can fool Google's PageRank algorithms, and it seems like you could easily use it in an information warfare / astroturf setting where you could generate the illusion of consensus by arming a lot of bots with short, somewhat convincing opinions about a certain topic.
Is there something obvious I'm missing? It seems too imprecise to actually deliver meaningful information to an end-user, so I'm frankly baffled as to what its purpose is.
SQuAD and GLUE are tasks for language representation models -- aka BERT-like. This is a language generation model -- GPT-like. Hence, the SQuAD/GLUE test sets are not really applicable. We are reporting on the WikiText and LAMBADA sets that OpenAI also uses for similar models (numbers are in the blog post).
* BERT & language representation models: They basically turn a sentence into a compact vector that represents it so you can then do some downstream task on it such as sentiment detection, or matching the similarity between two sentences etc.
* GPT & language generation models: Given some context (say a sentence), they can generate text to complete it, or to summarize it, etc. The task here is to actually write something.
Both are language representation models; text generation is just one way of training the model. BERT is also trained on a text generation task: it is asked to fill in gaps in the text (15% of the tokens are masked during training).
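For what it's worth, that fill-in-the-blank objective looks roughly like this at prediction time (a sketch with the Hugging Face transformers library; the 15% masking itself happens during training, and the checkpoint here is my assumption):

```python
# Sketch of BERT's masked-language-model objective: mask a token, predict it.
import torch
from transformers import BertTokenizer, BertForMaskedLM

tokenizer = BertTokenizer.from_pretrained("bert-base-uncased")
model = BertForMaskedLM.from_pretrained("bert-base-uncased")

text = f"The capital of France is {tokenizer.mask_token}."
inputs = tokenizer(text, return_tensors="pt")
with torch.no_grad():
    logits = model(**inputs).logits  # (1, n_tokens, vocab_size)

# Find the masked position and take the highest-scoring token for it.
mask_pos = (inputs["input_ids"][0] == tokenizer.mask_token_id).nonzero(as_tuple=True)[0]
predicted_id = logits[0, mask_pos].argmax(dim=-1)
print(tokenizer.decode(predicted_id))  # e.g. "paris"
```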
Out of the box, given a sequence of n tokens, BERT returns a tensor of dimension (n_tokens, hidden_size) [1], where hidden_size has no relationship to the vocabulary. You can then fine-tune a model on this representation to do various tasks, e.g. sentiment classification. Thus BERT is said to be a language representation model.
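A quick sketch of that output shape, using Hugging Face transformers (my own illustration; the checkpoint is an assumption):

```python
# The encoder output is (batch, n_tokens, hidden_size); hidden_size is
# unrelated to the vocabulary size.
import torch
from transformers import BertTokenizer, BertModel

tokenizer = BertTokenizer.from_pretrained("bert-base-uncased")
model = BertModel.from_pretrained("bert-base-uncased")

inputs = tokenizer("BERT encodes this sentence.", return_tensors="pt")
with torch.no_grad():
    hidden = model(**inputs).last_hidden_state

print(hidden.shape)              # torch.Size([1, n_tokens, 768]) for bert-base
print(model.config.hidden_size)  # 768
print(tokenizer.vocab_size)      # ~30k -- no relationship to hidden_size
```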
Out of the box, given a sequence, GPT-2 returns a distribution over the vocabulary [2], from which you can draw the most likely next word. Thus GPT-2 is said to be a language generation model.
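The generation side, sketched the same way (Hugging Face transformers; the sampling strategy here is my own choice, greedy decoding works too):

```python
# GPT-2 gives a distribution over the vocabulary for the next token.
import torch
from transformers import GPT2Tokenizer, GPT2LMHeadModel

tokenizer = GPT2Tokenizer.from_pretrained("gpt2")
model = GPT2LMHeadModel.from_pretrained("gpt2")

inputs = tokenizer("The meaning of life is", return_tensors="pt")
with torch.no_grad():
    logits = model(**inputs).logits  # (1, n_tokens, vocab_size)

next_token_probs = torch.softmax(logits[0, -1], dim=-1)     # distribution over vocab
next_id = torch.multinomial(next_token_probs, num_samples=1)  # or .argmax() for greedy
print(tokenizer.decode(next_id))
```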
You could of course play with BERT's mask token and call it recursively to force BERT to generate something, and you could chop off some layers of GPT-2 to get a representation of your input sequence, but I think that is a little past the original question.
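If you're curious what the "BERT as a generator" hack looks like, here's a rough sketch: append a mask token, predict it, and repeat. (Hugging Face transformers; the greedy decoding and the fixed number of steps are my assumptions, and the output quality is usually poor, which is rather the point.)

```python
# Rough sketch: coax BERT into generating by repeatedly filling a [MASK]
# appended at the end. Not how BERT is meant to be used.
import torch
from transformers import BertTokenizer, BertForMaskedLM

tokenizer = BertTokenizer.from_pretrained("bert-base-uncased")
model = BertForMaskedLM.from_pretrained("bert-base-uncased")

text = "The weather today is"
for _ in range(5):
    inputs = tokenizer(f"{text} {tokenizer.mask_token}", return_tensors="pt")
    with torch.no_grad():
        logits = model(**inputs).logits
    mask_pos = (inputs["input_ids"][0] == tokenizer.mask_token_id).nonzero(as_tuple=True)[0]
    next_id = logits[0, mask_pos].argmax(dim=-1)
    text = f"{text} {tokenizer.decode(next_id)}"
print(text)
```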
One is a language generation model, the other is a fill-in-the-blank model. It sounds like they might be similar, but in practice the objectives are different enough (in particular the "bi-directional" aspect of BERT-type models) that the models learn different things.
(Similar to the response for another question.)
BERT is a language representation model while Turing-NLG is a language generation model (similar to GPT). They are not directly comparable (each can potentially be massaged to mimic the other, but that's not something we have done yet).
Thanks for your kind words.
Yes, we would like to train a language representation model next. Our hunch is that something which mixes language representation and language generation would get the best of both worlds.
Happy to answer any questions.