Microsoft: invests 10 billion in company. Also Microsoft: here's the tools you need to DIY one of the premium features the company we just invested 10 billion in for free.
Not that reproducing GPT-4 is going to be easy with this, but it'll definitely get rid of some major hurdles. I read a report about the difficulties HuggingFace had with producing their Bloom model, and a lot of it was the sort of straightforward systems engineering that goes into tooling like this.
Is the Bloom model considered a failure by the community? If you read the introduction it was supposed to include improvements over GPT-3, but it performs much worse, I guess because of lower-quality training data? I wonder what sort of company would have high enough quality data that they could use this project to fine-tune a public model to the point where it would be better in some scenario than plain old GPT-4 would be. Especially when you can just inject extra info into the GPT-4 prompt, like phind does for example. What even is the use of fine-tuning given that GPT-4 exists?
> Microsoft: invests 10 billion in company. Also Microsoft: here's the tools you need to DIY one of the premium features the company we just invested 10 billion in for free.
In my mind, MSFT spent that money to acquire a head start on getting LLM-style capabilities into MS’s profitable product portfolio. This is money well spent:
1. MSFT can and will make money on these capabilities.
2. If MSFT didn’t do this, they would take the substantial risk of someone else pulling it off and attacking their moat.
I can’t really imagine today’s Google pulling this off with Google Docs. Adobe doesn’t target MS’s market directly enough to be an immediate risk. Apple doesn’t seem interested in competing with MS. Meta is doing its own thing in the corner. But someone really could attack MS with something amazing and make short-term sales, which could turn into a long-term loss for MS. (Salesforce? They don’t seem able to make things that normal people want to use.) But MS is now ahead of the curve, and they didn’t really spend that much money to get there.
Keep in mind that LibreOffice is not vastly less capable than Office 365, and Teams is not exactly an insurmountably strong piece of technology.
> In my mind, MSFT spent that money to acquire a head start on getting LLM-style capabilities into MS’s profitable product portfolio. This is money well spent:
Personally, I think in 10 years people will joke about machine-generated boilerplate the same way they joke about Clippy today.
My point is that MS’s investment in OpenAI may be a good deal for MS regardless of what happens to the valuation of OpenAI-the-company.
The LLM space is moving fast. OpenAI may stay on top for a long time, or it may not. But I expect Microsoft’s use of LLMs to be valuable for MS and likely market-leading in the office AI space for quite some time regardless of what happens to OpenAI.
> Microsoft: invests 10 billion in company. Also Microsoft: here's the tools you need to DIY one of the premium features the company we just invested 10 billion in for free.
The idea is to get the people who aren't willing to pay hooked on what they offer. Once you are used to a system, you will probably want the same thing at your workplace, where they can charge a premium. The same thing was done with Windows in Asia.
> Microsoft: invests 10 billion in company. Also Microsoft: here's the tools you need to DIY one of the premium features the company we just invested 10 billion in for free.
This seems like more evidence that under the "commoditize your complement" framework, all intellectual property is the complement, and the only thing actually worth selling for Microsoft is subscriptions and server time.
Yeah. Most of what is valuable to me about GPT-4 is its reasoning ability, not fact recall or writing quality. Fact recall has been mostly solved by Google search cards for years, and writing quality is not the most important thing now that I'm no longer a freelance writer; GPT-3.5 and some of the good open-source models like Koala produce okay writing quality.
What nothing else can provide is something that will reason intelligently about the data you give it, with similar or better quality than paying for something like MTurk, much cheaper and with nearly instant delivery. That reasoning ability comes from the model size and training-data quality, and in real applications using CoT, LangChain, etc., a lot of it comes from the context length. 8k is better than anything else I've tried at real use cases, and I very much want to try 32k, because that opens up a lot of space to do new things (e.g. dump in a textbook on the domain you want the model to reason about). I want even longer context lengths than that too, but we'll have to see how it develops. From what I understand, context length/block size comes down pretty directly to the amount of compute and memory they're willing to devote during training. RWKV's architectural changes may shake that up a bit; we'll see when Stability releases it.
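To make the context-length point concrete, here's a rough sketch of how many chunks of reference text fit alongside a question at 8k vs 32k tokens. The ~4-characters-per-token estimate is a crude assumption standing in for a real tokenizer, and the chunk sizes are made up:

```python
# Rough illustration: how much reference text fits in the context window
# at different context lengths. Uses a crude chars/4 token estimate
# (an assumption, standing in for a real tokenizer).

def estimate_tokens(text: str) -> int:
    """Very rough token estimate: ~4 characters per token for English."""
    return max(1, len(text) // 4)

def pack_context(chunks, question, context_limit, reserve_for_answer=500):
    """Greedily pack as many reference chunks as fit in the window,
    leaving room for the question and the model's answer."""
    budget = context_limit - estimate_tokens(question) - reserve_for_answer
    packed, used = [], 0
    for chunk in chunks:
        cost = estimate_tokens(chunk)
        if used + cost > budget:
            break
        packed.append(chunk)
        used += cost
    return packed

# A fake "textbook" split into 1,000-character chunks.
textbook = ["x" * 1000] * 200
question = "Explain the domain concept above."

print(len(pack_context(textbook, question, 8_000)))   # chunks that fit at 8k
print(len(pack_context(textbook, question, 32_000)))  # chunks that fit at 32k
```

The 32k window fits roughly four times as much supporting material, which is what makes the "dump in a textbook" use case start to work.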
> Microsoft: invests 10 billion in company. Also Microsoft: here's the tools you need to DIY one of the premium features the company we just invested 10 billion in for free.
> With just one click, you can train, generate and serve a 1.3 billion parameter ChatGPT model within 1.36 hours on a single consumer-grade NVIDIA A6000 GPU with 48GB memory. On a single DGX node with 8 NVIDIA A100-40G GPUs, DeepSpeed-Chat enables training for a 13 billion parameter ChatGPT model in 13.6 hours. On multi-GPU multi-node systems (cloud scenarios), i.e., 8 DGX nodes with 8 NVIDIA A100 GPUs/node, DeepSpeed-Chat can train a 66 billion parameter ChatGPT model under 9 hours. Finally, it enables 15X faster training over the existing RLHF systems
> The following are some of the open-source examples that are powered by DeepSpeed: Databricks Dolly, LMFlow, CarperAI-TRLX, Huggingface-PEFT
(disclaimer: MSFT/GH employee, not affiliated with this project)
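Taking the quoted figures at face value, it's worth noting the cost in GPU-hours grows much faster than linearly with model size (though the configurations use different GPUs, so this is only a rough comparison):

```python
# Sanity check on the quoted DeepSpeed-Chat figures: total GPU-hours
# and GPU-hours per billion parameters for each configuration.

configs = [
    # (params in billions, number of GPUs, wall-clock hours) from the quote
    (1.3,  1, 1.36),   # single A6000
    (13,   8, 13.6),   # single DGX node, 8x A100-40G
    (66,  64, 9.0),    # 8 DGX nodes, 8x A100 each
]

for params_b, gpus, hours in configs:
    gpu_hours = gpus * hours
    print(f"{params_b:>5}B model: {gpu_hours:7.2f} GPU-hours "
          f"({gpu_hours / params_b:6.2f} GPU-hours per billion params)")
```

So the 13B and 66B runs each cost roughly 8-9 GPU-hours per billion parameters, while the 1.3B single-GPU run is an order of magnitude cheaper per parameter (again, on different hardware).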
Is it? It might have even more computing power, but are cards able to share VRAM now? My hands-on experience with all this is from a few years ago, and I think it was not possible back then.
I can see a future where each company has an "assistant AI model" trained/updated on its internal data at periodic intervals. The data sources could be group emails, Slack/Teams messages, docs, company PDFs and so on. Maybe MS will provide it to you, since it already has access to many of the data sources.
No, you can see the time breakdown in the table under the "coffee break" quote: it is the time for the 3-step RLHF process only. Training a 1.3B parameter model from scratch is still a very large undertaking.
FYI, they don't compare to trlX because trlX is roughly just as fast. Similarly, they put trl in the worst light possible (trl is actually much faster than they claim).
To use RLHF you need a dataset that includes instructions with good & bad answers - do many of those exist? I know there are a few datasets of just plain instructions-with-responses, but I'm not aware of any that have both good and bad (or ranked) responses. Is that trivial, or an important missing element here?
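For reference, the pairwise format such a dataset needs looks roughly like the sketch below (the record contents are invented for illustration; datasets with this chosen/rejected shape do exist, e.g. Anthropic's HH-RLHF). The loss shown is the standard Bradley-Terry-style reward-model objective, not anything specific to DeepSpeed-Chat:

```python
import math

# Sketch of the pairwise-preference format reward modeling expects:
# each record pairs a prompt with a preferred ("chosen") and a worse
# ("rejected") response. The example record is made up.

preference_data = [
    {
        "prompt": "How do I reverse a list in Python?",
        "chosen": "Use reversed(xs) or xs[::-1]; both leave xs unchanged.",
        "rejected": "Lists cannot be reversed in Python.",
    },
]

def ranking_loss(reward_chosen: float, reward_rejected: float) -> float:
    """Bradley-Terry style loss: -log(sigmoid(r_chosen - r_rejected)).
    Minimized when the reward model scores the chosen response higher."""
    margin = reward_chosen - reward_rejected
    return -math.log(1.0 / (1.0 + math.exp(-margin)))

# The loss shrinks as the reward margin favors the chosen response.
print(ranking_loss(2.0, 0.0))  # correct ordering: small loss
print(ranking_loss(0.0, 2.0))  # wrong ordering: large loss
```

Because only the relative ranking matters, plain ranked responses are enough; you don't need absolute "good"/"bad" labels, just at least two responses per prompt that can be ordered.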
All of the UI interfaces have little thumbs-up/down icons... that's where the boolean feedback comes from. If people stop using those, sentiment analysis on the human responses will likely go a long way.
If I understood correctly, the OpenAssistant team wants to open-source their community built RLHF dataset.
On the other hand, if you're being cheeky, I bet there's a way to datamine from websites like ShareGPT and profit off shared ChatGPT <> User interactions.
I do the same. Paste some text, ask for summary, then expand summary up to my knowledge level, then examples and analogies to get me comfortably beyond my level of understanding. It's pretty great.
Does RLHF help with training an LLM to produce better (more accurate) results for a particular problem domain (e.g. customer support for a particular company), or is it only helpful in training the LLM to be a chat agent in general, or a chat agent with guard rails?
RLHF helps with getting the model in the "mood" to output responses in a certain style that users find helpful, e.g. how to write a poem or an email.
It doesn't increase its knowledge of the world or its capabilities.
This is a really cool step, but as someone without the suggested GPU, it isn't easy or one-click for me yet.
I am hoping that someone makes a very simple Jupyter notebook where I can enter my RLHF file, select a few other settings, and just run it (on AWS or Azure; willing to pay $100-$500 per fine-tuned model for cloud credits + notebook access).
In case anyone else is curious, this YouTube video helped me understand how Azure ML works, and I think with that understanding, combined with the GitHub README, it will be doable. https://www.youtube.com/watch?v=yBVXR8G8Bg8