I wonder why this is getting so little traction here.
These models seem to beat all other available open-source models easily, and the blog post is extremely well written, with very good documentation and fine-tuning instructions.
Well done MosaicML, I am excited to see what comes next and will definitely test out your platform!
I'm perplexed as well. Here's a model with a commercial-use license that is competitive with LLaMA 7B (better on half of the major benchmarks), has been fine-tuned into several variants, and supports 2048-token context inputs.
This is BY FAR the best model of its size that is usable by businesses. I plan to start testing it out soon.
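For anyone else planning to try it, here's a rough sketch of what loading it through Hugging Face transformers might look like. This assumes the checkpoint is published as "mosaicml/mpt-7b", that the GPT-NeoX tokenizer is used (as the release notes describe), and that MPT's custom model code requires trust_remote_code=True; treat it as a starting point rather than official usage instructions.

```python
# Minimal sketch: loading MPT-7B via Hugging Face transformers.
# Assumes transformers and torch are installed and that the checkpoint
# lives at "mosaicml/mpt-7b"; MPT ships custom model code, hence
# trust_remote_code=True.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

# MPT-7B reportedly reuses the GPT-NeoX tokenizer
tokenizer = AutoTokenizer.from_pretrained("EleutherAI/gpt-neox-20b")

model = AutoModelForCausalLM.from_pretrained(
    "mosaicml/mpt-7b",
    torch_dtype=torch.bfloat16,
    trust_remote_code=True,
)

prompt = "MosaicML's MPT-7B is"
inputs = tokenizer(prompt, return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=50)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```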
Going purely by the benchmarks from the OP, you can essentially consider MPT equivalent to LLaMA. It might be better or worse depending on the specific task, but not by much.
So compared to GPT-3.5, it's not great at all. That said, LLaMA showed significant improvements via fine-tuning, and I expect those to apply here as well.
EDIT: Oh, I forgot this is 7B. I personally haven't spent much time with 7B LLaMA because my hardware can run 13B/30B, and honestly 13B LLaMA is very noticeably better, to the point where if you can run it you shouldn't bother with 7B. So this really can't compare to GPT-3.5 without fine-tuning, and even then it'll be behind (based on the LLaMA models).