I wonder why this is getting so little traction here.
These models seem to beat all other available open-source models easily, and the blog post is extremely well written, with very good documentation and fine-tuning instructions.
Well done MosaicML, I am excited to see what comes next and will definitely test out your platform!
I'm perplexed as well. Here's a model with a commercial-use license that is competitive with LLaMA 7B (better on half of the major benchmarks), has been fine-tuned into several variants, and supports 2048-token context inputs.
This is BY FAR the best model of its size that is usable by businesses. I plan to start testing it out soon.
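For anyone else planning to try it, here's a rough sketch of what loading it through Hugging Face transformers might look like. This assumes the checkpoint is published as "mosaicml/mpt-7b", that the GPT-NeoX tokenizer is used (as the release notes describe), and that MPT's custom model code requires trust_remote_code=True; treat it as a starting point rather than official usage instructions.

```python
# Minimal sketch: loading MPT-7B via Hugging Face transformers.
# Assumes transformers and torch are installed and that the checkpoint
# lives at "mosaicml/mpt-7b"; MPT ships custom model code, hence
# trust_remote_code=True.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

# MPT-7B reportedly reuses the GPT-NeoX tokenizer
tokenizer = AutoTokenizer.from_pretrained("EleutherAI/gpt-neox-20b")

model = AutoModelForCausalLM.from_pretrained(
    "mosaicml/mpt-7b",
    torch_dtype=torch.bfloat16,
    trust_remote_code=True,
)

prompt = "MosaicML's MPT-7B is"
inputs = tokenizer(prompt, return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=50)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```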
Going purely by the benchmarks from the OP, you can essentially consider MPT equivalent to LLaMA. It might be better or worse depending on the specific task, but not by much.
So compared to GPT-3.5, it's not great at all. That said, LLaMA showed significant improvements via fine-tuning, and I expect those to apply here as well.
EDIT: Oh, I forgot this is 7B. I personally haven't spent much time with 7B LLaMA because my hardware can run 13B/30B, and honestly 13B LLaMA is very noticeably better, to the point where if you can run it you shouldn't bother with 7B. So this really can't compare to GPT-3.5 without fine-tuning, and even then it'll be behind (based on the LLaMA models).