
"Phind-405B scores 92% on HumanEval (0-shot), matching Claude 3.5 Sonnet". I'd love to see examples of actual code modifications created by Phind and Sonnet back-to-back. This level of transparency would give me the confidence to try to pro. As it is, I'm skeptical by the claim and actual performance as I've yet to see a finetuned model from Llama3.1 that performed notably better in an area without suffering problems in other areas. We do need more options!


I’ve been a customer of Phind for a number of months now, so I’m familiar with the capabilities of all the models they offer.

I found even Phind-70B to often be preferable to Claude Sonnet and would commonly opt for it. I’ve been using the 405B today and it seems to be even better at answering.

I’ve found it does depend on the task. For instance, for formatting JSON in the past, GPT-4 was actually the best.

Because you can cycle through the models, you can check the output of each one, to get the best answer.


Tbh, formatting JSON should have been a solved problem for the last decade. Why consume AI resources for that?
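For what it's worth, pretty-printing JSON needs no model at all; the Python standard library handles it in a couple of lines (a minimal sketch, with a made-up input string for illustration):

```python
import json

# Hypothetical compact JSON string to reformat.
raw = '{"model":"Phind-405B","scores":{"HumanEval":92}}'

# Parse and re-serialize with indentation; sort_keys makes the output deterministic.
formatted = json.dumps(json.loads(raw), indent=2, sort_keys=True)
print(formatted)
```

The same thing is available from the shell via `python -m json.tool` or `jq .`, with no LLM in the loop.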


Hopefully it gets evaluated on this leaderboard https://aider.chat/docs/leaderboards/


The effectiveness of any given model depends on the specific use cases. We noticed that Phind-405B is particularly good at making websites and included some zero-shot examples in the blog.



