
"Phind-405B scores 92% on HumanEval (0-shot), matching Claude 3.5 Sonnet". I'd love to see examples of actual code modifications created by Phind and Sonnet back-to-back. This level of transparency would give me the confidence to try to pro. As it is, I'm skeptical by the claim and actual performance as I've yet to see a finetuned model from Llama3.1 that performed notably better in an area without suffering problems in other areas. We do need more options!


I’ve been a customer of Phind for a number of months now, so I’m familiar with the capabilities of all the models they offer.

I found even Phind-70B to often be preferable to Claude Sonnet and would commonly opt for it. I’ve been using the 405B today and it seems to be even better at answering.

I’ve found it does depend on the task. For instance, for formatting JSON in the past, GPT-4 was actually the best.

Because you can cycle through the models, you can check the output of each one, to get the best answer.


Tbh, formatting JSON should have been a solved problem for the last decade. Why consume AI resources for that?
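For what it's worth, pretty-printing JSON needs no model at all; the Python standard library handles it in a couple of lines (a minimal sketch, with a made-up input string for illustration):

```python
import json

# Hypothetical compact JSON string to reformat.
raw = '{"model":"Phind-405B","scores":{"HumanEval":92}}'

# Parse and re-serialize with indentation; sort_keys makes the output deterministic.
formatted = json.dumps(json.loads(raw), indent=2, sort_keys=True)
print(formatted)
```

The same thing is available from the shell via `python -m json.tool` or `jq .`, with no LLM in the loop.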


Hopefully it gets evaluated on this leaderboard https://aider.chat/docs/leaderboards/


The effectiveness of any given model depends on the specific use cases. We noticed that Phind-405B is particularly good at making websites and included some zero-shot examples in the blog.



