Hacker Newsnew | past | comments | ask | show | jobs | submitlogin

Here's another article by the same author with more background on AI Evals: https://hamel.dev/blog/posts/evals/

I've appreciated Hamel's thinking on this topic.



From that article:

> On a related note, unlike traditional unit tests, you don’t necessarily need a 100% pass rate. Your pass rate is a product decision, depending on the failures you are willing to tolerate.

Not sure how I feel about this, given expectations, culture, and tooling around CI. This suggestion seems to blur the line between a score from an eval and the usual idea of a unit test.

P.S. It is also useful to track regressions on a per-test basis.




Guidelines | FAQ | Lists | API | Security | Legal | Apply to YC | Contact

Search: