> On a related note, unlike traditional unit tests, you don’t necessarily need a 100% pass rate. Your pass rate is a product decision, depending on the failures you are willing to tolerate.
I'm not sure how I feel about this, given the expectations, culture, and tooling built up around CI. The suggestion blurs the line between an eval score and the usual idea of a unit test, where anything short of a 100% pass rate blocks the merge.
P.S. It is also useful to track regressions on a per-test basis.
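To make that concrete, here's a minimal sketch (the `run_model` function is a hypothetical stand-in for the system under test) of an eval that tolerates a sub-100% pass rate as a product decision, while still failing hard on per-test regressions:

```python
# Minimal sketch: an eval with a tolerated pass rate plus per-test
# regression tracking. `run_model` is a hypothetical stand-in for
# the LLM system being evaluated.

def run_model(prompt: str) -> str:
    # Placeholder: a real harness would call the model here.
    return prompt.upper()

cases = {
    "greeting": ("hello", "HELLO"),
    "farewell": ("bye", "BYE"),
    "tricky":   ("hi there", "HI, THERE"),  # expected to fail
}

PASS_RATE_THRESHOLD = 0.6  # a product decision, not necessarily 1.0

# Per-test results from the last accepted run, keyed by test name.
previous = {"greeting": True, "farewell": True, "tricky": False}

current = {name: run_model(prompt) == expected
           for name, (prompt, expected) in cases.items()}

pass_rate = sum(current.values()) / len(current)
regressions = [name for name in current
               if previous.get(name, False) and not current[name]]

print(f"pass rate: {pass_rate:.0%}, regressions: {regressions}")
assert pass_rate >= PASS_RATE_THRESHOLD
assert not regressions, f"previously passing tests now fail: {regressions}"
```

The overall threshold is the "product decision" part; the regression check is the per-test tracking, and it behaves like a traditional unit test: any previously passing case that fails is a hard failure regardless of the aggregate score.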
AI Evals are systematic frameworks for measuring LLM performance against defined benchmarks, typically involving test cases, metrics, and human judgment to quantify capabilities, identify failure modes, and track improvements across model versions.
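Stripped to its parts, that definition amounts to: test cases, a scoring metric, and an aggregate score you can track across model versions. A hypothetical sketch, using exact-match as the metric (real evals may use model-graded or human judgment instead):

```python
# Hypothetical sketch of the bare parts of an AI eval:
# test cases, a scoring metric, and an aggregate score.

def exact_match(output: str, reference: str) -> float:
    # One common automated metric; returns 1.0 on a match, else 0.0.
    return 1.0 if output.strip() == reference.strip() else 0.0

test_cases = [
    {"prompt": "2+2=", "reference": "4"},
    {"prompt": "Capital of France?", "reference": "Paris"},
]

def evaluate(model, cases) -> float:
    # `model` is any callable prompt -> output; the aggregate score
    # is the number tracked across model versions.
    scores = [exact_match(model(c["prompt"]), c["reference"])
              for c in cases]
    return sum(scores) / len(scores)

# A toy "model" that answers one of the two cases correctly.
toy_model = lambda p: "4" if p == "2+2=" else "London"
print(evaluate(toy_model, test_cases))  # 0.5
```

Everything else in the definition (failure-mode analysis, cross-version comparison) is built on top of scores like this one.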
Maybe it's obvious to some, but I was hoping that page started off by explaining what the hell an AI Eval specifically is.
I can probably guess from context, but I'd love some validation.