
Doesn't asking an LLM "to write a Python validator" suffer from the same 99.9% (or whatever the error rate is for validators written by Claude) problem?


The difference is that you're asking it to perform one intellectual task (write a program) instead of 100 menial tasks (parse a file). To the LLM the two are the same level of complexity, so doing less work means fewer opportunities for error.

Also, the LLM is more likely to fail spectacularly by hallucinating APIs when writing a script, and more likely to fail subtly on parsing tasks.


In addition to what you say, it can also be easier for an (appropriately skilled) human to verify a small program than to verify voluminous parsing output. Plus, as you say, there's the semi-automated "verification" of a very-wrong program failing to execute.
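
For concreteness, here's a minimal sketch of the kind of validator a human could review at a glance. The file format, column names, and rules are made up for illustration:

    # check_records.py - validate that every row in a CSV has the expected fields
    # (hypothetical example: the file layout and rules below are invented)
    import csv
    import sys

    REQUIRED_COLUMNS = {"id", "email", "created_at"}

    def validate(path):
        errors = []
        with open(path, newline="") as f:
            reader = csv.DictReader(f)
            # Header check: bail out early if expected columns are missing
            missing = REQUIRED_COLUMNS - set(reader.fieldnames or [])
            if missing:
                errors.append(f"missing columns: {sorted(missing)}")
                return errors
            # Row checks: flag empty ids and obviously malformed emails
            for lineno, row in enumerate(reader, start=2):
                if not row["id"].strip():
                    errors.append(f"line {lineno}: empty id")
                if "@" not in row["email"]:
                    errors.append(f"line {lineno}: bad email {row['email']!r}")
        return errors

    if __name__ == "__main__":
        problems = validate(sys.argv[1])
        for p in problems:
            print(p)
        sys.exit(1 if problems else 0)

A script that small can be reviewed in a minute, and if the LLM had hallucinated an API it would blow up on the first run instead of silently producing wrong parsing output.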


All tests have this problem. We still write them for the same reasons we do double-entry bookkeeping.



