
Tried it with a few PyLLMs benchmarks [1] and it did not do very well.

Q. Use m to substitute p, a to substitute e, n to substitute a, g to substitute c, o to substitute h, how to spell peach under this rule?

A.

m: p

a: e

n: a

g: c

o: h

how: peach
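For reference, here is a minimal sketch of what the rule should produce, assuming "use m to substitute p" means every p becomes m (and likewise for the other pairs). The names `RULE` and `apply_rule` are made up for illustration and are not part of PyLLMs.

```python
# Substitution rule from the prompt, read as "original letter -> replacement".
# Assumption: "use m to substitute p" means p is replaced by m.
RULE = {"p": "m", "e": "a", "a": "n", "c": "g", "h": "o"}


def apply_rule(word: str, rule: dict[str, str]) -> str:
    # Map each letter in a single pass so that a letter produced by one
    # substitution (e.g. the "a" that replaces "e") is not substituted again.
    return "".join(rule.get(ch, ch) for ch in word)


print(apply_rule("peach", RULE))  # mango
```

So the expected answer is "mango", which the model never gets to: it only restates the rule backwards.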

Q. write three sentences each ending with the word apple

A.

1. The apple tree was covered in apples.

2. She sliced the apple with a knife.

3. The apple juice was too sweet.
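A rough way to check this kind of answer automatically, assuming "ending with the word apple" means the last token is exactly "apple" once trailing punctuation is dropped. The helper `ends_with_word` is hypothetical and is not how PyLLMs scores it.

```python
import string


def ends_with_word(sentence: str, word: str) -> bool:
    # Drop trailing whitespace and punctuation, then compare the last token
    # case-insensitively. "apples" does not count as "apple".
    tokens = sentence.strip().rstrip(string.punctuation).split()
    return bool(tokens) and tokens[-1].lower() == word.lower()


answers = [
    "The apple tree was covered in apples.",
    "She sliced the apple with a knife.",
    "The apple juice was too sweet.",
]
results = [ends_with_word(s, "apple") for s in answers]
print(results)  # [False, False, False]
```

All three sentences mention apple but none ends with it, so the answer fails on every line.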

I wonder whether the synthetic benchmarks are heavily biased and not that useful.

[1] https://github.com/kagisearch/pyllms/blob/6c11cd51b099d33e61...




“when a measure becomes a target, it ceases to be a good measure”

The model itself is optimising weights for its targets, and the model builders are optimising the model for the benchmarks.



