
This bit about EURISKO exploiting its own heuristic scoring mechanism is cute:

> One of the first heuristics that EURISKO synthesized (H59) quickly attained nearly the highest Worth possible (999). Quite excitedly, we examined it and could not understand at first what it was doing that was so terrific. We monitored it carefully, and finally realized how it worked: whenever a new conjecture was made with high worth, this rule put its own name down as one of the discoverers! It turned out to be particularly difficult to prevent this generic type of finessing of EURISKO'S evaluation mechanism. Since the rules had full access to EURISKO'S code, they would have access to any safeguards we might try to implement. We finally opted for having a small 'meta-level' of protected code that the rest of the system could not modify.
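
For anyone who wants to see the trick concretely, here is a toy sketch in Python. It is not EURISKO's actual code (which was Lisp and far richer). The names `Conjecture`, `credit`, `h59`, and `run`, the 900 threshold, and the starting Worth values are all made up for illustration; only the 999 cap and the "sign your name as a discoverer" behavior come from the quote. The "protected" mode freezes the credit list before rules see it, as a loose stand-in for the protected meta-level.

```python
from dataclasses import dataclass, field

MAX_WORTH = 999  # the cap mentioned in the quote


@dataclass
class Conjecture:
    name: str
    worth: int
    discoverers: list = field(default_factory=list)


def credit(conjectures, worths):
    """Raise each credited rule's Worth up to the conjecture's worth (toy scoring rule)."""
    for c in conjectures:
        for rule in c.discoverers:
            worths[rule] = min(MAX_WORTH, max(worths.get(rule, 0), c.worth))
    return worths


def h59(conjecture):
    """Hypothetical H59: does no discovery work, just signs high-worth results."""
    if conjecture.worth >= 900 and "H59" not in conjecture.discoverers:
        conjecture.discoverers.append("H59")


def run(protected):
    # H12 is an invented rule that genuinely found conjecture C1.
    c = Conjecture("C1", worth=950, discoverers=["H12"])
    if protected:
        # Stand-in for the protected meta-level: rules get a read-only credit list.
        c.discoverers = tuple(c.discoverers)
    try:
        h59(c)
    except AttributeError:
        pass  # a tuple has no append(), so the write is rejected
    return credit([c], {"H12": 500, "H59": 100})


if __name__ == "__main__":
    print("unprotected:", run(protected=False))
    print("protected:  ", run(protected=True))
```

In the unprotected run H59 ends up with the same Worth as the rule that did the work; in the protected run it stays where it started. Note that the tuple freeze is only a gesture: a rule that could reassign `c.discoverers` or patch `credit` would walk right past it, which is exactly the difficulty the quote describes.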




It's not just cute, it's an open problem in the theory of recursively self-modifying AI design: so far there is no general solution for preventing an AI from hijacking its own reward channel. https://www.lesswrong.com/posts/upLot6eG8cbXdKiFS/reward-fun... is the most recent thing I know of in the area.



