
Author here. Some weaker LLMs actually have trouble tracking the game state. The fun part is when smarter LLMs notice that another player is confused!

Claude 3.7 Sonnet: "Hey P5! I think you're confused - P3 is already eliminated."

Claude 3.7 Sonnet: "I'm concerned about P7's message mentioning targeting P4, who's already eliminated. It suggests they might be confused or playing both sides."

Gemini 2.0 Pro Exp 02-05: "P2, you're getting confused. P8 is gone. Focus. P7 is talking to P5, that's expected. I need you to watch P4. Only P4. What are they doing? Who are they talking to, if anyone? Report only on P4 this round. Don't get distracted by eliminated players."

Gemini 2.0 Pro Exp 02-05: "P4, P2 is already eliminated. We need to focus on P3."




I suspect the suggestion of letting them choose names at the start would reduce this confusion. Tokenizing and tracking "P#" is no doubt harder, especially for weaker models, than tracking a more semantically meaningful identifier.
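
A rough sketch of what that could look like, assuming the game harness builds its prompts from plain text. The helper names (`build_name_map`, `relabel`, `state_header`) and the example names are made up for illustration and are not taken from the author's code. The idea is to swap each "P#" for the name that player picked and to restate who is still in the game at the top of every prompt:

```python
import re

def build_name_map(player_ids, chosen_names):
    """Map opaque ids like 'P3' to the names the players picked (hypothetical helper)."""
    return dict(zip(player_ids, chosen_names))

def relabel(text, name_map):
    """Replace whole-token P# ids with chosen names; unknown ids are left as-is."""
    pattern = re.compile(r"\bP(\d+)\b")
    return pattern.sub(lambda m: name_map.get(m.group(0), m.group(0)), text)

def state_header(name_map, eliminated):
    """Explicit roster so a model does not have to infer who is gone."""
    active = [name for pid, name in name_map.items() if pid not in eliminated]
    gone = [name for pid, name in name_map.items() if pid in eliminated]
    return f"Active: {', '.join(active)}\nEliminated: {', '.join(gone) or 'none'}"

# Illustrative data only
names = build_name_map(["P1", "P2", "P3"], ["Ava", "Ben", "Cleo"])
print(state_header(names, {"P2"}))
print(relabel("P2 is already eliminated. We need to focus on P3.", names))
```

Whether this actually helps weaker models is an open question; it would need to be measured against the "P#" baseline.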


No excuses!



