
I'm not the most technically sound guy, but if it were up to me, this sort of experiment would have run in a VM, especially knowing about the Replit incident the author refers to. Tsk.
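Even without a full VM, a throwaway container contains most of the blast radius. A minimal sketch assuming Docker and the npm-distributed CLI (the package name here is my assumption):

    # Run the agent in a disposable container; destructive commands
    # can only touch the mounted copy of the project.
    docker run --rm -it \
      --volume "$PWD/project:/work" \
      --workdir /work \
      node:20 bash
    # Then inside the container (package name is a guess):
    npx @google/gemini-cli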

Throw a trick task at it and see what happens. One thing about the remarks an LLM makes while generating a response: they're persistently confident, and eager to please in general.

This makes me question the extent to which these agents can read files or system "state" the way a traditional program can, or whether they just run commands willy-nilly and leave the user to determine success or failure after the fact.
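For contrast, even a trivial script can check real filesystem state after a destructive step instead of trusting its own narration. A toy sketch:

    # Verify the actual effect instead of assuming success:
    rm -rf build
    if [ -d build ]; then
      echo "delete failed: build/ still exists" >&2
    else
      echo "verified: build/ is gone"
    fi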

It also makes me think about how much the user's competence and forethought factor into incidents like this.

Under different circumstances, would these coding agents be considered "production ready"?



I hate to blame the victim, but did the author not use the built-in sandbox (`gemini --sandbox`) or git?
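Either mitigation takes seconds. A sketch, assuming the `--sandbox` flag behaves as documented:

    # Opt in to sandboxed execution (flag name per the docs):
    gemini --sandbox
    # Or at minimum, make the tree recoverable before letting
    # an agent loose on it:
    git init
    git add -A
    git commit -m "pre-agent snapshot"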


The author didn't use `--nosandbox`.

Why is the default broken?



