I'm not the most technically sound guy. But this sort of experiment would've ent... | Hacker News


tolerance 7 months ago | on: I watched Gemini CLI hallucinate and delete my fil...

I'm not the most technically sound guy, but if it were up to me, this sort of experiment would've been run on a VM, especially since the author was aware of the Replit incident they refer to. Tsk. Throw a trick task at it and see what happens.

One thing about the remarks that appear while an LLM is generating a response is that they're persistent, and eager to please in general. This makes me question the extent to which these agents are capable of reading files or "state" on the system like a traditional program can. Or do they just run commands willy-nilly, leaving only the user to determine their success or failure after the fact?

It also makes me think about how much competence and forethought contribute to incidents like this. Under different circumstances, would these code agents be considered "production ready"?

greymalik 7 months ago

I hate to blame the victim, but did the author not use the built-in sandbox (`gemini --sandbox`) or git?

lupire 7 months ago

The author didn't use --nosandbox.

Why is the default broken?
