I'm not the most technically sound guy, but if it were up to me, this sort of experiment would have been run in a VM, especially knowing about the Replit incident the author refers to. Tsk.
Throw a trick task at it and see what happens. One thing about the remarks an LLM makes while generating a response is that they're persistent, and eager to please in general.
This makes me question the extent to which these agents can read files or system "state" the way a traditional program can. Or do they just run commands willy-nilly, leaving only the user to determine success or failure after the fact?
It also makes me think about how much competence and forethought contribute to incidents like this.
Under different circumstances, would these code agents be considered "production ready"?