Hacker News new | past | comments | ask | show | jobs | submit login

This looks interesting, how do you plan to handle agents which operate apps with a UI - for example playwright, obsidian etc. Or is this out of scope?





Thanks!

That's a good question. Currently, there is one way to do it. The client querying the agent receives JSON-encoded values that are returned from plugin function calls made by the agent. These values are received alongside the agent token response stream (via SSE). So plugins can essentially emit events that the client can forward to the UI application, such as to click a button etc. The limitation with this is that there is no built-in way to send a success/error status back, it's one way only. It works well for actions that are infallible such as simple UI actions.

The client here would also need a way to interact with the target program of course, e.g. from a JavaScript browser you can click buttons and manipulate the DOM, or from a VSCode Plugin you can interact with the editor etc.

It's definitely something that can be improved though! I've been thinking about some type of MCP interoperability that could maybe assist with this.




Join us for AI Startup School this June 16-17 in San Francisco!

Guidelines | FAQ | Lists | API | Security | Legal | Apply to YC | Contact

Search: