Yep, came here expecting to read an interesting take on why SSE sucks or a better alternative, but this just reads like "skill issue," a term I very much dislike but one that seems appropriate here.
A significant part of working with relatively new technology stacks is, to use the tech slang, a "skill issue." A lot of these problems were already solved, or at least analyzed, 20-40 years ago and hardly need to be reinvented, maybe just modernized.
You're right that this isn't the "autonomous agent" fantasy that keeps getting hyped.
The agentic part here is more modest but real. The primary agent does make runtime decisions about task decomposition based on the data and calls the subagents (tools) to do the actual work.
So yeah, it's closer to "intelligent workflow orchestration." That's probably a more honest description.
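A minimal sketch of the shape I mean, where everything is hypothetical and `callModel` is a stand-in for whatever provider API you use; the point is just that the main LLM decides at runtime which subagent (tool) to call:

```typescript
type Step =
  | { type: "tool"; tool: string; input: string }
  | { type: "final"; content: string };

type Subagent = (input: string) => Promise<string>;

// Subagents are plain stateless functions registered as tools.
const subagents: Record<string, Subagent> = {
  fetchClosedTickets: async () => JSON.stringify([{ id: 1, title: "Fix login bug" }]),
  summarize: async (text) => `Summary of: ${text.slice(0, 40)}...`,
};

// Stand-in for the actual LLM call (OpenRouter, OpenAI, etc.).
declare function callModel(history: unknown[], tools: string[]): Promise<Step>;

async function runMainAgent(task: string): Promise<string> {
  const history: unknown[] = [{ role: "user", content: task }];
  for (;;) {
    const step = await callModel(history, Object.keys(subagents));
    if (step.type === "final") return step.content;
    // The model chose a subagent; run it and feed the result back in.
    const result = await subagents[step.tool](step.input);
    history.push({ role: "tool", name: step.tool, content: result });
  }
}
```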
I assume you're talking about Claude Code, right? If so, I very much agree with this. A lot of this was actually inspired by how easy it was to do in Claude Code.
I first experimented with allowing the main agent to have a "conversation" with the sub-agents. For example, I created a database of messages between the main agent and the sub-agents and allowed both to append to it. This kinda worked for a few messages but kept getting stuck on mid-tier models, such as GPT-5 mini.
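Roughly the shape I tried, reconstructed from memory (not the exact code):

```typescript
// Both the main agent and a sub-agent append to one shared message log.
type AgentMessage = {
  from: "main" | "sub";
  content: string;
  at: Date;
};

const log: AgentMessage[] = []; // stood in for a real DB table

function append(from: AgentMessage["from"], content: string) {
  log.push({ from, content, at: new Date() });
}

// Each turn, an agent reads the whole log and appends a reply.
// This worked for a few messages, then mid-tier models lost the thread.
```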
But from my understanding, Claude Code's implementation is also similar to the stateless functions I described (happy to be proven wrong). Sub-agents don't communicate back much aside from the final result, and they don't have a conversation history.
The live updates you see are mostly the application layer updating the UI, which initially confused me.
Sure, but to clarify: you're probably setting temperature close to 0 to get output that's as consistent as possible for a given input? Have you made any changes to top_k and/or top_p that you've found make agent output more consistent/deterministic?
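For reference, these are the knobs I mean, on a typical chat-completions request body (parameter names vary by provider, and not all of them expose top_k):

```typescript
const body = {
  model: "gpt-5-mini",
  messages: [{ role: "user", content: "..." }],
  temperature: 0, // near-greedy decoding for repeatable output
  top_p: 1,       // nucleus sampling cutoff; 1 = no cutoff
  // top_k: 1,    // some providers (e.g. via OpenRouter) accept this too
};
```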
For context, I'm a solo developer building UserJot. I've recently been looking into integrating AI into the product, but I've been wanting to go a lot deeper than just wrapping a single API call and calling it a day.
So this blog post is mostly my experience trying to reverse-engineer other AI agents and experimenting with different approaches for a bit.
When you discuss caching, are you talking about caching the LLM response on your side (what I presume) or actual prompt caching (using the provider cache[0])? Curious why you'd invalidate static content?
I think I need to make this a bit more clear. I was mostly referring to caching the tools (sub-agents) when they are pure functions. But that may be a bit too specific for the sake of this post.
i.e. you have a query that reads data that doesn't change often, so you can cache the result.
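In other words, a memoization wrapper around the tool, something like this (hypothetical helper; the TTL stands in for "doesn't change often"):

```typescript
// Results are keyed by the serialized arguments and expire after a TTL.
function cachedTool<A, R>(
  fn: (args: A) => Promise<R>,
  ttlMs: number,
): (args: A) => Promise<R> {
  const cache = new Map<string, { value: R; expires: number }>();
  return async (args: A) => {
    const key = JSON.stringify(args);
    const hit = cache.get(key);
    if (hit && hit.expires > Date.now()) return hit.value; // cache hit
    const value = await fn(args);
    cache.set(key, { value, expires: Date.now() + ttlMs });
    return value;
  };
}

// e.g. a read-only query whose data changes rarely:
// const getPlans = cachedTool(fetchPricingPlans, 10 * 60 * 1000);
```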
Nice post! Can you share a bit more about the variety of tasks you've used agents for? Agents can mean so many different things depending on who you're talking to. A lot of the examples seem like read-only/analysis tasks. Did you also work on tasks where the agent took actions and changed state? If so, did you find any differences in the patterns that worked for those agents?
Sure! So there are both read-only agents and agents that perform writes in what I'm working on. Basically, there's a main agent (the main LLM) that is responsible for the overall flow (I'm currently testing GPT-5 Mini for this), and then there are the sub-agents, like I mentioned, that are defined as tools.
Hopefully this isn't against the terms here, but I posted a screenshot here of how I'm trying to build this into the changelog editor to allow users to basically go:
1. What tickets did we recently close?
2. Nice, write a changelog entry for that.
3. Add me as author, tags, and title.
4. Schedule this changelog for Monday morning.
Of course, this sounds very trivial on the surface, but it starts to get more complex when you think about how to do find and replace in the text, how to fetch tickets and analyze them, how to write the changelog entry, etc.
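To make that concrete, the tool surface behind that flow ends up looking something like this (all names hypothetical):

```typescript
interface Ticket {
  id: string;
  title: string;
  closedAt: Date;
}

interface ChangelogTools {
  // 1. read-only query against the ticket store
  fetchClosedTickets(since: Date): Promise<Ticket[]>;
  // 2-3. mutations against the draft in the editor
  findAndReplace(draftId: string, find: string, replace: string): Promise<void>;
  setMetadata(
    draftId: string,
    meta: { author: string; tags: string[]; title: string },
  ): Promise<void>;
  // 4. server-side mutation
  scheduleChangelog(draftId: string, publishAt: Date): Promise<void>;
}
```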
- Did you build your own, or are you farming out to, say, Opencode?
- If you built your own, did you roll from scratch or use a framework? Any comments either way on this?
- How "agentic" (or constrained as the case may be) are your agents in terms of the tools you've provided them?
Not sure if I understand the question, but I'll do my best to answer.
I guess "agent"/"agentic" are too broad as terms. All of this is really an LLM that has a set of tools, which may or may not themselves be other LLMs. You don't really need a framework as long as you can make HTTP calls to OpenRouter or some other provider and handle tool calling.
I'm using the AI SDK as it plays very nicely with TypeScript and gives you a lot of interesting features, like handling server-side/client-side tool calling and synchronization.
My current setup has a mix of tools: some are pure functions (e.g. database queries), some handle server-side mutations (e.g. scheduling a changelog), and some are supposed to run locally on the client (e.g. updating the TipTap editor).
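Roughly, the split looks like this with the AI SDK's `tool()` helper, from memory (option names differ between SDK versions, and the tool names are made up):

```typescript
import { tool } from "ai";
import { z } from "zod";

// Server-side tool: has an execute function, runs on the backend.
const scheduleChangelog = tool({
  description: "Schedule a changelog entry for publication",
  parameters: z.object({ draftId: z.string(), publishAt: z.string() }),
  execute: async ({ draftId, publishAt }) => {
    // ...server-side mutation here...
    return { scheduled: true };
  },
});

// Client-side tool: no execute function, so the call is forwarded to
// the browser (e.g. to update the TipTap editor) and the result is
// sent back to the model by the application layer.
const updateEditor = tool({
  description: "Apply an edit to the changelog draft in the editor",
  parameters: z.object({ find: z.string(), replace: z.string() }),
});
```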
Again, hopefully this somewhat answers the question, but happy to provide more details if needed.
When you describe subagents, are those single-tool agents, or are they multi-tool agents with their own ability to reflect and iterate? (i.e. how many actual LLM calls does a subagent make?)
So I have a main agent that is responsible for steering the overall flow, and then there are the sub-agents that, as I mentioned, are stateless functions called by the main agent.
Now these could be anything really: API calls, pure computation, or even LLM calls.
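For example, an LLM-backed sub-agent is still just a stateless function: one fresh call, no shared conversation history (a sketch; `callModel` again stands in for the provider API):

```typescript
// Gets a fresh prompt every time and keeps no state between invocations.
declare function callModel(prompt: string): Promise<string>;

const summarizeTicket = async (ticket: string): Promise<string> =>
  callModel(`Summarize this ticket in two sentences:\n${ticket}`);
```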
- Why JavaScript and not TypeScript? You're missing Bun and Deno, what's up with that?
- Is your intern OK?
- I don't have a need for an enterprise solution but would still like paid support. Are you planning a third tier in between the free tier and the enterprise one?