Yep, came here expecting to read an interesting take on why SSE sucks or a better alternative, but this just reads like "skill issue," a term I very much dislike but one that seems appropriate here.
A significant part of working with relatively new technology stacks is, to use the tech slang, a "skill issue." A lot of these problems were already solved, or at least analyzed, 20-40 years ago and hardly need to be reinvented, maybe just modernized.
You're right that this isn't the "autonomous agent" fantasy that keeps getting hyped.
The agentic part here is more modest but real. The primary agent does make runtime decisions about task decomposition based on the data and calls the subagents (tools) to do the actual work.
So yeah, it's closer to "intelligent workflow orchestration." That's probably a more honest description.
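A minimal sketch of the shape I mean, where everything is hypothetical and `callModel` is a stand-in for whatever provider API you use; the point is just that the main LLM decides at runtime which subagent (tool) to call:

```typescript
type Step =
  | { type: "tool"; tool: string; input: string }
  | { type: "final"; content: string };

type Subagent = (input: string) => Promise<string>;

// Subagents are plain stateless functions registered as tools.
const subagents: Record<string, Subagent> = {
  fetchClosedTickets: async () => JSON.stringify([{ id: 1, title: "Fix login bug" }]),
  summarize: async (text) => `Summary of: ${text.slice(0, 40)}...`,
};

// Stand-in for the actual LLM call (OpenRouter, OpenAI, etc.).
declare function callModel(history: unknown[], tools: string[]): Promise<Step>;

async function runMainAgent(task: string): Promise<string> {
  const history: unknown[] = [{ role: "user", content: task }];
  for (;;) {
    const step = await callModel(history, Object.keys(subagents));
    if (step.type === "final") return step.content;
    // The model chose a subagent; run it and feed the result back in.
    const result = await subagents[step.tool](step.input);
    history.push({ role: "tool", name: step.tool, content: result });
  }
}
```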
I assume you're talking about Claude Code, right? If so, I very much agree with this. A lot of this was actually inspired by how easy it was to do in Claude Code.
I first experimented with allowing the main agent to have a "conversation" with the sub-agents. For example, I created a database of messages between the main agent and the sub-agents and allowed both to append to it. This kinda worked for a few messages but kept getting stuck on mid-tier models, such as GPT-5 mini.
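Roughly the shape I tried, reconstructed from memory (not the exact code):

```typescript
// Both the main agent and a sub-agent append to one shared message log.
type AgentMessage = {
  from: "main" | "sub";
  content: string;
  at: Date;
};

const log: AgentMessage[] = []; // stood in for a real DB table

function append(from: AgentMessage["from"], content: string) {
  log.push({ from, content, at: new Date() });
}

// Each turn, an agent reads the whole log and appends a reply.
// This worked for a few messages, then mid-tier models lost the thread.
```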
But from my understanding, Claude Code's implementation is also similar to the stateless functions I described (happy to be proven wrong). Sub-agents don't communicate back much aside from the final result, and they don't have a conversation history.
The live updates you see are mostly the application layer updating the UI, which initially confused me.
Sure, but to clarify: you're probably setting temperature close to 0 to get output that's as consistent as possible for a given input? Have you made any changes to top_k and/or top_p that you've found make agent output more consistent/deterministic?
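For reference, these are the knobs I mean, on a typical chat-completions request body (parameter names vary by provider, and not all of them expose top_k):

```typescript
const body = {
  model: "gpt-5-mini",
  messages: [{ role: "user", content: "..." }],
  temperature: 0, // near-greedy decoding for repeatable output
  top_p: 1,       // nucleus sampling cutoff; 1 = no cutoff
  // top_k: 1,    // some providers (e.g. via OpenRouter) accept this too
};
```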
For context, I'm a solo developer building UserJot. I've recently been looking into integrating AI into the product, but I've been wanting to go a lot deeper than just wrapping a single API call and calling it a day.
So this blog post is mostly my experience trying to reverse-engineer other AI agents and experimenting with different approaches for a bit.
When you discuss caching, are you talking about caching the LLM response on your side (what I presume) or actual prompt caching (using the provider cache[0])? Curious why you'd invalidate static content?
I think I need to make this a bit more clear. I was mostly referring to caching the tools (sub-agents) when they are pure functions. But that may be a bit too specific for the sake of this post.
i.e. you have a query that reads data that doesn't change often, so you can cache the result.
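In other words, a memoization wrapper around the tool, something like this (hypothetical helper; the TTL stands in for "doesn't change often"):

```typescript
// Results are keyed by the serialized arguments and expire after a TTL.
function cachedTool<A, R>(
  fn: (args: A) => Promise<R>,
  ttlMs: number,
): (args: A) => Promise<R> {
  const cache = new Map<string, { value: R; expires: number }>();
  return async (args: A) => {
    const key = JSON.stringify(args);
    const hit = cache.get(key);
    if (hit && hit.expires > Date.now()) return hit.value; // cache hit
    const value = await fn(args);
    cache.set(key, { value, expires: Date.now() + ttlMs });
    return value;
  };
}

// e.g. a read-only query whose data changes rarely:
// const getPlans = cachedTool(fetchPricingPlans, 10 * 60 * 1000);
```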
Nice post! Can you share a bit more about the variety of tasks you've used agents for? Agents can mean so many different things depending on who you're talking to. A lot of the examples seem like read-only/analysis tasks. Did you also work on tasks where the agent took actions and changed state? If so, did you find any differences in the patterns that worked for those agents?
Sure! So there are both read-only agents and agents that perform writes in what I'm working on. Basically, there's a main agent (the main LLM) that is responsible for the overall flow (I'm currently testing GPT-5 Mini for this), and then there are the sub-agents, like I mentioned, that are defined as tools.
Hopefully this isn't against the terms here, but I posted a screenshot here of how I'm trying to build this into the changelog editor to allow users to basically go:
1. What tickets did we recently close?
2. Nice, write a changelog entry for that.
3. Add me as author, tags, and title.
4. Schedule this changelog for Monday morning.
Of course, this sounds very trivial on the surface, but it starts to get more complex when you think about how to do find and replace in the text, how to fetch tickets and analyze them, how to write the changelog entry, etc.
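To make that concrete, the tool surface behind that flow ends up looking something like this (all names hypothetical):

```typescript
interface Ticket {
  id: string;
  title: string;
  closedAt: Date;
}

interface ChangelogTools {
  // 1. read-only query against the ticket store
  fetchClosedTickets(since: Date): Promise<Ticket[]>;
  // 2-3. mutations against the draft in the editor
  findAndReplace(draftId: string, find: string, replace: string): Promise<void>;
  setMetadata(
    draftId: string,
    meta: { author: string; tags: string[]; title: string },
  ): Promise<void>;
  // 4. server-side mutation
  scheduleChangelog(draftId: string, publishAt: Date): Promise<void>;
}
```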
- Did you build your own, or are you farming out to, say, Opencode?
- If you built your own, did you roll from scratch or use a framework? Any comments either way on this?
- How "agentic" (or constrained as the case may be) are your agents in terms of the tools you've provided them?
Not sure if I understand the question, but I'll do my best to answer.
I guess "agent"/"agentic" are too broad as terms. All of this is really an LLM that has a set of tools, which may or may not themselves be other LLMs. You don't really need a framework as long as you can make HTTP calls to OpenRouter or some other provider and handle tool calling.
I'm using the AI SDK as it plays very nicely with TypeScript and gives you a lot of interesting features, like handling server-side/client-side tool calling and synchronization.
My current setup has a mix of tools: some are pure functions (e.g. database queries), some handle server-side mutations (e.g. scheduling a changelog), and some are supposed to run locally on the client (e.g. updating the TipTap editor).
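Roughly, the split looks like this with the AI SDK's `tool()` helper, from memory (option names differ between SDK versions, and the tool names are made up):

```typescript
import { tool } from "ai";
import { z } from "zod";

// Server-side tool: has an execute function, runs on the backend.
const scheduleChangelog = tool({
  description: "Schedule a changelog entry for publication",
  parameters: z.object({ draftId: z.string(), publishAt: z.string() }),
  execute: async ({ draftId, publishAt }) => {
    // ...server-side mutation here...
    return { scheduled: true };
  },
});

// Client-side tool: no execute function, so the call is forwarded to
// the browser (e.g. to update the TipTap editor) and the result is
// sent back to the model by the application layer.
const updateEditor = tool({
  description: "Apply an edit to the changelog draft in the editor",
  parameters: z.object({ find: z.string(), replace: z.string() }),
});
```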
Again, hopefully this somewhat answers the question, but happy to provide more details if needed.
When you describe subagents, are those single-tool agents, or are they multi-tool agents with their own ability to reflect and iterate? (i.e. how many actual LLM calls does a subagent make?)
So I have a main agent that is responsible for steering the overall flow, and then there are the sub-agents that, as I mentioned, are stateless functions called by the main agent.
Now these could be anything really: API calls, pure computation, or even LLM calls.
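For example, an LLM-backed sub-agent is still just a stateless function: one fresh call, no shared conversation history (a sketch; `callModel` again stands in for the provider API):

```typescript
// Gets a fresh prompt every time and keeps no state between invocations.
declare function callModel(prompt: string): Promise<string>;

const summarizeTicket = async (ticket: string): Promise<string> =>
  callModel(`Summarize this ticket in two sentences:\n${ticket}`);
```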
- Why JavaScript and not TypeScript? You're missing Bun and Deno, what's up with that?
- Is your intern OK?
- I don't have a need for an enterprise solution but would still like paid support. Are you planning a third tier in between the free tier and the enterprise one?