Do you think you will keep it free, or can you see a business model developing around it? If so, what do you think it would be? How would you split paid tiers vs free users? Not a big deal to me, but I'm curious how one might commercialise these types of free/open source projects.
I could see there being a long-term free offering that doesn't cost us compute or tokens, and probably some other offerings that actually do use resources and would make sense to build a business around.
But that's not a today problem. For now we just want to absorb feedback and iterate until we build the ultimate tool for working with these coding agents.
Is this basically an LLM that has tools automatically configured so I don’t have to handle that myself? Or am I not understanding it correctly? As in, do I just make standard requests, but the LLM does more work than normal before sending me a response? Or do I get the response to every step?
The aspirational goal is that the model knows what tools to call and when, without human intervention. In practice, you'll see varying efficacy with that depending on the tools you need. Some of the tool usage is in-distribution / well represented in the training set, but if you have some custom exotic MCP server you created yourself (or pulled off of some random GitHub repo) you may see mixed results. Sometimes that can be fixed by simply augmenting your prompt with contrastive examples of how to use or not use the tool.
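To make the "contrastive examples" idea concrete, here is a minimal sketch of what I mean. Everything in it is made up for illustration: `ToolExample`, `augment_prompt`, and the `search_docs` tool are hypothetical names, not part of any real SDK or MCP server. It just appends paired "call it here / don't call it here" cases to a system prompt.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class ToolExample:
    """One contrastive case: a request and whether the tool should be used."""
    user_request: str
    should_call: bool
    note: str


def augment_prompt(base_prompt: str, tool_name: str, examples: List[ToolExample]) -> str:
    """Append positive and negative usage examples for a tool to a system prompt."""
    lines = [base_prompt.rstrip(), "", f"Guidance for the `{tool_name}` tool:"]
    for ex in examples:
        verdict = "CALL" if ex.should_call else "DO NOT CALL"
        lines.append(f"- Request: {ex.user_request!r} -> {verdict} {tool_name} ({ex.note})")
    return "\n".join(lines)


# Hypothetical tool and examples, purely illustrative.
prompt = augment_prompt(
    "You are a coding assistant.\n",
    "search_docs",
    [
        ToolExample("How do I paginate the orders endpoint?", True, "needs project docs"),
        ToolExample("Rename this variable to snake_case.", False, "answerable from the code alone"),
    ],
)
print(prompt)
```

Whether this actually helps depends on the model and the tool, so treat it as something to try, not a guaranteed fix.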
As an aside, my experience with Devstral (both via API and locally w/ open weights) has been very underwhelming in this respect. So I'm curious how this new agent infra performs given that observation.
It's a software framework for orchestrating agents. Each agent can have its own system prompt, its own tools, and it can delegate ("hand off") to a different agent. When a hand off occurs, the LLM runs again but as a different agent.
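Roughly, the hand-off loop described here looks like the toy sketch below. This is not the framework's actual API: `Agent`, `stub_model`, `run`, and the `triage`/`billing` agents are all hypothetical, and `stub_model` is a hard-coded stand-in for a real LLM call so the example runs on its own.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Tuple


@dataclass
class Agent:
    name: str
    system_prompt: str
    tools: Dict[str, Callable[[str], str]] = field(default_factory=dict)
    handoffs: List[str] = field(default_factory=list)  # agents this one may delegate to


def stub_model(agent: Agent, message: str) -> Tuple[str, str]:
    """Stand-in for an LLM call (assumption): returns an (action, payload) pair."""
    text = message.lower()
    if "billing" in agent.handoffs and "refund" in text:
        return ("handoff", "billing")
    if "lookup_order" in agent.tools and "order" in text:
        return ("tool", "lookup_order")
    return ("reply", f"[{agent.name}] {message}")


def run(agents: Dict[str, Agent], start: str, message: str, max_turns: int = 5):
    """Run the model as the current agent; on a hand-off, run it again as the target agent."""
    current = agents[start]
    trace = [current.name]
    for _ in range(max_turns):
        action, payload = stub_model(current, message)
        if action == "handoff":
            current = agents[payload]  # same conversation, different prompt and tools
            trace.append(current.name)
            continue
        if action == "tool":
            return trace, f"[{current.name}] {current.tools[payload](message)}"
        return trace, payload
    raise RuntimeError("too many hand-offs")


AGENTS = {
    "triage": Agent("triage", "Route the user to the right agent.", handoffs=["billing"]),
    "billing": Agent(
        "billing",
        "Handle refunds and orders.",
        tools={"lookup_order": lambda _msg: "order status: shipped"},
    ),
}

print(run(AGENTS, "triage", "I want a refund for my order"))
```

The point is only that a hand-off swaps the system prompt and tool set and then re-runs the model; a real framework would also carry conversation history and let the model pick the target agent.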
Funny times. Sonnet 3.7 launches and there is big hype... but complaints start to surface on r/cursor that it is doing too much, is too confident, has no personality. I wonder if 4.5 will be the reverse, an under-hyped launch, but a dawning realisation that it is incredibly useful. Time will tell!
I share the sentiment. As far as I've used it, Sonnet 3.7 is a downgrade, and I use Sonnet 3.5 instead. 3.7 tends to overlook critical parts of the query and confidently answers with irrelevant garbage. I'm not sure how QA is done on LLMs, but I for one definitely feel like the ball was dropped somewhere.
I created this for fun after seeing (and being totally impressed by) https://poke-holo.simey.me/. There is a lot of similarity obviously!
My aim was to replace my doom scrolling with something a bit prettier.
Right now I'm just putting it out there to see if anyone thinks it's cool. I'd love to know what you think (and if I should keep working on it). Thanks :)