Seeing the $ every time I do something, even if it's $0.50, can be a little stressful. We should have an option to hide the per-request cost and just show a progress bar for the current top-up.
The obvious thing would be LSP interrogation, which would allow the token context to be significantly smaller than entire files. If you have one file open, and you are working on a function that calls out to N other modules, instead of packing the context with N files, you get ONLY the sections of those files the LSP tells you to look at.
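Something like this rough sketch of the idea, in Python: pyright-langserver is just an example server binary, and the initialize/initialized handshake that a real client has to do first is omitted for brevity.

    import json
    import subprocess

    def lsp_frame(payload: dict) -> bytes:
        # LSP messages are JSON-RPC bodies behind a Content-Length header.
        body = json.dumps(payload).encode()
        return b"Content-Length: " + str(len(body)).encode() + b"\r\n\r\n" + body

    server = subprocess.Popen(["pyright-langserver", "--stdio"],
                              stdin=subprocess.PIPE, stdout=subprocess.PIPE)

    # Ask where the symbol under the "cursor" is defined.
    server.stdin.write(lsp_frame({
        "jsonrpc": "2.0", "id": 1,
        "method": "textDocument/definition",
        "params": {
            "textDocument": {"uri": "file:///project/app.py"},
            "position": {"line": 42, "character": 10},
        },
    }))
    server.stdin.flush()

    # The response is a URI plus a start/end range; the agent reads only
    # that span into the model context instead of the whole file.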
One thing that I think would be cool, and that could perhaps be a good starting point, is a TDD agent. How I imagine this working:
The user (who is a developer) writes tests and a description of the desired application. The agent attempts to build the application, compiles the code, runs the tests, and any compiler errors and test failures are automatically fed back to it so that it can fix its own mistakes without input from the user.
Based on my experience with current programming agents, I imagine it'll take the agent a couple of attempts to get an application that compiles and passes all the tests. What would be really great to see is an agent (probably with a companion application) that automates all those retries in a good way.
I imagine the hardest parts will be interpreting compiler output and (this is where things get really tricky) test output, and translating that into code changes in the existing code base.
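The retry loop itself is simple enough; something like this sketch, where generate_initial_code, write_source_tree, and llm_fix are hypothetical wrappers around whatever agent/LLM you use, and the make commands stand in for the project's real toolchain:

    import subprocess

    MAX_RETRIES = 5

    def run(cmd):
        p = subprocess.run(cmd, capture_output=True, text=True)
        return p.returncode == 0, p.stdout + p.stderr

    code = generate_initial_code(spec, tests)    # hypothetical agent call
    for attempt in range(MAX_RETRIES):
        write_source_tree(code)                  # hypothetical: write files to disk
        ok, output = run(["make", "build"])      # stand-in for the real toolchain
        if not ok:
            # Feed the raw compiler output back so the agent can fix it.
            code = llm_fix(code, "compiler output:\n" + output)  # hypothetical
            continue
        ok, output = run(["make", "test"])
        if ok:
            break                                # compiles and all tests pass
        code = llm_fix(code, "test output:\n" + output)

The hard part, as you say, isn't the loop; it's whether the model can turn that output into the right code changes.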
Yeah, this is a great workflow! What's more, agents are particularly good at writing tests, since they're simpler and mostly linear, so they can even help with that part.
As to your point about automating retries: with my last prototype I played a lot with having agents do multiple parallel implementations, then picking the first one that works, or letting you choose (or even having another agent choose).
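Roughly like this sketch, where build_and_test is a hypothetical function that runs one full agent attempt in an isolated working directory and returns a result only if it compiles and passes the tests:

    from concurrent.futures import ThreadPoolExecutor, as_completed

    def first_working_implementation(spec, tests, n=4):
        with ThreadPoolExecutor(max_workers=n) as pool:
            futures = [pool.submit(build_and_test, spec, tests, seed=i)  # hypothetical
                       for i in range(n)]
            for done in as_completed(futures):
                result = done.result()
                if result is not None:   # this attempt passed all the tests
                    for f in futures:
                        f.cancel()       # only stops attempts not yet started
                    return result
        return None                      # nothing worked; surface the failures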
Have you tried any tools that have this workflow down, or at least approach it?
I have not! But I've often been frustrated when an agent gives me code that doesn't compile, and I keep thinking that should be a solvable problem. One computer program should be able to talk to the other.
This is going to sound a bit odd, but I suggest you detail what your tools do well and what they struggle with. For example, I love Haxe, which is a niche programming language primarily for game development.
The vast majority of the time I try to use an LLM with it, the code is essentially useless, as it will try to invent methods that don't even exist.
For example, if your coding agents are really only good at JavaScript and a little bit of Python, tell me that front and center.
Good point! In that sense we're similar to most AI coding agents in that the languages we do well are the languages the mainstream LLMs do well. We might zoom in and add really good support for particular languages though (not decided yet), in which case we'll def mention that front and center!
Have you found any LLMs or coding agents that work well with Haxe? It might be a bit too niche for us (again, not sure yet), but I'd be very curious to see what they do well!
This works well; however, it literally needs to digest an entire repository. For example, if I feed it the repository for a Haxe framework, it'll work much better than something like ChatGPT.
In my unqualified opinion, LLMs would do better at niche languages (or even specific versions of mainstream languages) and at niche frameworks if they were better at consulting the documentation for the language or framework. For example, the user could give the LLM a link to the docs or an offline copy, and the LLM would prioritise the docs over its pretrained knowledge. Currently this is not feasible because 1. the docs compete for limited context with the actual code, and 2. RAG is one-way injection into the LLM; the LLM usually won't "ask for a specific docs page" even when it probably should.
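What I'm imagining is something like this sketch, where call_llm is a hypothetical wrapper around whatever chat API you use (with a fetch_docs_page tool declared to the model), and docs is a dict built from an offline copy of the documentation:

    # Two-way docs lookup: instead of one-shot RAG injection, the model can
    # ask for a specific page by path and get exactly that page back.

    def fetch_docs_page(path: str) -> str:
        return docs.get(path, "no such page; available: " + ", ".join(sorted(docs)))

    messages = [{"role": "user", "content": task}]   # task: the coding request
    while True:
        reply = call_llm(messages)                   # hypothetical wrapper
        if not reply.tool_calls:
            break                                    # model answered with code
        for call in reply.tool_calls:
            page = fetch_docs_page(call.arguments["path"])
            messages.append({"role": "tool", "content": page})
    print(reply.content)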
100% agreed on both points. Point 1 relates to https://news.ycombinator.com/item?id=43486526 as well. It's one of the biggest challenges, though maybe it'll automatically get better as models gain bigger context windows (we can't assume that, though).
If I'm just exploring ideas for fun or scratching my own itch, I have no desire to be thinking about a continuous stream of expenditure happening in the background when I have an Apple Silicon Mac with 64 GB of RAM fully capable of running an agentic stack with tool calling etc.
Please make it trivial to set up and use a llamafile or similar as the LLM for this.
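For what it's worth, a llamafile running in server mode exposes an OpenAI-compatible endpoint on localhost:8080 by default, so this could be as simple as letting us change the base URL. A sketch with the openai Python client:

    from openai import OpenAI

    # Point an OpenAI-compatible client at the local llamafile server
    # instead of a paid API. The api_key just has to be non-empty, and the
    # model name is ignored in favor of whatever model the llamafile loaded.
    client = OpenAI(base_url="http://localhost:8080/v1",
                    api_key="sk-no-key-required")
    resp = client.chat.completions.create(
        model="local",
        messages=[{"role": "user", "content": "Write FizzBuzz in Haxe."}],
    )
    print(resp.choices[0].message.content)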
In roughly the last 2 weeks, yes. It helped that everyone involved also activated their network, so we got a multiplicative effect. Can't speak to funding for now unfortunately.
Our backers have no interest in fake metrics. ;) It's a good way to quickly get feedback, which is key to our strategy. Totally fine to keep using Roo Code (or Cline) of course!
I think that this is kind of an obvious "optimization" for making application generation much more reliable. Just because the models can generate code for any of 1000 different platforms doesn't mean that you need all of them. Narrowing the scope to a particular platform makes it much more feasible to get working applications without manual debugging due to out-of-date library references etc.
I think something like the approach you have demonstrated here will relatively quickly become the standard for "no-code" application development.
Completely agree. It's useful not just for targeting one specific language, but also for all the other APIs we have, and for things like RAG to search for importable modules on the platform. Duplicating all of that across many platforms is a lot of work!
The economist in me says "just show the prices", though the psychologist in me says "that's hella stressful". ;)