My understanding is that they trained it to explicitly use a self-prune/self-edit tool that trims or summarizes portions of its message history (e.g. tool results from file explorations, messages that are no longer relevant, etc.) during the session, rather than "panic-compact" at the end. In any case, it would be good if it does something like this.
Photo-realism is great, but the real step-jump in image-gen I’m looking for is the ability to draw high-quality technical diagrams with a mix of text and images, so I can stop having LLMs generate crappy diagrams with Mermaid, SVG, HTML/CSS, or draw.io.
These aren’t really indicative of real world performance. Retrieving a single fact is pretty much the simplest possible task for a long context model. Real world use cases require considering many facts at the same time while ignoring others, all the while avoiding the overall performance degradation that current models seem susceptible to when the context is sufficiently full.
I built a similar tool called “lmsh” (LM shell) that uses Claude Code’s non-interactive mode (hence no API keys needed, since it uses your CC subscription): it presents the shell command on a REPL-like line that you can edit first and hit enter to run it. I used Rust to make it a bit snappier:
It’s pretty basic and could be improved a lot, e.g. make it use Haiku, or codex-CLI with low thinking, etc. Another improvement would be to have it bypass reading CLAUDE.md or AGENTS.md. (PRs anyone? ;)
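Not the actual lmsh implementation (that’s in Rust), but the flow can be sketched in Python under a couple of assumptions: `claude -p` is Claude Code’s non-interactive print mode, and `extract_command` / `edit_and_run` are hypothetical helper names. Python’s readline startup hook is what prefills the editable line:

```python
import re
import subprocess

def extract_command(output: str) -> str:
    """Strip an optional Markdown code fence from the model's reply,
    leaving just the bare shell command."""
    m = re.search(r"```(?:\w+)?\n(.*?)```", output, re.DOTALL)
    text = m.group(1) if m else output
    return text.strip()

def suggest(task: str) -> str:
    """Ask Claude Code (non-interactive '-p' mode, billed to the
    subscription rather than an API key) for a single shell command."""
    result = subprocess.run(
        ["claude", "-p", f"Reply with one shell command only: {task}"],
        capture_output=True, text=True,
    )
    return extract_command(result.stdout)

def edit_and_run(cmd: str) -> None:
    """Present the command on an editable REPL-like line; Enter runs it."""
    import readline
    # Prefill the input buffer so the user edits rather than retypes.
    readline.set_startup_hook(lambda: readline.insert_text(cmd))
    try:
        final = input("$ ")
    finally:
        readline.set_startup_hook(None)
    subprocess.run(final, shell=True)
```

The readline prefill is the key UX trick: the suggestion lands in the line buffer already typed, so blind-faith execution is replaced by a quick review-and-edit step.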
>it presents the shell command on a REPL-like line that you can edit first and hit enter to run it.
Oh genius, that's the best UX idea for the situation of asking an LLM to flesh out the CLI command without relying entirely on blind faith.
Even better if we can have that kind of behavior in the shell itself. For example if we started typing "cat list | grep foo | " and then suddenly realized we want help with the awk command so that it drops the first column.
This is a pretty neat approach, indeed. Having to use the API might be an inconvenience for some people, though. I guess having the Claude or ChatGPT subscription and using it with the CLI tools is what makes developers stick with these tools, instead of using what else is out there.
Right, when we’re already paying $100 or $200 per month, leveraging that “almost-all-you-can-eat buffet” is always going to be more attractive than spending more on per-token API billing.
1. Roadside Picnic, by the Strugatsky brothers, the loose basis of Tarkovsky's film Stalker.
2. XX by Rian Hughes -- a hugely underrated book. It starts with a signal from outer space and goes quite far, and also has a book-within-a-book. Nearly 1000 pages, but I found it very engaging.
I'm currently trying to read Stanislaw Lem's His Master's Voice, which has a similar theme of a possible signal from an alien intelligence.
I like each at different times in different ways. Now I have both running in separate tmux panes and have one talk to the other to ask/delegate/verify/validate, using my Tmux-cli tool (now a Claude skill, of course):
Now my work on a project often spans multiple sessions of these agents, so I use a session-finder and resume/dump tool (also in that repo). I often ask Claude or Codex to extract all useful details from a .jsonl session log file so I can continue the work.
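The session logs are JSONL, one event per line. The exact schema varies by tool, so the `role`/`content` field names below are assumptions for illustration; the pattern of parsing line-by-line and skipping malformed lines carries over regardless:

```python
import json

def extract_texts(path: str, role: str = "assistant") -> list[str]:
    """Pull message texts for one role out of a JSONL session log.
    The 'role'/'content' keys are an assumed schema -- adjust to
    whatever your CLI actually writes."""
    texts = []
    with open(path, encoding="utf-8") as f:
        for line in f:
            line = line.strip()
            if not line:
                continue
            try:
                event = json.loads(line)
            except json.JSONDecodeError:
                continue  # skip malformed or truncated lines
            if event.get("role") == role and isinstance(event.get("content"), str):
                texts.append(event["content"])
    return texts
```

In practice, handing the raw .jsonl to the agent and asking it to summarize works too; a script like this is just cheaper when the log is huge.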
I assume by “it” you mean Claude code or codex-cli — that depends on how you launched them or how you modified the permissions within the CLI chat; that’s orthogonal to my CLI tools.