You should check out the ChatDBG project, which AFAICT goes much further than this work, though in a different direction; among other things, it lets the LLM drive the debugging process. We initially did a WinDBG integration but have since focused on lldb/gdb and pdb (the Python debugger), especially for Python notebooks. In particular, for native code, it integrates a language server to let the LLM easily find declarations of and references to variables, for example. We spent considerable time developing an API that enables the LLM to make the best use of the debugger's capabilities. (It is also not limited to post-mortem debugging.) ChatDBG has been out since early 2023, though it has of course evolved since then. Code is here [1], along with some videos; it's been downloaded north of 80K times to date. Our technical paper [2] will be presented at FSE (a top software engineering conference) in June. Our evaluation shows that ChatDBG can resolve many issues on its own, and that with some slight nudging from humans it is even more effective.
Is the benefit of using a language server, as opposed to just giving access to the codebase, simply a reduction in the number of tokens used? Or are there other benefits?
Beyond saving tokens, this greatly improved the quality and speed of answers: the language server (most notably used to find the declaration/definition of an identifier) gives the LLM
1. a shorter path to relevant information, since it can query for specific variables or functions rather than conduct a longer investigation of the source code. LLMs are typically trained/instructed to keep their answers within a range of tokens, so keeping conversations shorter where possible extends the search space the LLM will be "willing" to explore before producing a final answer.
2. a good starting point in some cases, by immediately inspecting suspicious variables or function calls. In my experience this happens a lot in our Python implementation, where the first function calls are typically `info` calls to gather background on the variables and functions in the current frame (see the rough sketch below).
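For illustration, here's a minimal sketch (not ChatDBG's actual implementation) of what such an `info`-style helper could look like on the pdb side: given the paused frame and a name, it returns a compact summary the LLM can consume instead of reading whole files. The function name and output format are assumptions for the example.

```python
# Minimal sketch of an `info`-style debugger tool for an LLM (hypothetical,
# not ChatDBG's actual code): summarize one name visible in the paused frame.
import inspect

def info(frame, name):
    """Look up `name` in the paused frame and return a compact summary."""
    scope = {**frame.f_globals, **frame.f_locals}
    if name not in scope:
        return f"{name!r} is not visible in this frame"
    obj = scope[name]
    if callable(obj):
        try:
            signature = inspect.signature(obj)
            source = inspect.getsource(obj)
        except (TypeError, OSError, ValueError):
            signature, source = "(unknown)", "(source unavailable)"
        return f"{name}{signature}\n{inspect.getdoc(obj) or ''}\n{source}"
    return f"{name}: {type(obj).__name__} = {obj!r}"
```

In a real integration the frame would come from wherever the debugger is currently stopped, and the returned text would be fed back to the LLM as the result of its tool call.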
Yes. It lets the LLM immediately obtain precise information rather than having to reason over the entire source code of the project (which ChatDBG also enables). For example (from the paper, Section 4.6):
The second command, `definition`, prints the location and source code for the definition corresponding to the first occurrence of a symbol on a given line of code. For example, `definition polymorph.c:118 target` prints the location and source for the declaration of `target` corresponding to its use on that line. The `definition` implementation leverages the `clangd` language server, which supports source code queries via JSON-RPC and Microsoft's Language Server Protocol.
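To make the protocol piece concrete, here is a minimal sketch (not ChatDBG's code) of what a `textDocument/definition` query against `clangd` looks like over LSP's JSON-RPC framing. The file name, line, and column are placeholders borrowed from the example above, and it assumes `clangd` is on the PATH.

```python
# Hypothetical, minimal LSP client: ask clangd where a symbol is defined.
import json
import pathlib
import subprocess

proc = subprocess.Popen(["clangd"], stdin=subprocess.PIPE, stdout=subprocess.PIPE)

def send(msg):
    # LSP messages are JSON bodies preceded by a Content-Length header.
    body = json.dumps(msg).encode()
    proc.stdin.write(b"Content-Length: %d\r\n\r\n" % len(body) + body)
    proc.stdin.flush()

def read_message():
    # Read headers until the blank line, then read exactly Content-Length bytes.
    length = 0
    while (line := proc.stdout.readline().strip()):
        if line.lower().startswith(b"content-length:"):
            length = int(line.split(b":")[1])
    return json.loads(proc.stdout.read(length))

def request(req_id, method, params):
    send({"jsonrpc": "2.0", "id": req_id, "method": method, "params": params})
    while True:
        msg = read_message()
        if msg.get("id") == req_id:  # skip server notifications in between
            return msg

source = pathlib.Path("polymorph.c")
uri = source.resolve().as_uri()

request(1, "initialize", {"processId": None, "rootUri": None, "capabilities": {}})
send({"jsonrpc": "2.0", "method": "initialized", "params": {}})
send({"jsonrpc": "2.0", "method": "textDocument/didOpen", "params": {
    "textDocument": {"uri": uri, "languageId": "c", "version": 1,
                     "text": source.read_text()}}})

# Where is the symbol used at line 118 (0-based line 117, column assumed) defined?
reply = request(2, "textDocument/definition", {
    "textDocument": {"uri": uri},
    "position": {"line": 117, "character": 9}})
print(reply["result"])  # e.g. [{"uri": ..., "range": {...}}]
```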
This is in the works :) In the past and still today, OpenAI has had much better and easier-to-use function-calling support, something we rely on.
We currently run everything through LiteLLM, so while it's undocumented, other LLMs could in theory work (in my experience they don't). I'm working on updating and fixing this.
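For concreteness, here's a rough sketch (not ChatDBG's actual code) of how a single debugger command might be exposed as an OpenAI-style tool and routed through LiteLLM; the `debug` tool name, the model string, and the prompt are placeholders.

```python
# Hypothetical example: declare one debugger command as an OpenAI-style tool
# and send it through LiteLLM. Assumes OPENAI_API_KEY is set in the environment.
from litellm import completion

tools = [{
    "type": "function",
    "function": {
        "name": "debug",
        "description": "Run a debugger command (e.g. 'bt', 'info locals') "
                       "and return its output.",
        "parameters": {
            "type": "object",
            "properties": {"command": {"type": "string"}},
            "required": ["command"],
        },
    },
}]

response = completion(
    model="gpt-4o",  # with LiteLLM, another provider's model string could go here
    messages=[{"role": "user", "content": "Why did this program crash?"}],
    tools=tools,
)
print(response.choices[0].message.tool_calls)
```

The point of LiteLLM here is that the tool schema stays the same while the model string changes, but in practice the quality of function calling still varies a lot by provider.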
[1] https://github.com/plasma-umass/ChatDBG
[2] https://arxiv.org/abs/2403.16354