I have one opening in a new project around AI for precision medicine.
One thrust is about combining biomedical knowledge graphs and large language models into a biomedical-fluent core platform.
Another thrust is about UI for data-rich programmable systems.
The opening is for either a research engineer or a postdoc, with different criteria in each case.
The research engineer would be responsible for quarterly demo-driven deliverables.
The postdoc would be expected to develop a research agenda with these applications in mind.
This seems cool! Is there a way to try it locally with an open LLM? If you provide a way to set the OpenAI server URL and other parameters, that would be enough. Is the API_URL server documented, so that a mock or local one can be created?
Definitely, it is something we are super focused on, as it seems to be an important use case for folks. Opening up the proxy server and adding local LLM support is my main focus for today; I will hopefully update this comment when it is done :)
Ollama support would be amazing. There's a stack of people in organizations (data-rich places) who would likely love something like this, but who cannot get to OpenAI due to organizational policies.
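For reference, once the base URL is configurable, pointing the client at a local server could look roughly like this. A minimal sketch, assuming an OpenAI-compatible endpoint (Ollama's default is shown); the `API_URL` environment variable and the model name are illustrative, not the project's actual configuration:

```python
import os
from openai import OpenAI  # pip install openai

# Hypothetical: read the server URL from an environment variable (API_URL is an
# assumption here). Ollama exposes an OpenAI-compatible API at /v1 by default.
client = OpenAI(
    base_url=os.environ.get("API_URL", "http://localhost:11434/v1"),
    api_key="ollama",  # Ollama ignores the key, but the client requires one
)

response = client.chat.completions.create(
    model="codellama",  # any model pulled locally, e.g. `ollama pull codellama`
    messages=[{"role": "user", "content": "Write a Dafny method that computes factorial."}],
)
print(response.choices[0].message.content)
```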
Yes. The verifier check already ensures syntactic correctness, but the search could go faster if the underlying LLM didn't generate bad syntax to begin with.
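(Concretely, the verifier check can be as simple as shelling out to Dafny and treating a non-zero exit as failure. A minimal sketch, assuming the `dafny verify` CLI of Dafny 4.x is on the PATH; this is illustrative, not the project's exact scoring code.)

```python
import os
import subprocess
import tempfile

def verifies(dafny_code: str, timeout_s: int = 60) -> bool:
    """Return True if the Dafny verifier accepts the program (exit code 0)."""
    with tempfile.NamedTemporaryFile(suffix=".dfy", mode="w", delete=False) as f:
        f.write(dafny_code)
        path = f.name
    try:
        result = subprocess.run(
            ["dafny", "verify", path],   # assumes the Dafny 4.x CLI is installed
            capture_output=True, text=True, timeout=timeout_s,
        )
        return result.returncode == 0
    except subprocess.TimeoutExpired:
        return False                     # treat a verifier timeout as a failed check
    finally:
        os.unlink(path)
```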
I mean... isn't the task that you're optimizing 'has a function and a set of lemmas that all pass'?
How does the MCTS distinguish between 'generated a stupid lemma that is true' and 'generated a valid lemma'?
Is there any reason to expect that a 'good' partial subtree will result in a 'good' output?
Why do you think this approach will generate valid lemmas?
(Not structurally valid; valid as in, they assert a solution to the given problem).
It seems a lot like this:
"Generate me a function for X and a bunch of tests that verify X".
"If the tests pass, the function is good"
...but, there's no specific reason to expect that to be true?
How is what you're doing different?
Clearly (from the examples) it is true in trivial cases, but generally speaking, as task complexity increases, this type of 'auto validation' seems to struggle...?
Using grammars to generate structured output seems like a similar (and successful) approach used by many people, but because it doesn't rely on the associated auto-validation, it's robust against LLM randomness.
I guess I'm struggling to see how this is superior / novel to existing approaches in that space.
> How does the MCTS distinguish between 'generated a stupid lemma that is true' and 'generated a valid lemma'?
It does not, though I would like to add validation as part of the check as future work.
> Is there any reason to expect that a 'good' partial subtree will result in a 'good' output?
Yes. I am using Phind Code Llama and it generates a lot of invalid Dafny out of the box. Now, all these false starts get pruned.
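Roughly, one expansion step keeps only continuations that the checker still accepts. A minimal sketch, where `propose` (the LLM sampler) and `check` (the Dafny verifier or parser call) are assumed interfaces, not the project's actual API:

```python
from typing import Callable, List

def expand(partial_program: str,
           propose: Callable[[str, int], List[str]],
           check: Callable[[str], bool],
           n: int = 8) -> List[str]:
    """One expansion step of the search over partial Dafny programs.

    Asks the LLM for n continuations of the partial program and keeps only
    the candidates the checker still accepts; invalid Dafny is pruned here.
    """
    children = []
    for continuation in propose(partial_program, n):
        candidate = partial_program + continuation
        if check(candidate):  # false starts (e.g. syntax errors) never enter the tree
            children.append(candidate)
    return children
```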
I do suspect that if we try to scale this to Coq proofs, we will run into local-optima rabbit holes.
> Why do you think this approach will generate valid lemmas?
> (Not structurally valid; valid as in, they assert a solution to the given problem).
Because the LLM tries to follow instructions.
> "Generate me a function for X and a bunch of tests that verify X".
> "If the tests pass, the function is good"
> ...but, there's no specific reason to expect that to be true?
Indeed, there are complementary approaches like Speculyzer that help validate specs and code at once.
> How is what you're doing different?
That's not the problem I am solving at the moment. What I do solve is generating verified code (which still needs to be inspected), which the LLM struggles to do in one go.
> I guess I'm struggling to see how this is superior / novel to existing approaches in that space.
It's not that novel; see the planning-with-LLMs work that I credit in the README.md. This transfers the approach from a setting where the checks are tests to a setting where the checks come from a verifier.
The code is very simple, and it actually enables open models to reliably solve problems that even ChatGPT only sometimes solves.
I think it's a good foundation to build more sophisticated verification with LLMs.
Contact: namin@seas.harvard.edu