After reading the docs for the new ChatGPT function calling yesterday, it's structured and/or typed data for GPT input or output that's the key feature of these new models. The ReAct flow of tool selection that it provides is secondary.
As this post notes, you don't even need to use the full flow of passing a function result back to the model: getting structured data from ChatGPT in itself has a lot of fun and practical use cases. You could coax previous versions of ChatGPT to "output results as JSON" with a system prompt, but in practice results were mixed, although even with this finetuned model the docs warn that there could still be parsing errors.
IIRC, there's a way to "force" LLMs to output proper JSON by adding some logic to the top token selection. I.e. in the randomness function (which OpenAI calls temperature) you'd never choose a next token that results in broken JSON. The only reason it wouldn't work would be if the output exceeds the token limit. I wonder if OpenAI is doing something like this.
Note that you don’t necessarily need to have the AI output any JSON at all — simply have it answer when asked for the value of a specific JSON key, and handle the JSON structure part in your own, hallucination-free code: https://github.com/manuelkiessling/php-ai-tool-bridge
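That approach can be sketched in a few lines of Python (a rough illustration, not the linked project's code; it assumes the `openai` package's ChatCompletion interface, and the keys and questions are made up). The model only ever answers plain-text questions, and the JSON itself is assembled deterministically:

```python
import json
import openai

def ask_model(question: str) -> str:
    # One plain-text question per call; the model never sees any JSON.
    resp = openai.ChatCompletion.create(
        model="gpt-3.5-turbo",
        messages=[{"role": "user", "content": question}],
        temperature=0,
    )
    return resp["choices"][0]["message"]["content"].strip()

def extract_contact(text: str) -> str:
    # The JSON structure lives entirely in our own code, so it cannot be
    # malformed or hallucinated; only the *values* come from the model.
    questions = {
        "name": "What is the person's full name in the following text? Answer with the name only.",
        "email": "What is the person's email address in the following text? Answer with the address only.",
    }
    answers = {key: ask_model(f"{q}\n\n{text}") for key, q in questions.items()}
    return json.dumps(answers, indent=2)
```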
Would be nice if you could send the whole back-and-forth interaction for each key in a single request. As it stands, this approach turns into lots of requests that re-send the entire context and ends up slow. I wish I could just send a Microsoft Guidance template program and have it processed in a single pass.
For various reasons, token selection may be implemented as upweighting/downweighting instead of an outright ban of invalid tokens. (Maybe it helps training?) Then the model could still generate malformed JSON. I think it is premature to infer from "can generate malformed JSON" that OpenAI is not using token selection restriction.
> I assume OpenAI’s implementation works conceptually similar to jsonformer, where the token selection algorithm is changed from “choose the token with the highest logit” to “choose the token with the highest logit which is valid for the schema”.
But that only applies to the whole generation. So if you want to constrain things one token at a time (as you would to force the output to follow a grammar), you have to make fresh calls and request only one token each time, which makes things more or less impractical if you want true guarantees. A few months ago I built this anyway to suss out how much more expensive it was [1]
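To make the cost concrete, the one-token-at-a-time approach looks roughly like the sketch below (assuming the `openai` package's Completion interface; `allowed_next_token_ids` is a hypothetical callback standing in for whatever grammar or parser you are enforcing). Every step is a full request that re-sends the entire prefix, which is where the expense and latency come from:

```python
import openai

def constrained_generate(prompt: str, allowed_next_token_ids, max_steps: int = 200) -> str:
    """Generate one token per API call, restricting each step to the
    token ids allowed by an external grammar (hypothetical callback)."""
    text = ""
    for _ in range(max_steps):
        allowed = allowed_next_token_ids(text)
        if not allowed:  # the grammar says we're finished
            break
        # A +100 logit bias effectively forces selection from the biased
        # tokens (note: the API only accepts a limited number of bias
        # entries per request).
        bias = {str(tok): 100 for tok in allowed}
        resp = openai.Completion.create(
            model="text-davinci-003",
            prompt=prompt + text,
            max_tokens=1,          # one token per round trip
            logit_bias=bias,
            temperature=0,
        )
        text += resp["choices"][0]["text"]
    return text
```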
I think the problem is that tokens are not characters. So even if you had access to a JSON parser state that could tell you whether or not a given character is valid as the next character, I am not sure how you would translate that into tokens to apply the logit biases appropriately. There would be a great deal of computation required at each step to scan the parser state and generate the list of prohibited or allowable tokens.
But if one could pull this off, it would be super cool. Similar to how Microsoft’s guidance module uses the logit_bias parameter to force the model to choose between a set of available options.
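For reference, that logit_bias trick looks roughly like this (a sketch, assuming the `openai` and `tiktoken` packages; `choose` is a made-up helper, and the approach only works cleanly when every option encodes to a single token):

```python
import openai
import tiktoken

def choose(prompt: str, options: list[str], model: str = "gpt-3.5-turbo") -> str:
    """Strongly bias the model toward answering with exactly one of `options`."""
    enc = tiktoken.encoding_for_model(model)
    bias = {}
    for opt in options:
        tokens = enc.encode(opt)
        assert len(tokens) == 1, f"{opt!r} is not a single token"
        bias[str(tokens[0])] = 100  # maximum positive bias

    resp = openai.ChatCompletion.create(
        model=model,
        messages=[{"role": "user", "content": prompt}],
        max_tokens=1,        # exactly one token back
        logit_bias=bias,     # in practice, only the option tokens survive
        temperature=0,
    )
    return resp["choices"][0]["message"]["content"]

# e.g. choose("Is this review positive or negative? 'Great product!'",
#             [" positive", " negative"])
```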
You simply sample tokens starting with the allowed characters and truncate if needed. It's pretty efficient; there's an implementation here: https://github.com/1rgs/jsonformer
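A toy illustration of that answer (not jsonformer's actual code, just the idea): the parser-side constraint is expressed as a set of allowed next characters plus a hypothetical `still_valid` callback wrapping the parser state; tokens are filtered by their first character, and a multi-character token is truncated back to the longest prefix the parser accepts.

```python
def pick_token(token_probs: dict[str, float], allowed_first_chars: set[str],
               still_valid) -> str:
    """Pick the most likely token whose first character the parser allows,
    then truncate it to the longest prefix that `still_valid` accepts."""
    candidates = {t: p for t, p in token_probs.items()
                  if t and t[0] in allowed_first_chars}
    best = max(candidates, key=candidates.get)
    # Tokens span multiple characters, so keep only the prefix that
    # keeps the document valid.
    for i in range(1, len(best) + 1):
        if not still_valid(best[:i]):
            return best[:i - 1]
    return best

# Example: the parser expects a digit next, and the model's top tokens are:
probs = {' "hello': 0.4, "12,": 0.35, "1}": 0.25}
print(pick_token(probs, allowed_first_chars=set("0123456789"),
                 still_valid=lambda s: s.isdigit()))   # -> "12"
```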
It's not temperature, but sampling. The output of an LLM is a probability distribution over tokens. To get concrete tokens, you sample from that distribution. Unfortunately, the OpenAI API does not expose the distribution. You only get the sampled tokens.
As an example, in the linked post the JSON schema defines the recipe ingredient unit as one of grams/ml/cups/pieces/teaspoons. The LLM may output the distribution grams (30%), cups (30%), pounds (40%). Picking the most likely token, "pounds", would generate an invalid document. Instead, you can use the schema to filter tokens and sample from the filtered, renormalized distribution, which is grams (50%), cups (50%).
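In code, that filtering step is just masking the disallowed tokens and renormalizing (numbers taken from the comment above):

```python
# Model's raw next-token distribution for the "unit" field.
distribution = {"grams": 0.30, "cups": 0.30, "pounds": 0.40}

# Units permitted by the JSON schema.
allowed = {"grams", "ml", "cups", "pieces", "teaspoons"}

# Unconstrained greedy decoding picks "pounds" -> invalid document.
print(max(distribution, key=distribution.get))  # pounds

# Constrained decoding: drop tokens the schema forbids, renormalize,
# then sample from what remains.
filtered = {t: p for t, p in distribution.items() if t in allowed}
total = sum(filtered.values())
filtered = {t: p / total for t, p in filtered.items()}
print(filtered)  # {'grams': 0.5, 'cups': 0.5}
```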
Not traditional temperature, maybe the parent worded it somewhat obtusely. Anyway, to disambiguate...
I think it works something like this: you let something akin to a JSON parser run alongside the output sampler. The first token must be either '{' or '['; so if you see that '[' has the highest probability, you select it and ignore all other tokens, even those with high probability.
The second token must be ... and so on and so on.
That guarantees non-broken (or at least parseable) JSON.
What's the implication of this new change for Microsoft Guidance, LMQL, Langchain, etc.? It looks like much of their functionality (controlling model output) just became obsolete. Am I missing something?
If anything, this removes a major roadblock for libraries/languages that want to employ LLM calls as a primitive, no? Although I fear the vendor lock-in intensifies here, also given how restrictive and specific the Chat API is.
Either way, as part of the LMQL team, I am actually pretty excited about this, also with respect to what we want to build going forward. This makes language model programming much easier.
`Although I fear the vendor lock-in intensifies here, also given how restrictive and specific the Chat API is.`
Eh, would be pretty easy to write a wrapper that takes a functions-like JSON Schema object and interpolates it into a traditional "You MUST return ONLY JSON in the following format:" prompt snippet.
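Something like this, for instance (a rough sketch; `functions` is the same list-of-JSON-Schema shape the new API takes, the prompt wording is just illustrative, and the example definition mirrors the get_current_weather function from OpenAI's announcement):

```python
import json

def functions_to_prompt(functions: list[dict]) -> str:
    """Turn OpenAI-style function definitions into a plain prompt snippet
    for models without a native `functions` parameter."""
    described = "\n\n".join(
        f"Function: {f['name']}\n"
        f"Description: {f.get('description', '')}\n"
        f"Parameters (JSON Schema): {json.dumps(f['parameters'])}"
        for f in functions
    )
    return (
        "You have access to the following functions:\n\n"
        f"{described}\n\n"
        "To call a function, you MUST return ONLY a JSON object in the "
        "following format and nothing else:\n"
        '{"name": "<function name>", "arguments": {<arguments matching the schema>}}'
    )

weather_fn = {
    "name": "get_current_weather",
    "description": "Get the current weather in a given location",
    "parameters": {
        "type": "object",
        "properties": {
            "location": {"type": "string"},
            "unit": {"type": "string", "enum": ["celsius", "fahrenheit"]},
        },
        "required": ["location"],
    },
}
print(functions_to_prompt([weather_fn]))
```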
It's only been added to the OpenAI interface. Function calling is really useful when used with agents. Adding it to agents would require some redesign, as the tool instructions should be removed from the prompt templates in favor of function definitions in the API request. The response parsing code would also be affected.
I just hope they won't come up with yet another agent type.
That example needs a bit of work, I think. In Step 3, they're not really using the returned function_name; they're just assuming it's the only function that's been defined, which I guess is equivalent for this simple example with just one function, but it's less instructive. In Step 4, I believe they should also have sent the function definition block a second time, since model calls in the API are memory-less and independent. They didn't, although the model appears to guess what's needed anyway in this case.
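A sketch of the round trip with both of those fixes applied (dispatch on the returned function_name, and re-send the function definitions on the second call). This assumes the `openai` package's ChatCompletion interface and a stubbed-out local get_current_weather, so treat it as illustrative rather than the cookbook's exact code:

```python
import json
import openai

def get_current_weather(location, unit="celsius"):
    # Stand-in for a real weather lookup.
    return json.dumps({"location": location, "temperature": "22", "unit": unit})

FUNCTIONS = [{
    "name": "get_current_weather",
    "description": "Get the current weather in a given location",
    "parameters": {
        "type": "object",
        "properties": {
            "location": {"type": "string"},
            "unit": {"type": "string", "enum": ["celsius", "fahrenheit"]},
        },
        "required": ["location"],
    },
}]
AVAILABLE = {"get_current_weather": get_current_weather}

messages = [{"role": "user", "content": "What's the weather like in Boston?"}]
first = openai.ChatCompletion.create(
    model="gpt-3.5-turbo-0613", messages=messages, functions=FUNCTIONS)
msg = first["choices"][0]["message"]

if msg.get("function_call"):
    # Step 3 fix: dispatch on the name the model actually returned.
    name = msg["function_call"]["name"]
    args = json.loads(msg["function_call"]["arguments"])
    result = AVAILABLE[name](**args)

    # Step 4 fix: the API is stateless, so re-send the conversation
    # *and* the function definitions along with the function result.
    messages += [msg, {"role": "function", "name": name, "content": result}]
    second = openai.ChatCompletion.create(
        model="gpt-3.5-turbo-0613", messages=messages, functions=FUNCTIONS)
    print(second["choices"][0]["message"]["content"])
```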
That SQL example is going to result in a catastrophe somewhere when someone uses it in their project. It is encouraging something very dangerous when allowed to run on untrusted inputs.
OpenAI's demo for function calling is not a Hello World, to put it mildly: https://github.com/openai/openai-cookbook/blob/main/examples...