Hacker News

Does this mean that I need to call the LLM API once for each token?


No. You need to hook into the LLM at a lower level. One API call typically triggers generation of a whole sequence of tokens, and this library has to poke into things between each generated token.
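To make "poking into things between tokens" concrete, here is a toy sketch of a greedy decode loop with a per-token hook. The "model" is a fake scorer over a four-word vocabulary, and all names here are hypothetical, not this library's actual API:

```python
# Toy sketch (not a real LLM): a greedy decode loop with a per-token hook,
# illustrating how a constrained-generation library can intervene between
# each generated token instead of making one API call per token.

def fake_logits(prefix):
    # Stand-in for a model forward pass: scores over a tiny vocabulary,
    # keyed on how many tokens have been generated so far.
    vocab = ["yes", "no", "maybe", "<eos>"]
    return {tok: -abs(len(prefix) - i) for i, tok in enumerate(vocab)}

def generate(max_tokens, allowed_fn):
    """Greedy decoding where allowed_fn masks forbidden tokens at each step."""
    out = []
    for _ in range(max_tokens):
        scores = fake_logits(out)
        # The hook: between tokens, drop anything the constraint forbids.
        allowed = {t: s for t, s in scores.items() if allowed_fn(out, t)}
        tok = max(allowed, key=allowed.get)
        if tok == "<eos>":
            break
        out.append(tok)
    return out

# Constrain output to "yes"/"no" only, allowing <eos> after the first token.
result = generate(5, lambda prefix, t: t in ("yes", "no")
                  or (prefix and t == "<eos>"))
```

In HuggingFace transformers the equivalent hook point is a `LogitsProcessor` passed to `generate()`, which is called once per decoding step inside a single generation call.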


Can't I use max_tokens (set to 1) together with the logit_bias parameter? Not saying I want to do this; I just want to understand how this works.
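For reference, the scheme described here would look roughly like the sketch below: request one token at a time, banning disallowed token ids via a logit bias. `call_api` is a stub standing in for a real completion endpoint (the toy score table and function names are assumptions, not any provider's actual behavior):

```python
# Sketch of the one-call-per-token idea: request max_tokens=1 repeatedly,
# passing a logit_bias that bans every token outside the allowed set.

def call_api(prompt, max_tokens=1, logit_bias=None):
    # Stub endpoint: pick the highest-scoring token id from a toy table
    # after applying the caller's bias.
    scores = {0: 3, 1: 2, 2: 1}  # token_id -> score (made up)
    bias = logit_bias or {}
    return max(scores, key=lambda t: scores[t] + bias.get(t, 0))

def constrained_generate(prompt, allowed_ids, n):
    out = []
    for _ in range(n):
        # In the OpenAI logit_bias convention, -100 effectively bans a token.
        bias = {t: -100 for t in (0, 1, 2) if t not in allowed_ids}
        tok = call_api(prompt + "".join(map(str, out)),
                       max_tokens=1, logit_bias=bias)
        out.append(tok)
    return out

ids = constrained_generate("", allowed_ids={1, 2}, n=3)
```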


Not sure exactly what logit_bias is, but after Googling for 5 seconds it seems to be an OpenAI parameter that isn't available in HuggingFace transformers?

Anyway, if your idea is to make one API call per token, the biggest problem is that it would be really slow: every token pays a full network round trip, and each call re-processes the whole growing prompt from scratch.
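A back-of-envelope comparison makes the slowdown concrete. The per-request overhead and per-token decode time below are assumed round numbers, not measurements:

```python
# Rough arithmetic (assumed numbers): one generation call vs. one call per token.
per_call_overhead_s = 0.3   # network + queueing per request (assumption)
per_token_decode_s = 0.02   # server-side decode time per token (assumption)
tokens = 200

# One call that streams all tokens: pay the overhead once.
one_call = per_call_overhead_s + tokens * per_token_decode_s

# One call per token: pay the overhead on every single token.
per_token_calls = tokens * (per_call_overhead_s + per_token_decode_s)
```

Under these assumptions the per-token scheme is over an order of magnitude slower, and that still ignores the cost of re-processing the ever-longer prompt on each call.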



