Hacker News

Does this mean that I need to call the LLM API once for each token?


No. You need to hook into the LLM at a lower level. One API call typically triggers generation of a whole sequence of tokens, and this library has to poke into things between each generated token.
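To make "poking into things between tokens" concrete, here is a toy sketch of a greedy decode loop with a per-token hook. The "model" is a fake scorer over a four-word vocabulary, and all names here are hypothetical, not this library's actual API:

```python
# Toy sketch (not a real LLM): a greedy decode loop with a per-token hook,
# illustrating how a constrained-generation library can intervene between
# each generated token instead of making one API call per token.

def fake_logits(prefix):
    # Stand-in for a model forward pass: scores over a tiny vocabulary,
    # keyed on how many tokens have been generated so far.
    vocab = ["yes", "no", "maybe", "<eos>"]
    return {tok: -abs(len(prefix) - i) for i, tok in enumerate(vocab)}

def generate(max_tokens, allowed_fn):
    """Greedy decoding where allowed_fn masks forbidden tokens at each step."""
    out = []
    for _ in range(max_tokens):
        scores = fake_logits(out)
        # The hook: between tokens, drop anything the constraint forbids.
        allowed = {t: s for t, s in scores.items() if allowed_fn(out, t)}
        tok = max(allowed, key=allowed.get)
        if tok == "<eos>":
            break
        out.append(tok)
    return out

# Constrain output to "yes"/"no" only, allowing <eos> after the first token.
result = generate(5, lambda prefix, t: t in ("yes", "no")
                  or (prefix and t == "<eos>"))
```

In HuggingFace transformers the equivalent hook point is a `LogitsProcessor` passed to `generate()`, which is called once per decoding step inside a single generation call.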


Can't I use max_tokens (set to 1) together with the logit_bias parameter? Not saying I want to do this; I just want to understand how this works.
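For reference, the scheme described here would look roughly like the sketch below: request one token at a time, banning disallowed token ids via a logit bias. `call_api` is a stub standing in for a real completion endpoint (the toy score table and function names are assumptions, not any provider's actual behavior):

```python
# Sketch of the one-call-per-token idea: request max_tokens=1 repeatedly,
# passing a logit_bias that bans every token outside the allowed set.

def call_api(prompt, max_tokens=1, logit_bias=None):
    # Stub endpoint: pick the highest-scoring token id from a toy table
    # after applying the caller's bias.
    scores = {0: 3, 1: 2, 2: 1}  # token_id -> score (made up)
    bias = logit_bias or {}
    return max(scores, key=lambda t: scores[t] + bias.get(t, 0))

def constrained_generate(prompt, allowed_ids, n):
    out = []
    for _ in range(n):
        # In the OpenAI logit_bias convention, -100 effectively bans a token.
        bias = {t: -100 for t in (0, 1, 2) if t not in allowed_ids}
        tok = call_api(prompt + "".join(map(str, out)),
                       max_tokens=1, logit_bias=bias)
        out.append(tok)
    return out

ids = constrained_generate("", allowed_ids={1, 2}, n=3)
```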


Not sure exactly what logit_bias is, but after Googling for 5 seconds it seems to be an OpenAI parameter that isn't available in HuggingFace transformers?

Anyway, if your idea is to make one API call per token, the biggest problem is that it would be really slow: every token pays a full network round trip, and each call re-processes the whole growing prompt from scratch.
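A back-of-envelope comparison makes the slowdown concrete. The per-request overhead and per-token decode time below are assumed round numbers, not measurements:

```python
# Rough arithmetic (assumed numbers): one generation call vs. one call per token.
per_call_overhead_s = 0.3   # network + queueing per request (assumption)
per_token_decode_s = 0.02   # server-side decode time per token (assumption)
tokens = 200

# One call that streams all tokens: pay the overhead once.
one_call = per_call_overhead_s + tokens * per_token_decode_s

# One call per token: pay the overhead on every single token.
per_token_calls = tokens * (per_call_overhead_s + per_token_decode_s)
```

Under these assumptions the per-token scheme is over an order of magnitude slower, and that still ignores the cost of re-processing the ever-longer prompt on each call.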



