No, because the RLHF step is somewhat independent and manually curated, which makes it really hard to fully disentangle from the original prediction.
There are a lot of proteins whose names are "legacy", sometimes assigned by homology, which probably misses important ways biology uses the protein that were only discovered later.
Perhaps fine-tuning is a better word? I am unsure what process lets an LLM switch from being just a next-word prediction tool to a chatbot. Instruction tuning?
The author basically chose some of the outputs based on set criteria. I think this can eventually be automated and embedded into the protein language model, the same way ChatGPT now has guardrails and specific ways to answer questions instead of just continuing with the most likely text. E.g., asking a raw next-word predictor what the capital of France is can get you another question about the capital of Germany as output.
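To make that concrete, here's a rough sketch of the difference (Python with the Hugging Face transformers library; the model names are just examples I picked, not anything the author used):

    from transformers import pipeline

    prompt = "What is the capital of France?"

    # Base model: pure next-token prediction over raw text. It has no reason
    # to treat the prompt as a question directed at it, so the "most likely
    # continuation" can easily be more question-shaped text.
    base = pipeline("text-generation", model="gpt2")
    print(base(prompt, max_new_tokens=20)[0]["generated_text"])

    # Instruction-tuned model: still a next-word predictor, but fine-tuned on
    # (instruction, response) pairs, so the most likely continuation of a
    # question is now an answer to it.
    tuned = pipeline("text-generation", model="HuggingFaceH4/zephyr-7b-beta")
    print(tuned(prompt, max_new_tokens=20)[0]["generated_text"])

Same objective, same sampling loop; only the training distribution changed, which is why the tuned model feels like a chatbot.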
Instruction finetuning or RLHF. Both instances are "just" next-word predictors; instruction tuning just changes the goals of the predictions. It doesn't necessarily make a model "smarter" (it didn't for GPT-4), but it does make it more accessible.