No, because the RLHF step is somewhat independent and manually curated, which makes it really hard to fully disentangle from the original prediction.
There are a lot of proteins whose names are "legacy", sometimes assigned by homology, which probably misses important ways biology uses the protein that were only discovered later.
Perhaps fine-tuning is a better word? I am unsure what process lets an LLM switch from being just a next-word prediction tool to a chatbot. Instruction tuning?
The author basically chose some of the outputs based on set criteria. I think this can eventually be automated and embedded into the protein language model, the same way ChatGPT now has guardrails and specific ways to answer questions instead of just continuing with the most likely text. E.g., asking a raw next-word predictor what the capital of France is can get you another question about the capital of Germany as output.
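To make that concrete, here's a rough sketch of the difference (Python with the Hugging Face transformers library; the model names are just examples I picked, not anything the author used):

    from transformers import pipeline

    prompt = "What is the capital of France?"

    # Base model: pure next-token prediction over raw text. It has no reason
    # to treat the prompt as a question directed at it, so the "most likely
    # continuation" can easily be more question-shaped text.
    base = pipeline("text-generation", model="gpt2")
    print(base(prompt, max_new_tokens=20)[0]["generated_text"])

    # Instruction-tuned model: still a next-word predictor, but fine-tuned on
    # (instruction, response) pairs, so the most likely continuation of a
    # question is now an answer to it.
    tuned = pipeline("text-generation", model="HuggingFaceH4/zephyr-7b-beta")
    print(tuned(prompt, max_new_tokens=20)[0]["generated_text"])

Same objective, same sampling loop; only the training distribution changed, which is why the tuned model feels like a chatbot.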
Instruction finetuning or RLHF. Both instances are "just" next-word predictors; instruction tuning just changes the goals of the predictions. It doesn't necessarily make a model "smarter" (it didn't for GPT-4), but it does make it more accessible.