When evaluating this work, it’s important to remember that the functional labels and protein family assignments on each of the 280 million input sequences were originally assigned by an HMM using human-curated sequence groups as part of the Pfam project. In other words, the model is predicting a prediction (or perhaps “conditioned on a prediction” would be more accurate).
Furthermore, the authors must engage in a lot of human curation to ensure the sequences they generate are active. First, they pick an easy target. Second, they apply classical bioinformatics techniques by hand to the predicted sequences after they are generated. For example, they align the generated sequences and select those that contain specific important amino acids at specific positions, residues that are present in 100% of functional proteins of that class and are required for function. All of this is done by a human bioinformatics expert (or an automated pipeline) before the generated sequences are ever tested. It is the protein equivalent of cherry-picking great ChatGPT responses and presenting them as if the model only produced output of that quality.
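To make that concrete, here is a minimal sketch of this kind of post-generation filter, assuming the generated sequences have already been put into a common alignment. The file name, column indices, and required residues are illustrative stand-ins (loosely inspired by the catalytic Glu/Asp pair of lysozymes), not the paper’s actual criteria:

    # Keep only generated sequences that conserve required residues at
    # specific alignment columns. Assumes an aligned FASTA in which every
    # record has the same length, so columns are directly comparable.
    REQUIRED = {34: "E", 51: "D"}  # 0-based alignment column -> required residue (illustrative)

    def read_aligned_fasta(path):
        header, chunks = None, []
        with open(path) as fh:
            for line in fh:
                line = line.strip()
                if line.startswith(">"):
                    if header is not None:
                        yield header, "".join(chunks)
                    header, chunks = line[1:], []
                else:
                    chunks.append(line)
        if header is not None:
            yield header, "".join(chunks)

    def passes_filter(seq, required=REQUIRED):
        # True only if every key column carries the required amino acid.
        return all(pos < len(seq) and seq[pos] == aa for pos, aa in required.items())

    kept = [(name, seq) for name, seq in read_aligned_fasta("generated_aligned.fasta")
            if passes_filter(seq)]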
One other comment: in protein science, a sequence with 40% identity to another sequence is not “very different” if the two are homologous. Since this model is essentially generating homologs from a particular class, it’s no surprise that, at the pairwise amino-acid level, the generated sequences show this degree of similarity. Take the proteins in any functional family and compare them: they will have the same overall 3-D structure—called their “fold”—yet have pairwise sequence identities much lower than 30–40%. This “degeneracy”, the notion that there are many diverse sequences that all fold into the same shape, is both a fundamental empirical observation in protein science and a well-grounded physical theory.
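For concreteness, pairwise identity here is just the fraction of matching residues over the aligned, ungapped columns; a minimal sketch (conventions differ, e.g. whether to normalize by alignment length or by the shorter sequence):

    def percent_identity(row_a, row_b):
        # row_a and row_b are two rows of the same alignment (equal length),
        # with "-" marking gaps; identity is counted over ungapped columns only.
        assert len(row_a) == len(row_b)
        pairs = [(a, b) for a, b in zip(row_a, row_b) if a != "-" and b != "-"]
        if not pairs:
            return 0.0
        return 100.0 * sum(a == b for a, b in pairs) / len(pairs)

    # Two short, artificial aligned fragments
    print(percent_identity("MKT-AYIAKQR", "MRTLAWLGKQ-"))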
That said, I don’t mean to be negative; I really enjoyed reading this paper and I think the work is important. Related work from Meta AI is the ESM series of models [1], trained on the same data (the UniProt dataset [2]).
One thing I wonder about is the vocabulary size of this model. The number of tokens is 26, covering the 20 amino acids plus some extras, whereas for an LLM like Meta’s LLaMA the vocabulary size is 32,000. I wonder how that changes training and inference, and how the transformer architecture should be adapted for this scenario.
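For a sense of scale, a character-level protein tokenizer is tiny compared to a subword vocabulary; a minimal sketch (the exact 26-token inventory used in the paper may differ in its special tokens):

    # 20 standard amino acids plus a few special tokens; compare with the
    # ~32,000 subword tokens in LLaMA's vocabulary.
    AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
    SPECIALS = ["<pad>", "<bos>", "<eos>", "<unk>"]
    vocab = SPECIALS + list(AMINO_ACIDS)
    stoi = {tok: i for i, tok in enumerate(vocab)}

    def encode(seq):
        # One token per residue, bracketed by begin/end-of-sequence markers.
        unk = stoi["<unk>"]
        return [stoi["<bos>"]] + [stoi.get(aa, unk) for aa in seq] + [stoi["<eos>"]]

    print(len(vocab))            # 24 here
    print(encode("MKTAYIAKQR"))

The main practical effect is on the embedding table and output projection, which shrink to a couple of dozen rows, while sequences stay one token per residue since there is no subword merging.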
I consider all the manual curation to be effectively a form of RLHF that could be applied automatically later on. We saw how much such a step can improve a raw LLM by looking at ChatGPT's output. Without it, the criticism that LLMs are just glorified autocomplete machines isn't that far from reality. In other words, it is simply an expected requirement for LLMs to be effective.
You are probably right that lysozyme is an easy target and may have large sequence variety between homologs, so calling 30-40% identity "very different" is not correct. But that is only true in the context of biology and protein structure and function. This is an LLM trained on primary sequences only; it doesn't know anything about folds, domains, or functional sites (unless I am wrong and those are part of the metadata fed to it during training). Yet it learned enough to generalize to the point that, even at only 30-40% identity, it still produces soluble proteins with the same function. I am sure you know that at that level of difference, one protein can be in an entirely different superfamily from another. So it is still an impressively low identity score.
Also, I think it is more appropriate to compare amino acids to letters of an alphabet than to vocabulary tokens. Protein domains would probably be the closer equivalent of LLaMA's vocabulary.
No, because an RLHF step is somewhat independent of the base model, whereas manual curation is really hard to fully disentangle from the original prediction.
A lot of proteins carry "legacy" names, sometimes assigned by homology, that probably miss important ways biology uses the protein that were only discovered later.
Perhaps fine-tuning is a better word? I am unsure what process lets an LLM go from being just a next-word prediction tool to a chatbot. Instruction tuning?
The authors basically chose some of the outputs based on set criteria. I think this can eventually be automated and embedded into the protein language model, the same way ChatGPT now has guardrails and specific ways to answer questions instead of just continuing with the most likely next sentence (e.g., asking it for the capital of France and getting back another question about the capital of Germany).
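A minimal sketch of what that automation could look like, with a placeholder sampler standing in for the protein language model and a placeholder rule standing in for the curation criteria (both are hypothetical, not anything from the paper):

    import random

    def generate_candidates(n, length=120):
        # Placeholder: random sequences; a real system would sample from the model.
        return ["".join(random.choice("ACDEFGHIKLMNPQRSTVWY") for _ in range(length))
                for _ in range(n)]

    def passes_criteria(seq):
        # Placeholder rule standing in for the curation criteria
        # (conserved residues, length bounds, alignment checks, ...).
        return "E" in seq and "D" in seq

    def sample_until(target=10, batch=64):
        # Generate in batches and keep only candidates that pass the criteria.
        kept = []
        while len(kept) < target:
            kept.extend(s for s in generate_candidates(batch) if passes_criteria(s))
        return kept[:target]

    print(len(sample_until()))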
Instruction fine-tuning or RLHF. In both cases the model is "just" a next-word predictor; instruction tuning only changes the goals of the predictions. It doesn't necessarily make a model "smarter" (it didn't for GPT-4), but it does make it more accessible.
1. https://github.com/facebookresearch/esm
2. https://www.uniprot.org/help/downloads