I always thought that fine tuning is more like getting a style rather than memor...

lewq · on Feb 8, 2024

Fine tuning is just more training -- so it's definitely possible to teach the model facts this way too.

In practice we've found that it's a bit of a balancing act to teach the model the new knowledge without destroying existing knowledge, but it's just a matter of tuning the parameters carefully. We're also researching whether we can fine-tune a brand new expert in a MoE model like Mixtral, I've also seen work on fine-tuning just a fixed set of weights. I'm sure there will be more developments in this space soon.

In terms of how you refer to new knowledge and not base knowledge, like many things in LLMs, you just ask the LLM :-) For example, if you look at this session https://app.tryhelix.ai/session/62905598-b1b7-4d93-bc39-5a93... and click "Show Info" at the top, you can see the system prompt is:

"You are an intelligent chatbot named Helix that has been fine-tuned on document(s) e1ef2e896c in document group 62905598b1. The document group contains 1 document(s). The user will ask you questions about these documents: you must ONLY answer with context from the documents listed. Do NOT refer to background knowledge."

It does a pretty good job at this, although I'm sure there are ways to improve it further.

Referencing the specific document IDs in the fine-tuning was an innovation that has really helped us.

In terms of training time, yeah - 5 minutes on a news article, 10 minutes on a typical length paper. Pretty usable. We're experimenting with reducing the number of epochs and increasing the learning rate to make it faster at that too.

gbickford · on Feb 8, 2024

Have you tried generating two sets of qapairs, one with bad answers, and using DPO?

lewq · on Feb 8, 2024

Not yet, sounds promising!

aCoreyJ · on Feb 8, 2024

What is the advantage over using Retrieval Augmented Generation ?

mendeza · on Feb 8, 2024

RAG adds context to the users question to reduce hallucination. https://docs.llamaindex.ai/en/stable/getting_started/concept...

aCoreyJ · on Feb 8, 2024

Actually missed this was covered in the post, thanks

aCoreyJ · on Feb 8, 2024

Actually missed this is answered in the article!

drphilwinder · on Feb 8, 2024

Your sentiment is correct, but it's more of a spectrum. Fine tuning can learn facts (otherwise how would the foundation models learn facts?). But it needs those facts in the training dataset. If you have an infinite amount of facts, then you can memorise all of them.

The challenge arises when it becomes hard to generate that training data. If you just have the raw text and pop that in the context (i.e. RAG), then the LLM can be just as factual without any of that hassle.

Q2: identifiers in the prompt to say "you've been trained on this, only answer questions about this".

Q3: Depends on the size of the training data/docs. For the average PDF, about 30 minutes.

Give it a try!

gpderetta · on Feb 8, 2024

> If you have an infinite amount of facts, then you can memorise all of them

pigeon-hole?

gdiamos · on Feb 8, 2024

Not literally infinite, but Llama2 scale models can handle about 10 trillion tokens.