Keep in mind that the system prompt is not the only way this behavior gets built into the model.
There's plenty of fine-tuning and RLHF involved too; that's mostly how "model alignment" works, for example.
The system prompt exists merely as an extra precaution to reinforce the behaviors learned in RLHF, to explain some subtleties that would be otherwise hard to learn, and to fix little mistakes that remain after fine-tuning.
You can verify that this is true by using the model through the API, where you can set a custom system prompt. Even if your prompt is very short, most behaviors still remain pretty similar.
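For example, here's a rough sketch of that experiment using the Anthropic Python SDK (the model id and prompts are just placeholders, and it assumes ANTHROPIC_API_KEY is set in your environment):

    import anthropic

    # The client reads ANTHROPIC_API_KEY from the environment by default
    client = anthropic.Anthropic()

    # Pass your own deliberately minimal system prompt instead of the long
    # production one, then check how much of the usual behavior persists
    resp = client.messages.create(
        model="claude-3-opus-20240229",  # example model id
        max_tokens=512,
        system="You are a helpful assistant.",
        messages=[{"role": "user", "content": "Who are you, and what can you do?"}],
    )
    print(resp.content[0].text)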
There's an interesting X thread from the researchers at Anthropic on why their prompt is the way it is at [1][2].
Supposedly they use "RLAIF", but honestly, given that the first step is to "generate responses... using a helpful-only AI assistant", it kinda sounds like RLHF with more steps.
[1] https://twitter.com/AmandaAskell/status/1765207842993434880?...
[2] and for those without an X account, https://nitter.poast.org/AmandaAskell/status/176520784299343...