RL is still training. Just like pretraining is still training. SFT is also training. That's how I look at it: model weights are being updated in all cases.
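To make the point concrete, here's a minimal PyTorch sketch (a toy one-layer model, dummy data, and a REINFORCE-style loss, all chosen for illustration, not anyone's actual setup). The pretraining/SFT step and the RL step use different losses, but the weight-update mechanics are identical:

```python
import torch
import torch.nn.functional as F

# Toy "language model": one linear layer over a small vocabulary.
vocab, dim = 16, 8
model = torch.nn.Linear(dim, vocab)
opt = torch.optim.SGD(model.parameters(), lr=0.1)

x = torch.randn(4, dim)                  # stand-in for hidden states
labels = torch.randint(0, vocab, (4,))   # stand-in next-token labels

# Pretraining / SFT step: cross-entropy against target tokens.
# (These two differ in the *data*, not in the update mechanics.)
opt.zero_grad()
F.cross_entropy(model(x), labels).backward()
opt.step()

# RL step (REINFORCE-style): reward-weighted log-prob of sampled tokens.
opt.zero_grad()
dist = torch.distributions.Categorical(logits=model(x))
sampled = dist.sample()
reward = torch.randn(4)                  # stand-in scalar rewards
(-(dist.log_prob(sampled) * reward).mean()).backward()
opt.step()
```

Either way it's the same loop: forward pass, compute a loss, backprop, update weights.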
Simplifying it down to "adjusting any weights is training, ipso facto this is meaningful" obscures more than it illuminates (as they noted, RL alone doesn't get you very far at all).