Yeah, no. We can move an arbitrary amount of data around the world at breakneck speed. Netflix does this for a living. Handing out the training material isn't impractical because of its size; it's impractical because of the massive, rampant copyright violations.
If a research group downloads material in order to train a model, is there some significant difference in copyright violation if they hand it to a second research group in order to fulfill the same purposes?
Yes, because of a key word in a lot of copyright laws... "distribution". Using that copyrighted material themselves to train the model still gives them plausible deniability. Handing the copyrighted material to another group starts to run afoul of other laws, and it also removes the plausible deniability the original group could otherwise claim about its training data.