The unfortunate answer is that the courts have yet to decide what exactly "training an AI" means in the context of copyright.
I personally believe that training on copyrighted data is _not_ a violation of any existing copyright, but that using the trained AI to reproduce and distribute something an expert in the field would regard as a copy is a violation.