Isn't the main use of BitTorrent for ML and research data? Academic Torrents is a wonderful resource, and it's what every developer should be using if they need to distribute their neural network weights, training data, etc.
How is there any legal problem with using BitTorrent? It's simply much better suited to this problem than HTTP. It doesn't make any sense to talk about "legal problems" for a torrent protocol.
What planet have you been living on? BitTorrent is widely used to distribute copyrighted material: movies, TV shows, games, programs, porn... I'd imagine a large majority of BitTorrent traffic worldwide is pirated material, with a small portion being datasets as you describe and other legally shared data like actual Linux distros.
I suppose there could be many things happening on the internet that we are unaware of; however, BitTorrent is a protocol very well suited to scientific data and ML.
It solves the link-rot issues that occur when people move institutions, it gives you huge storage for essentially free (ever tried to store 9 TB of training data or CERN data on Dropbox?), and it scales beautifully.
It's really the perfect solution for reproducible research in large-data studies.
Torrents are no longer the main source of pirated material, at least for shows and movies. There are a bunch of illegal services that provide a Netflix-like experience for pirated content.