It's bad when they indiscriminately crawl for training, and it's not ideal (but understandable) to use the Internet to communicate with them, with the online accounts that entails and so on, rather than running them locally.
It's not bad when they use the Internet at generation time to verify the output.
I don't know for certain what you're referring to, but the "bulk downloads" of the Internet that AI companies perform for training are the problem I've seen cited, and that doesn't relate to LLMs checking their sources at query time.