Looking at that ruling, it seems the case you linked to hinged on a fact that doesn't apply to the Stack:
>Field had actual knowledge of the Googlebot. He also was aware of the ways to prevent Google from either listing his site at all or listing it but not providing a link to the cached version. Instead of opting out, however, he chose to allow Google to both index and provide a link to the cached version.
For the AI dataset, (A) did the person know their work was being collected by this group and for this purpose, and (B) did they know of a way to prevent that collection?