The distinction (from how I'm reading the GP's comment) is that Group 4 presupposes that data collection is necessary, and seeks to minimize unethical means of collecting that data, while Group 5 presupposes that ethics are paramount, and seeks to establish whether or not data collection can actually be ethical at all.
That is: Group 4 would be more willing to compromise ethics if absolutely necessary to get the data said group needs, while Group 5 would be more willing to compromise data collection if there's no ethical way to collect that data.
I agree. The way the categories are worded essentially excludes the possibility that maybe we shouldn’t be training these models at all. We have banned potentially insightful experiments in both medicine and psychology because they are unethical. I see no reason ML should get a pass.