This is a rough take. Not every study has to be fully ecologically valid to show us something interesting about human behavior. The authors have presented you with a real, observed difference. What is your interpretation of that difference, and why do you think it exists? The participants were ostensibly not told that the study was explicitly investigating gender differences, so why is there one?
The male BYU students in the study are performing masculinity. They're claiming to look straight ahead, turning neither to the right nor to the left, because they think that's what Real Men do.
(This is a dumb hypothesis, but there's nothing in the study that allows us to refute it. I'm just saying the study could be improved.)
That's certainly a potential follow-up hypothesis. This study can surely be improved, but not every study has to answer every question or be perfectly ecologically valid before it is worthy of publication. For all its faults, the scope of these results is clear and worth discussing. The authors have a hypothesis - that women visually assess scenes differently from men in the context of walking through them - and have chosen an operationalization that, while not literally strapping an eye tracker to someone while they walk through campus, is a reasonable proxy for it within the time, budget, and technology constraints of the research group. That effort has produced a clear result. Future studies can now be developed that improve the design methodologically and ecologically and dig into alternatives and further nuances.
I know from UX studies that eye-tracking gives a completely different heat map than click-tracking on the same web page and task (because one measures how you scan a page and the other measures what you click on after scanning it). That experience makes me super wary of click-tracking being used to measure scanning behaviour. To me it's akin to saying "we measured the color of the solution with a thermometer": it's the wrong tool for the job. That's why I don't think it's a reasonable proxy.
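To make the "completely different heat map" point concrete, here's a rough sketch of the kind of comparison I mean. Everything in it is hypothetical - the coordinates are invented purely for illustration, not data from any real study - it just shows how you might bin gaze fixations and clicks from the same page into coarse grids and see how little they overlap:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical gaze fixations: spread widely while the user scans the page.
gaze_xy = rng.normal(loc=[500.0, 300.0], scale=[250.0, 180.0], size=(2000, 2))

# Hypothetical clicks: clustered on a couple of interactive elements.
click_xy = np.concatenate([
    rng.normal(loc=[850.0, 120.0], scale=15.0, size=(60, 2)),
    rng.normal(loc=[500.0, 640.0], scale=20.0, size=(40, 2)),
])

def heatmap(points, width=1920, height=1080, bins=(48, 27)):
    """Bin (x, y) points into a coarse grid and normalize to a distribution."""
    h, _, _ = np.histogram2d(points[:, 0], points[:, 1], bins=bins,
                             range=[[0, width], [0, height]])
    return h / h.sum()

gaze_map = heatmap(gaze_xy)
click_map = heatmap(click_xy)

# Low correlation = the two instruments describe very different behaviour
# on the same page and task.
r = np.corrcoef(gaze_map.ravel(), click_map.ravel())[0, 1]
print(f"gaze vs click heatmap correlation: {r:.2f}")
```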
I think "future studies can now be developed" is exactly what I'm hoping for.
Video would seem to be an improvement over still images.
If we're fleshing it out, I think sampling a wider array of college students (BYU may have its own issues) and adding some "neutral" spaces to the test might yield more information. (Although if being "wary" is habitual, the behaviour would probably still show up in neutral spaces?)
You know that thing where >95% of people claim to be above-average drivers? I'd like to rule out that kind of gap between what people report about themselves and what they actually do.
Once again, because I think my comment will be misconstrued if I don't attach a disclaimer: I believe the result, but I don't believe the study.