I find it somewhat strange that video games are OK (it's on a screen but interactive) and I guess live theater is OK (it's passive but not on a screen), but the combination of passively watching something on a screen is not OK.
Video games allow the user to interact with the media and cause reactions to input. They allow us to explore causality at a very personal level, even though the systems and entities involved are entirely virtual.
Live theater allows the user to control what to focus on during each scene. If there is a conversation, we can choose who the "camera" is pointing at. We are also looking at real human beings interacting with each other. The experience is mildly interactive.
Passive screen-based media gives us neither set of choices.
How is looking at an image of live human beings "worse" than looking at live human beings (which is ultimately an image too)?
If the child is given a remote control, does that count as interactivity since he can choose what to look at while exploring causality?
By the way, I'm not arguing that TV is good for kids. I'm just trying to analyze the argument in favor of preventing infants from watching passive screen media.
You can't compare seeing human beings interacting in a "true 3D", real-life setting to the same thing happening on a screen. You don't get depth perception, and there are artifacts of recording in video and audio (even in modern HD shows) that make a real-life scene distinctly different. I don't know scientifically how that's important; I just know there's a difference. You know there's a difference if you ever see someone on TV and then see them in real life for the first time.
A remote doesn't explore causality inside the media; it only explores "when I press a certain button it will switch to a different show which is not of my choosing and which I can't predict". A child might correlate pressing the same numbers with the same show at the same time of the day. The "camera angle" interactivity in live theater doesn't get into causality at all, just different ways you can look at or listen to a scene. In videogames, by contrast, causality is rarely as random as it is with a remote.
Think of it at a very basic, I-don't-know-what-TV-or-videogames-are level: I press the channel up button on the remote, and the image suddenly changes to something entirely different. If I do this a few hours later, the former image and the latter image are entirely different from before. Let's even assume I'm watching Netflix and I've figured out how to navigate menus: the menu is interactive, but the media I'm watching doesn't give me any control over what's happening inside the media. In a videogame, if I press a button a character will move, a gun will shoot, a menu will open. If I do the same thing a few hours from now, the same thing will happen. A different thing might happen in a predictable context: if my dude is in front of a wall he might climb the wall rather than jump when I press A. It's generally bad design to allow otherwise. The link between cause and effect is much clearer, and my role as an agent of cause is much clearer as well.