As long as the total number of pixels is the same or less, I don't see why that has to be true, at least bandwidth-wise. Compute-wise, the Vision Pro might have to do slightly more work to separate the buffers and composite them into the AR view in different places, but the bandwidth should be directly proportional to the number and size of the windows. If I can fit all the windows on a 4K screen, then I don't see why the software can't split that up and lay the windows out separately in my view instead of in a single rectangle.
Some of the windows will be obscured by others. If you stream 100 windows to visionOS, it's possible to lay them out so that none of them cover each other, in which case every one of them has to be rendered in full. On a flat screen there is a hard limit on how many pixels you need to paint.
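To put rough numbers on the difference (a back-of-the-envelope sketch, assuming each streamed window is 720p; the exact window sizes are made up for illustration):

```python
# Rough pixel-budget comparison (illustrative numbers, not measured).
SCREEN_4K = 3840 * 2160      # flat screen: hard cap on pixels painted per frame
WINDOW = 1280 * 720          # assume each streamed window is 720p
NUM_WINDOWS = 100

# On a flat 4K screen, overlapping windows can never exceed the screen's pixel count.
flat_pixels = min(NUM_WINDOWS * WINDOW, SCREEN_4K)

# Laid out in AR with no overlap, every window must be rendered in full.
ar_pixels = NUM_WINDOWS * WINDOW

print(flat_pixels)           # 8294400  (capped at the 4K screen size)
print(ar_pixels)             # 92160000 (roughly 11x more)
```

So both sides are right in a sense: bandwidth does scale with visible pixels, but once windows can no longer occlude each other, "visible" can grow far past what any single flat screen would have required.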