But you can "see and watch" things in a text-based tutorial. Short video clips and images are a key part of it. Like I've said elsewhere - it's the "person talking" bit and the linear format that I struggle with. I can skim text+images, whereas video+speech forces me to consume it in large chunks at a predetermined speed (1.5x notwithstanding; that's only a minor improvement).
That's why we have different mediums for different people.
For example, I can never learn visual things by listening. I learn much better by seeing or watching things.