Thanks! We definitely experimented with vision only (that's the dream), but too much context is missing:
1. What options are behind a select? You don't know until you click it, which means you need another iteration. This sucks.
2. How do you consistently map things in the screenshots to actual actions (i.e., upload a file to a file input, click a button, type a date into a date input)? Having the extra HTML tag information dramatically improves action selection (click vs. upload vs. type).
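To make both points concrete, here's a rough sketch of what the HTML buys you. This is not our actual pipeline; `action_for` and `ElementCollector` are hypothetical names, and the tag/type-to-action table is just an assumed example. It uses only Python's stdlib `html.parser` to pick an action per element and to read a select's options without clicking anything.

```python
from html.parser import HTMLParser


def action_for(tag, attrs):
    """Hypothetical mapping from an element's tag (and input type) to an agent action."""
    if tag == "input":
        kind = (attrs.get("type") or "text").lower()
        if kind == "file":
            return "upload"
        if kind in ("button", "submit", "reset", "checkbox", "radio", "image"):
            return "click"
        return "type"  # text, date, email, number, ... all take typed input
    if tag == "select":
        return "select"
    if tag == "textarea":
        return "type"
    if tag in ("button", "a"):
        return "click"
    return None  # not treated as interactive in this sketch


class ElementCollector(HTMLParser):
    """Collects interactive elements and the text of each select's options."""

    def __init__(self):
        super().__init__()
        self.elements = []
        self._current_select = None
        self._in_option = False

    def handle_starttag(self, tag, attrs):
        a = dict(attrs)  # html.parser already lowercases tag and attribute names
        action = action_for(tag, a)
        if action:
            el = {"tag": tag, "id": a.get("id"), "action": action}
            if tag == "input":
                el["type"] = a.get("type") or "text"
            if action == "select":
                el["options"] = []
                self._current_select = el
            self.elements.append(el)
        elif tag == "option" and self._current_select is not None:
            # Options are visible in the markup, so no extra click is needed.
            self._in_option = True
            self._current_select["options"].append("")

    def handle_data(self, data):
        if self._in_option and self._current_select is not None:
            self._current_select["options"][-1] += data.strip()

    def handle_endtag(self, tag):
        if tag == "option":
            self._in_option = False
        elif tag == "select":
            self._in_option = False
            self._current_select = None


if __name__ == "__main__":
    page = """
    <form>
      <input id="resume" type="file">
      <input id="start" type="date">
      <select id="country"><option>Canada</option><option>Mexico</option></select>
      <button id="submit">Apply</button>
    </form>
    """
    collector = ElementCollector()
    collector.feed(page)
    for el in collector.elements:
        print(el)
```

A screenshot of that form shows a button, a box, and a closed dropdown; the markup tells you the box is a date input, the other control wants a file, and the dropdown holds "Canada" and "Mexico".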
Interesting approach to the problem, though. Congrats!