> The eye tracking driven input method which was seen as holy grail turns out to be annoying after a while because people don't naturally always look at what they want to click on.
This has been known for at least 30 years in the eye-tracking business, and it even has a name: the Midas Touch problem.
"At first it is helpful to be able simply to look at what you want and have it occur without further action; soon, though, it becomes like the Midas Touch. Everywhere you look, another command is activated; you cannot look anywhere without issuing a command."
Yeah, Apple Vision doesn't have this problem, because eye tracking is used just for pointing, not for clicking on items.
Vision Pro pinch-touch is essentially the same as this paper's
> some form of ‘‘clutch’’ to engage and disengage the monitoring
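For illustration, that gaze-points / pinch-commits split might look something like the sketch below. All types and names are invented for the example, not a real visionOS API; the point is just that gaze never issues a command on its own, which is exactly what the paper's "clutch" is for.

```swift
import Foundation

// Hypothetical sketch of the "clutch" idea: gaze continuously updates a
// pointing target, but nothing is activated until an explicit pinch event
// engages the clutch. Not any real visionOS API.
struct GazeSample {
    let targetID: String?       // element currently under the gaze, if any
    let timestamp: TimeInterval
}

final class ClutchSelector {
    private var currentTarget: String?

    // Gaze only moves the (invisible) pointer; it never triggers an action by itself.
    func updateGaze(_ sample: GazeSample) {
        currentTarget = sample.targetID
    }

    // The pinch is the clutch: selection is committed only on this explicit event.
    func pinchDetected() -> String? {
        currentTarget
    }
}
```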
The paper does talk about other challenges with look-to-select, even if it is colored by the thinking of its day:
> Unlike a mouse, it is relatively difficult to control eye position consciously and precisely at all times.
You have to remember that the historical setting for much of this research was helping paralyzed people communicate, and pushing a button or using the modern pinch-touch was not always an option.
It doesn't seem like a 'problem'. For new tech, the emphasis should always be on pro users first (even if they don't initially adopt it because of the long lead times in those industries). So if you're designing an oil rig with one of these, a pro user would probably want to interact with one element while already looking for the next, since that's more time-efficient. Seems like a better term might be the 'Midas Touch Axiom'.
Seems like you could just implement a simple delay to solve this.
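The classic version of that delay is a dwell-time filter: only fire once the gaze has rested on the same element for some threshold. A minimal sketch, with made-up types and no real gaze API:

```swift
import Foundation

// Minimal dwell-time filter: a target "fires" only after the gaze has rested
// on it continuously for `dwellThreshold` seconds. Types and names are made up.
final class DwellSelector {
    private let dwellThreshold: TimeInterval
    private var candidate: String?
    private var candidateSince: TimeInterval = 0
    private var hasFired = false

    init(dwellThreshold: TimeInterval = 0.8) {
        self.dwellThreshold = dwellThreshold
    }

    // Feed one gaze sample per frame; returns a target at most once per dwell.
    func process(target: String?, at time: TimeInterval) -> String? {
        if target != candidate {
            // Gaze moved to a different element (or to nothing): restart the timer.
            candidate = target
            candidateSince = time
            hasFired = false
            return nil
        }
        guard let current = candidate, !hasFired,
              time - candidateSince >= dwellThreshold else {
            return nil
        }
        hasFired = true
        return current
    }
}
```

The known tradeoff is that a short dwell still misfires while you're just reading, and a long dwell makes every selection feel sluggish.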
Let's say I want to click on the "reply" button below this text box. If I'm perfectly honest, I DO look at the button for a moment, then I move the mouse pointer over to it. But then right before clicking, my eyes switch back to the content I've created to observe that my click is having the desired effect on it.
I'm not actually looking at the button at the moment I click on it, but I DID look at it just a few milliseconds prior to the click. Why can't the UI just keep track of what I looked at a few milliseconds ago, to figure out that I actually wanted to click on the button, and not in the center of some text box?
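That "what was I looking at a moment ago" idea is easy enough to sketch; the hard part is the ambiguity described next. A hypothetical version (invented names, not any real UI framework):

```swift
import Foundation

// Hypothetical look-back resolver: keep a short gaze history and, when the
// click/pinch arrives, attribute it to the most recently gazed interactive
// element within a small time window. Names and types are invented.
struct TimedGaze {
    let targetID: String?
    let timestamp: TimeInterval
}

final class LookBackResolver {
    private var history: [TimedGaze] = []
    private let window: TimeInterval

    init(window: TimeInterval = 0.3) {
        self.window = window
    }

    // Record one gaze sample per frame and drop anything older than the window.
    func record(_ sample: TimedGaze) {
        history.append(sample)
        history.removeAll { sample.timestamp - $0.timestamp > window }
    }

    // On click, walk backwards through recent gaze and pick the first interactive target.
    func resolveClick(at clickTime: TimeInterval,
                      isInteractive: (String) -> Bool) -> String? {
        for sample in history.reversed() where clickTime - sample.timestamp <= window {
            if let id = sample.targetID, isInteractive(id) {
                return id
            }
        }
        return nil
    }
}
```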
One issue: maybe I thought for a moment about replying but then changed my mind and decided to edit the content some more. But the UI has decided that I meant to click the "reply" button, and so now it's been submitted prematurely. Yeah, I can see the problem now. The position of the mouse cursor is meaningful when clicking, and visionOS doesn't have a cursor. Cursors are important.
But decoupling hand gestures from eye tracking should not be that hard: the external cameras could just follow the hand and put a pointer on the screen.
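A rough sketch of that, assuming the system hands you a normalized hand position (everything here is hypothetical, not a real hand-tracking API):

```swift
import Foundation

// Hypothetical hand-driven cursor: a tracked hand position (normalized 0...1)
// is mapped and lightly smoothed into a 2D pointer, so eye tracking is never
// needed for pointing. No real hand-tracking API is used here.
struct HandPosition { let x: Double; let y: Double }   // normalized 0...1
struct CursorPoint  { let x: Double; let y: Double }   // screen pixels

final class HandCursor {
    private var smoothedX: Double?
    private var smoothedY: Double?
    private let smoothing = 0.3   // exponential smoothing factor to reduce jitter

    func update(_ hand: HandPosition, screenWidth: Double, screenHeight: Double) -> CursorPoint {
        // Clamp to the visible area and map to pixel coordinates.
        let targetX = min(max(hand.x, 0), 1) * screenWidth
        let targetY = min(max(hand.y, 0), 1) * screenHeight
        // Exponentially smooth toward the new target so the pointer doesn't jitter.
        let newX = (smoothedX ?? targetX) + smoothing * (targetX - (smoothedX ?? targetX))
        let newY = (smoothedY ?? targetY) + smoothing * (targetY - (smoothedY ?? targetY))
        smoothedX = newX
        smoothedY = newY
        return CursorPoint(x: newX, y: newY)
    }
}
```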