> The eye tracking driven input method which was seen as holy grail turns out to be annoying after a while because people don't naturally always look at what they want to click on.
This has been known for at least 30 years in the eye-tracking business, and it even has a name: the Midas Touch problem.
"At first it is helpful to be able simply to look at what you want and have it occur without further action; soon, though, it becomes like the Midas Touch. Everywhere you look, another command is activated; you cannot look anywhere without issuing a command."
Yeah, Apple Vision doesn't have this problem, because eye tracking is used just for pointing, not for clicking on items.
Vision Pro pinch-touch is essentially the same as this paper's
> some form of ‘‘clutch’’ to engage and disengage the monitoring
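For illustration, that gaze-points / pinch-commits split might look something like the sketch below. All types and names are invented for the example, not a real visionOS API; the point is just that gaze never issues a command on its own, which is exactly what the paper's "clutch" is for.

```swift
import Foundation

// Hypothetical sketch of the "clutch" idea: gaze continuously updates a
// pointing target, but nothing is activated until an explicit pinch event
// engages the clutch. Not any real visionOS API.
struct GazeSample {
    let targetID: String?       // element currently under the gaze, if any
    let timestamp: TimeInterval
}

final class ClutchSelector {
    private var currentTarget: String?

    // Gaze only moves the (invisible) pointer; it never triggers an action by itself.
    func updateGaze(_ sample: GazeSample) {
        currentTarget = sample.targetID
    }

    // The pinch is the clutch: selection is committed only on this explicit event.
    func pinchDetected() -> String? {
        currentTarget
    }
}
```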
The paper does talk about other challenges with look-to-select, even if it is colored by the thinking of its day:
> Unlike a mouse, it is relatively difficult to control eye position consciously and precisely at all times.
You have to remember that the historical setting for much of this research was helping paralyzed people communicate, and pushing a button or using the modern pinch-touch was not always an option.
It doesn't seem like a 'problem'. For new tech, the emphasis should always be on pro users first (even if they don't initially adopt it because of the long lead times in those industries). So if you're designing an oil rig with one of these, a pro user would probably want to interact with one element while already looking for the next, since that's more time-efficient. Seems like a better term might be the 'Midas Touch Axiom'.
Seems like you could just implement a simple delay to solve this.
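The classic version of that delay is a dwell-time filter: only fire once the gaze has rested on the same element for some threshold. A minimal sketch, with made-up types and no real gaze API:

```swift
import Foundation

// Minimal dwell-time filter: a target "fires" only after the gaze has rested
// on it continuously for `dwellThreshold` seconds. Types and names are made up.
final class DwellSelector {
    private let dwellThreshold: TimeInterval
    private var candidate: String?
    private var candidateSince: TimeInterval = 0
    private var hasFired = false

    init(dwellThreshold: TimeInterval = 0.8) {
        self.dwellThreshold = dwellThreshold
    }

    // Feed one gaze sample per frame; returns a target at most once per dwell.
    func process(target: String?, at time: TimeInterval) -> String? {
        if target != candidate {
            // Gaze moved to a different element (or to nothing): restart the timer.
            candidate = target
            candidateSince = time
            hasFired = false
            return nil
        }
        guard let current = candidate, !hasFired,
              time - candidateSince >= dwellThreshold else {
            return nil
        }
        hasFired = true
        return current
    }
}
```

The known tradeoff is that a short dwell still misfires while you're just reading, and a long dwell makes every selection feel sluggish.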
Let's say I want to click on the "reply" button below this text box. If I'm perfectly honest, I DO look at the button for a moment, then I move the mouse pointer over to it. But then right before clicking, my eyes switch back to the content I've created to observe that my click is having the desired effect on it.
I'm not actually looking at the button at the moment I click on it, but I DID look at it just a few milliseconds prior to the click. Why can't the UI just keep track of what I looked at a few milliseconds ago, to figure out that I actually wanted to click on the button, and not in the center of some text box?
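That "what was I looking at a moment ago" idea is easy enough to sketch; the hard part is the ambiguity described next. A hypothetical version (invented names, not any real UI framework):

```swift
import Foundation

// Hypothetical look-back resolver: keep a short gaze history and, when the
// click/pinch arrives, attribute it to the most recently gazed interactive
// element within a small time window. Names and types are invented.
struct TimedGaze {
    let targetID: String?
    let timestamp: TimeInterval
}

final class LookBackResolver {
    private var history: [TimedGaze] = []
    private let window: TimeInterval

    init(window: TimeInterval = 0.3) {
        self.window = window
    }

    // Record one gaze sample per frame and drop anything older than the window.
    func record(_ sample: TimedGaze) {
        history.append(sample)
        history.removeAll { sample.timestamp - $0.timestamp > window }
    }

    // On click, walk backwards through recent gaze and pick the first interactive target.
    func resolveClick(at clickTime: TimeInterval,
                      isInteractive: (String) -> Bool) -> String? {
        for sample in history.reversed() where clickTime - sample.timestamp <= window {
            if let id = sample.targetID, isInteractive(id) {
                return id
            }
        }
        return nil
    }
}
```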
One issue: maybe I thought for a moment about replying but then changed my mind and decided to edit the content some more. But the UI has decided that I meant to click the "reply" button, and so now it's been submitted prematurely. Yeah, I can see the problem now. The position of the mouse cursor is meaningful when clicking, and visionOS doesn't have a cursor. Cursors are important.
But decoupling hand gestures from eye tracking should not be that hard: the external cameras could just follow the hand and put a pointer on the screen.
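A rough sketch of that, assuming the system hands you a normalized hand position (everything here is hypothetical, not a real hand-tracking API):

```swift
import Foundation

// Hypothetical hand-driven cursor: a tracked hand position (normalized 0...1)
// is mapped and lightly smoothed into a 2D pointer, so eye tracking is never
// needed for pointing. No real hand-tracking API is used here.
struct HandPosition { let x: Double; let y: Double }   // normalized 0...1
struct CursorPoint  { let x: Double; let y: Double }   // screen pixels

final class HandCursor {
    private var smoothedX: Double?
    private var smoothedY: Double?
    private let smoothing = 0.3   // exponential smoothing factor to reduce jitter

    func update(_ hand: HandPosition, screenWidth: Double, screenHeight: Double) -> CursorPoint {
        // Clamp to the visible area and map to pixel coordinates.
        let targetX = min(max(hand.x, 0), 1) * screenWidth
        let targetY = min(max(hand.y, 0), 1) * screenHeight
        // Exponentially smooth toward the new target so the pointer doesn't jitter.
        let newX = (smoothedX ?? targetX) + smoothing * (targetX - (smoothedX ?? targetX))
        let newY = (smoothedY ?? targetY) + smoothing * (targetY - (smoothedY ?? targetY))
        smoothedX = newX
        smoothedY = newY
        return CursorPoint(x: newX, y: newY)
    }
}
```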