
We are probably getting closer to that with the newer multimodal LLMs, but you'd almost need to take screenshots at intervals and feed them directly to the LLM, giving it a sort of chronological context to help it understand what the user is trying to do and gauge the user's intentions.

As you say though, I don't know how many people would be comfortable having screenshots of their computer sent arbitrarily to a non-local LLM.
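For what it's worth, the "screenshots at intervals as chronological context" idea could look roughly like the sketch below. It is only an illustration: `ScreenshotContext`, `Frame`, and the `capture` callable are names made up here, no real screenshot library or LLM API is called, and the buffer stays in local memory.

```python
import time
from collections import deque
from dataclasses import dataclass
from typing import Callable, Deque, List, Optional


@dataclass
class Frame:
    timestamp: float  # seconds since the epoch when the frame was captured
    image: bytes      # raw screenshot bytes (format is up to the capture function)


class ScreenshotContext:
    """Rolling, chronologically ordered buffer of recent screenshots.

    `capture` is a hypothetical callable that returns one screenshot as bytes;
    plugging in a real capture backend is left to the caller.
    """

    def __init__(self, capture: Callable[[], bytes], max_frames: int = 5):
        self._capture = capture
        # deque(maxlen=...) silently drops the oldest frame once the buffer is full.
        self._frames: Deque[Frame] = deque(maxlen=max_frames)

    def tick(self, now: Optional[float] = None) -> None:
        """Capture one frame; call this on whatever interval you choose."""
        ts = time.time() if now is None else now
        self._frames.append(Frame(ts, self._capture()))

    def frames(self) -> List[Frame]:
        """Frames currently held, oldest first."""
        return list(self._frames)

    def describe(self) -> List[str]:
        """One caption per frame, oldest first, relative to the newest frame."""
        if not self._frames:
            return []
        newest = self._frames[-1].timestamp
        return [
            f"screenshot taken {newest - f.timestamp:.0f}s ago"
            for f in self._frames
        ]
```

You would call `tick()` from a timer and hand `frames()` plus the `describe()` captions to whatever multimodal model you use. Whether that model runs locally or remotely is exactly the privacy question raised above.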






> As you say though, I don't know how many people would be comfortable having screenshots of their computer sent arbitrarily to a non-local LLM.

Of the technical, hang-out-on-HN crowd? Ya, probably not many.

Of the other 99.99% of computer users? The majority of them wouldn't even think about it, let alone care. To quote a phrase, "the user is going to pick dancing pigs over security every time".

Even without the nonchalant attitude towards security, the majority of the population has been so conditioned to the idea that everything they do on a computer is already being sent to 1) Apple, 2) Google, 3) Microsoft, or 4) their employer, that they're burnt out on caring.

All that is to say that if you can make a widely-available real-time LLM assistant that appeals to non-technical users, please invite me to your private-island-celebrity-filled-yacht-parties.


I think we're well into the paradigm of "hidden employee activity monitoring software" already taking periodic screenshots and sending them to an LLM somewhere, which then generates aggregate performance metrics and dashboards for managers. I've heard of multiple companies working on this for $bigcorp environments, customer service/call center workstation PCs, etc.

Models with native video understanding would do the trick - Advanced Voice Mode on the ChatGPT iOS/Android app lets you use your camera, works pretty well; there's also https://aistudio.google.com/live (AFAIK there are no open-source models with similar capabilities)

> I don't know how many people would be comfortable having screenshots of their computer sent arbitrarily to a non-local LLM

shudders.


So, the Recall feature being slowly rolled out in Win11?


