
God this is depressing. Not the product itself, but the need for it. That software has failed to be programmable to such a degree that a promising approach is rendering the GUI and analysing the resultant image with an AI model. It's insane that we have to treat computers as fax machines, capable only of sending hand-written forms over a network. The gap between how people use computers and the utility they could provide is massive.


Actually this kind of stuff is super exciting -- we don't need to depend on companies exposing APIs for their website -- we can just use something like Skyvern instead!


Two ways of looking at it. I guess what the OP is saying is that if there were an agreed-upon standard for semantically understanding these pages, without having to resort to these sophisticated methods, it would be much easier.


I have been interested in doing something similar for a while. I also think this has a lot of potential as the core of a virtual assistant.


You could still use Skyvern if they exposed an API.


On the contrary! Isn't it neat that we now have a unified API that both humans and computers can consume?


No, because we already have a machine API. If you want to write an application, you need to write something a computer can understand. So a computer-usable API is always created. It takes additional effort to hide that functionality behind an interface. The process we have now is: machine → GUI → image processing → generative AI. The interface we could have is: machine → machine. It would take no extra effort to do this. It would just need some slight changes in organisation. In fact it is easier at every level. If you separate logic from interface, you end up with an architecture that is a set of functions (a library) into which you can interface programmatically, or with a GUI, or by any other means. Separating code like this (MVC) is good practice and allows for a range of different interfaces to be created to the same functionality. It is also easier from an engineering perspective and produces a better product. Think of git. There are hundreds of different interfaces created to the functionality git provides. All software should be structured like this (though perhaps by means of a library rather than a shell interface).

I should add that this is a particularly grim prospect from a software engineering perspective. It makes me imagine a future where no one bothers exposing a stable API to anything, so the only way to interact with other people's code is using an AI middle-man.
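
To make the "separate logic from interface" point concrete, here is a minimal sketch in Python. Everything in it is hypothetical and only for illustration (the `Invoice` type, `create_invoice`, `format_total`, `as_json`, and the little CLI are not from any real product). The core is a couple of plain functions; a machine-readable interface and a human-facing command line are both thin layers over the same calls.

```python
import argparse
import json
from dataclasses import asdict, dataclass


@dataclass
class Invoice:
    # Hypothetical domain object, used only for this illustration.
    customer: str
    amount_cents: int


# --- Core logic: plain functions, no I/O and no UI code ---
def create_invoice(customer: str, amount_cents: int) -> Invoice:
    """Validate the input and build an invoice."""
    if amount_cents <= 0:
        raise ValueError("amount must be positive")
    return Invoice(customer=customer, amount_cents=amount_cents)


def format_total(invoice: Invoice) -> str:
    """Human-readable summary of an invoice."""
    return f"{invoice.customer}: ${invoice.amount_cents / 100:.2f}"


# --- Interface 1: machine-readable output for scripts and other programs ---
def as_json(invoice: Invoice) -> str:
    return json.dumps(asdict(invoice), sort_keys=True)


# --- Interface 2: a command line for humans, built on the same core ---
def main(argv=None) -> str:
    parser = argparse.ArgumentParser(description="Create an invoice.")
    parser.add_argument("customer")
    parser.add_argument("amount_cents", type=int)
    parser.add_argument("--json", action="store_true",
                        help="emit JSON instead of text")
    args = parser.parse_args(argv)

    invoice = create_invoice(args.customer, args.amount_cents)
    output = as_json(invoice) if args.json else format_total(invoice)
    print(output)
    return output
```

Another program can call `create_invoice(...)` directly, or ask the CLI for `--json`, without rendering or scraping anything. A GUI would be a third thin layer calling the same two core functions.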


Good luck debugging any errors


The world is governed by probabilities. What more could go wrong if algorithms did too? /s



