It's also a little insane to me that what Adept has supposedly been building for years with 300+ million in funding can now be built in a day with the OpenAI APIs?
I think Adept pivoted along the way, but the original concept was very similar to this.
But it's too expensive to be practical with the OpenAI API. Also, the demo is cool until you see real-world webpages; then you realize it works on fewer than 50% of them.
GPT-4V may be surprisingly robust here. Set-of-Mark prompting (which is accomplished here with Vim) improves grounding by a silly high amount.
https://som-gpt4v.github.io/
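For anyone wondering what the mark trick looks like in practice, here is a rough sketch of the idea, not the code from the demo or the paper. Instead of asking the model for pixel coordinates, you label each clickable element with a short number, ask the model to reply with a label, and map the label back to the element yourself. The `Element` schema, the `CLICK <mark>` reply format, and all function names are made up for illustration, and the actual model call and screenshot overlay are left out.

```python
import re
from dataclasses import dataclass


@dataclass
class Element:
    """A clickable page element with a pixel bounding box (hypothetical schema)."""
    description: str
    x: int
    y: int
    width: int
    height: int


def assign_marks(elements):
    """Give each element a short numeric label, as a visual overlay would."""
    return {str(i): el for i, el in enumerate(elements, start=1)}


def build_prompt(task, marks):
    """Ask the model to answer with a mark label instead of raw coordinates."""
    labels = ", ".join(marks)
    return (
        f"Task: {task}\n"
        f"The screenshot has numbered marks on clickable elements ({labels}).\n"
        "Reply with 'CLICK <mark>'."
    )


def resolve_click(reply, marks):
    """Map a reply such as 'CLICK 2' to the center of the marked element."""
    match = re.search(r"CLICK\s+(\d+)", reply)
    if match is None or match.group(1) not in marks:
        return None  # unparseable reply or unknown mark
    el = marks[match.group(1)]
    return (el.x + el.width // 2, el.y + el.height // 2)


# Example with two made-up elements and a made-up model reply.
elements = [
    Element("Search box", 100, 20, 200, 40),
    Element("Login button", 400, 20, 80, 40),
]
marks = assign_marks(elements)
prompt = build_prompt("Log in to the site", marks)
click_point = resolve_click("CLICK 2", marks)
```

The point is that the model only has to pick from a small set of labels it can see in the image, which is a much easier grounding problem than guessing coordinates.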
[1] https://www.adept.ai/