I am working on open-source phone automation: https://github.com/BandarLabs/clickclickclick

I haven't quite been able to get the Llama vision models working, but I suppose that with future releases they should work as well as Gemini at finding bounding boxes of UI elements.