I think good in-home vision models are probably still a little way off, but it already seems possible to start planning for their existence. It would also be possible to fine-tune a puny model to trigger a function that passes the image to a larger hosted model when explicitly requested to. There are a variety of ways things could be tiered so that processing that can practically be done at home stays at home, while still making it possible to defer the query (automatically, or at the user's request) to a larger model operated by someone else.
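To make the tiering idea a bit more concrete, here is a rough sketch of what such a router might look like. Everything in it is an assumption for illustration: `route_query`, the `Answer` type, the `(text, confidence)` return shape, and the `min_confidence` threshold are all hypothetical names, and the two models are stubs rather than real ones. The `user_requested_hosted` flag stands in for the small model emitting a function call when the user explicitly asks for the bigger model.

```python
from dataclasses import dataclass
from typing import Callable

# Hypothetical model interface: (image bytes, prompt) -> (answer text, confidence 0..1).
VisionModel = Callable[[bytes, str], tuple[str, float]]


@dataclass
class Answer:
    text: str
    confidence: float
    tier: str  # "local" or "hosted"


def route_query(
    image: bytes,
    prompt: str,
    local_model: VisionModel,
    hosted_model: VisionModel,
    *,
    user_requested_hosted: bool = False,
    min_confidence: float = 0.6,      # assumed threshold, would need tuning
    allow_auto_escalation: bool = True,
) -> Answer:
    """Keep the query at home when practical, otherwise defer to the hosted tier."""
    if not user_requested_hosted:
        text, confidence = local_model(image, prompt)
        # Stay local if the answer looks good enough, or if the user has
        # disabled automatic escalation (the image then never leaves the house).
        if confidence >= min_confidence or not allow_auto_escalation:
            return Answer(text, confidence, "local")
    # Explicit request, or the local model was not confident enough.
    text, confidence = hosted_model(image, prompt)
    return Answer(text, confidence, "hosted")


# Stub models so the sketch runs without any real inference backend.
def stub_local(image: bytes, prompt: str) -> tuple[str, float]:
    # Pretend the small model is only confident on small images.
    return ("local guess", 0.9 if len(image) < 10 else 0.2)


def stub_hosted(image: bytes, prompt: str) -> tuple[str, float]:
    return ("hosted answer", 0.95)


if __name__ == "__main__":
    print(route_query(b"tiny", "what is this?", stub_local, stub_hosted))
    print(route_query(b"a much larger image", "what is this?", stub_local, stub_hosted))
```

The `allow_auto_escalation` switch is there because automatic deferral and deferral on request are different privacy trade-offs, so it seems worth letting the user pick between them.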