I think that LM Studio has an OpenAI "compliant" API, so if there is something similar that supports vision+text then it would be easy enough to make the base URL configurable and then point it to localhost.
Do you know of a simple setup that I can run locally with support for both images and text?
Do you know of a simple setup that I can run locally with support for both images and text?