I don't think so. You have to know exactly what resolution the image will be resized to in order to craft a dithering pattern whose downscaled result is the message you want. How would they know that?
Auto-resizing usually targets only a handful of common resolutions, and if the payload is inexpensive to generate (probably the case) you could generate a version for each of them and see which ones work.
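Brute-forcing it would look something like this sketch (the payload filename, candidate sizes, and filter list are all illustrative guesses, not any real pipeline's values):

    # Sketch: try every plausible size/filter pair an ingestion pipeline
    # might use. "payload.png" and CANDIDATE_SIZES are hypothetical.
    from PIL import Image

    CANDIDATE_SIZES = [(224, 224), (256, 256), (336, 336), (512, 512), (768, 768)]
    FILTERS = {
        "nearest": Image.Resampling.NEAREST,
        "bilinear": Image.Resampling.BILINEAR,
        "bicubic": Image.Resampling.BICUBIC,
        "lanczos": Image.Resampling.LANCZOS,
    }

    img = Image.open("payload.png").convert("RGB")
    for w, h in CANDIDATE_SIZES:
        for name, f in FILTERS.items():
            # Save each downscaled variant for inspection: the hidden text
            # only becomes legible under the size/filter pair it was tuned for.
            img.resize((w, h), resample=f).save(f"downscaled_{w}x{h}_{name}.png")

You'd then eyeball (or OCR) the outputs to see which combinations actually reveal the injected text.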
I don't think this is any different from an LLM reading text and trusting it. Your system prompt is supposed to take priority over whatever the model reads from the user or from tool output, and in any case you should already assume the model can use its tools in arbitrary, potentially malicious ways.
This is a big deal.
I hope the Nightshade people don't start doing this.