I give the same prompt to all 3 and judge each one's first response. Whichever assistant best “understands” what I want to accomplish is the one I continue with for the follow-up prompts.
There is a possible bias here: my own lack of prompting technique may be the reason an assistant doesn't give its best response. But I'm grading on a fair curve, since they all get the same input, and I see understanding what the user wants from an imperfect prompt as the core value proposition of an assistant.