if you could do this automatically, it would be a game changer, since you could run the top 5 models in parallel and pick the best answer every time
but it's not practical, because you are the bottleneck: you have to read all 5 solutions and compare them yourself
remember, though, that they have access to the RLHF reward model, so they can score all N outputs against it and pick and send the most "rewarded" answer automatically
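a minimal best-of-N sketch of that idea, assuming each "model" is just a function from prompt to answer and the reward model is a function that scores a (prompt, answer) pair. `best_of_n`, `toy_reward` and the lambda "models" are hypothetical stand-ins, not any lab's actual API; a real reward model would be a trained network, not a keyword counter like the toy one here.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, List, Tuple

# Hypothetical types: a "model" maps a prompt to an answer,
# a "reward model" maps (prompt, answer) to a scalar score.
Model = Callable[[str], str]
RewardFn = Callable[[str, str], float]


def best_of_n(prompt: str, models: List[Model], reward: RewardFn) -> Tuple[str, float]:
    """Query every model in parallel, score each answer, return the top-scoring one."""
    with ThreadPoolExecutor(max_workers=len(models)) as pool:
        # pool.map keeps results in the same order as `models`
        answers = list(pool.map(lambda m: m(prompt), models))
    scored = [(ans, reward(prompt, ans)) for ans in answers]
    # max() returns the first pair with the highest score (ties go to the earlier model)
    return max(scored, key=lambda pair: pair[1])


def toy_reward(prompt: str, answer: str) -> float:
    """Toy stand-in for a reward model: rewards mentioning the prompt, penalizes length."""
    return float(answer.count(prompt)) - 0.01 * len(answer)


models: List[Model] = [
    lambda p: "no idea",
    lambda p: f"{p} is ...",
    lambda p: f"{p}: {p} explained",
]

best, score = best_of_n("rlhf", models, toy_reward)
print(best, score)
```

the same loop works whether the N answers come from N different models or from N samples of one model; the human reading step is simply replaced by the `reward` call.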