Higher on MMLU (62.5 vs 56.7 for phi-2) and GSM8K (61.5 vs 61.1). https://www.microsoft.com/en-us/research/blog/phi-2-the-surp... The phi-2 numbers are for 5-shot MMLU and 8-shot GSM8K. The blog post doesn't give the shot counts for Qwen, but it's very likely they were tested the same way.