vpmullq on its own is not that useful: in bignum code you also need the upper half of each product, and there is no corresponding vpmulhq instruction to provide it.
On the other hand, vpmadd52luq and vpmadd52huq do give you access to the lower and upper parts of a 52x52->104 bit product, and those instructions perform well on Intel chips, running 3x faster than vpmullq.
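To make the lower/upper split concrete, here is a rough scalar model of what a single 64-bit lane of vpmadd52luq and vpmadd52huq computes, as I understand the instruction semantics. The function names (madd52lo, madd52hi, mul52_wide) are made up for illustration, and this is plain Python rather than real SIMD code, so it says nothing about performance.

```python
MASK52 = (1 << 52) - 1
MASK64 = (1 << 64) - 1


def madd52lo(acc, b, c):
    """Model of one lane of vpmadd52luq (hypothetical helper name).

    Multiplies the low 52 bits of b and c, then adds the low 52 bits
    of the 104-bit product to the 64-bit accumulator (wrapping mod 2**64).
    """
    product = (b & MASK52) * (c & MASK52)
    return (acc + (product & MASK52)) & MASK64


def madd52hi(acc, b, c):
    """Model of one lane of vpmadd52huq (hypothetical helper name).

    Same multiply, but adds the high 52 bits of the 104-bit product.
    """
    product = (b & MASK52) * (c & MASK52)
    return (acc + (product >> 52)) & MASK64


def mul52_wide(b, c):
    """Return (lo, hi) halves of a 52x52->104 bit product using a zero accumulator."""
    return madd52lo(0, b, c), madd52hi(0, b, c)


# The two halves recombine into the full product for 52-bit inputs.
b = (1 << 52) - 1
c = (1 << 52) - 3
lo, hi = mul52_wide(b, c)
assert (hi << 52) | lo == b * c
```

The point of the sketch is that the pair of instructions hands back both halves of the product, which is what bignum code needs, at the cost of working in 52-bit limbs instead of 64-bit ones.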