Note that quadrupling the architectural state doesn't mean quadrupling the actual physical register state.
In fact, it looks like Haswell and Skylake-X had the same number of physical registers, 168. So that's a straightforward doubling from 256x168 to 512x168.
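But further into the thread it looks like the first gen E cores had about 200 128-bit register lines. 32 registers of 512 bits come to 16,384 bits, which is 128 of those lines' worth before any renaming, so trying to fit 512x32 would have been very tight.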
To put some of that a different way: The vector design headed for E cores was 128 bits stretching to 256 bits. If it had been 256 bits all the way through, it's likely they would have added AVX-512 support, even if they couldn't increase the size of the register file at all.
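Rough arithmetic for the above, as a sketch only. The "about 200 lines" figure is the approximate one from the thread. The idea that a 512-bit register would be split across several narrower lines, and that the line count would stay at 200 in the 256-bit case, are my assumptions for illustration, not a description of the real design. Mask registers are ignored.

```python
def lines_for_architectural_state(num_regs: int, reg_bits: int, line_bits: int) -> int:
    """Physical lines needed just to hold every architectural register once.

    Assumes (hypothetically) that a wide register is split across
    ceil(reg_bits / line_bits) narrower physical lines.
    """
    lines_per_reg = -(-reg_bits // line_bits)  # ceiling division
    return num_regs * lines_per_reg


AVX512_REGS = 32    # zmm0..zmm31
AVX512_BITS = 512
E_CORE_LINES = 200  # "about 200" per the thread; approximate

# 128-bit lines, as in the first gen E cores
narrow = lines_for_architectural_state(AVX512_REGS, AVX512_BITS, 128)
# hypothetical 256-bit lines, same line count (assumption)
wide = lines_for_architectural_state(AVX512_REGS, AVX512_BITS, 256)

for label, used in (("128-bit lines", narrow), ("256-bit lines", wide)):
    spare = E_CORE_LINES - used
    print(f"{label}: {used} of ~{E_CORE_LINES} hold architectural state, "
          f"~{spare} left for renaming")
```

With 128-bit lines that is 128 of roughly 200 lines tied up before any renaming happens. With 256-bit lines it drops to 64, which is why the wider design looks workable without growing the line count.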