It can be many things, but my guess is memory bandwidth. There's only so much MB/s the RAM can handle. Also, above 48 cores, those are hyper-threads, and for CPU-bound tasks hyper-threading can be slower because sibling threads share one physical core's execution units.
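A rough way to picture the bandwidth guess: if every thread needs a fixed amount of memory traffic, throughput scales linearly until the memory bus saturates, then goes flat. This is only a toy model, and the numbers (2 GB/s per thread, 40 GB/s total) are made up for illustration, not measured on the machine in question.

```python
def modeled_throughput(threads, per_thread_gbs=2.0, bus_limit_gbs=40.0):
    """Toy model: aggregate bandwidth grows linearly until the bus saturates.

    Both default values are hypothetical, chosen only to show the shape.
    """
    return min(threads * per_thread_gbs, bus_limit_gbs)


# Past the saturation point, extra threads add nothing in this model.
for n in (8, 16, 24, 48):
    print(n, modeled_throughput(n))
```

Note that a cap like this only explains a plateau. An actual drop needs something extra on top, such as contention between threads.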
Higher parallelism is usually about timeliness for interactive workloads, not throughput. Unless it’s just the classic fallacy that parallelism = speed.
For throughput tasks you often go with less parallelism to shave a few percent off the serial and coordination overhead that Amdahl’s law penalizes, and instead invest in keeping the pipeline saturated, so that the variance in concurrent tasks is lower. Work stealing is one of the more notable tricks.
Example: https://ieeexplore.ieee.org/document/7804711
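For anyone who wants the Amdahl's law part spelled out, here is a minimal sketch. The 95% parallel fraction is an assumed value picked for illustration, not something taken from the benchmark or the linked paper.

```python
def amdahl_speedup(parallel_fraction, workers):
    """Amdahl's law: speedup = 1 / ((1 - p) + p / n)."""
    serial_fraction = 1.0 - parallel_fraction
    return 1.0 / (serial_fraction + parallel_fraction / workers)


# p = 0.95 is a hypothetical parallel fraction.
for n in (12, 24, 48, 96):
    print(n, round(amdahl_speedup(0.95, n), 2))
```

With a 5% serial part the speedup can never exceed 1 / 0.05 = 20x no matter how many workers you add, and going from 48 to 96 workers buys far less than 2x. That's why shrinking the serial part by a few percent can matter more than adding threads.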
Edit: looks like there are only 12 cores per CPU, so that's 24 physical cores and 48 HT threads. So the drop must be cache thrashing?