If you malloc or roll your own allocator, every allocation has to be big enough to be put back on the free list, i.e. big enough to hold the free-list link. Then there's the overhead of combining adjacent free segments back together, which will involve an additional cache line at least 12.5% of the time. That figure is pointer size / cache line size (8 bytes out of 64 on a typical 64-bit machine), and anything larger than a pointer has a higher probability.
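To make the arithmetic concrete, here is a rough sketch in C. Everything in it is an assumption for illustration: `free_node`, `min_block_size`, `extra_line_cases` and the 64-byte `CACHE_LINE` are hypothetical names and values, not taken from any real allocator. It assumes blocks end on pointer-aligned addresses and that the neighbor's header starts right where our block ends.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical free-list link: a freed block must be able to hold this. */
typedef struct free_node {
    struct free_node *next;
} free_node;

/* Assumed cache line size; 64 bytes is common but not universal. */
enum { CACHE_LINE = 64 };

/* Smallest block that can be handed out and still be put back on the list. */
static size_t min_block_size(size_t requested) {
    return requested < sizeof(free_node) ? sizeof(free_node) : requested;
}

/*
 * For every pointer-aligned position at which our block could end within
 * one cache line, count the positions where reading the neighbor's header
 * (header_size bytes starting at that address) touches a cache line other
 * than the one holding our block's last byte.
 */
static unsigned extra_line_cases(size_t header_size) {
    unsigned count = 0;
    for (size_t end = CACHE_LINE; end < 2 * CACHE_LINE; end += sizeof(void *)) {
        size_t our_line = (end - 1) / CACHE_LINE;
        size_t their_last_line = (end + header_size - 1) / CACHE_LINE;
        if (their_last_line != our_line) {
            count++;
        }
    }
    return count;
}
```

With 8-byte pointers there are 8 possible end positions per line, so `extra_line_cases(sizeof(void *))` counting one of them is the 12.5% floor, and a header twice that size counts two of them.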
If you GC, then it’s more pointer chasing during mark, which will thrash the cache on at least one CPU, even if it’s not the one where most of the code is running.
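For a sense of what "pointer chasing during mark" looks like, here is a minimal sketch of a recursive mark pass in C. The `object` layout, `MAX_CHILDREN`, and `mark` are all hypothetical, and real collectors typically use an explicit work list rather than recursion. The point is only that every visited object is a dependent load from wherever that object happens to live in memory.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical fixed fan-out, chosen only to keep the sketch short. */
enum { MAX_CHILDREN = 2 };

typedef struct object {
    bool marked;
    struct object *children[MAX_CHILDREN];
} object;

/*
 * Mark everything reachable from obj. Returns the number of objects newly
 * marked; each one costs a load of the object itself before its children
 * can even be discovered.
 */
static size_t mark(object *obj) {
    if (obj == NULL || obj->marked) {
        return 0;
    }
    obj->marked = true;
    size_t visited = 1;
    for (size_t i = 0; i < MAX_CHILDREN; i++) {
        visited += mark(obj->children[i]);
    }
    return visited;
}
```

If those objects are scattered across the heap, each step of the traversal is likely to land on a different cache line from the previous one.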