Not too long ago I built a memory allocator for testing that only allocated and made deallocate a no-op (using the C++ Allocator interface). Amazingly, it gave a 25-50% performance boost in many of the cases I was testing. It's not for real code, since objects still get destroyed but their memory is never freed. It let me pull the deallocation cost out of the tests to get a better view of the performance of everything else.
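For anyone who wants to try this, here is a rough sketch of what such an allocator could look like. It is not the original code: `BumpArena`, `BumpAllocator`, and `bytes_used_after_pushes` are made-up names, and the fixed-capacity buffer is an assumption made to keep the example short.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <new>
#include <vector>

// Hypothetical fixed-capacity arena: hands out memory by bumping an offset.
class BumpArena {
public:
    explicit BumpArena(std::size_t capacity)
        : buffer_(new std::byte[capacity]), capacity_(capacity) {}

    void* allocate(std::size_t bytes, std::size_t alignment) {
        void* p = buffer_.get() + offset_;
        std::size_t space = capacity_ - offset_;
        // std::align moves p up to the next aligned address and shrinks
        // space by the padding; it returns nullptr if the request won't fit.
        if (std::align(alignment, bytes, p, space) == nullptr) {
            throw std::bad_alloc{};
        }
        offset_ = (capacity_ - space) + bytes;
        return p;
    }

    std::size_t used() const { return offset_; }

private:
    std::unique_ptr<std::byte[]> buffer_;
    std::size_t capacity_;
    std::size_t offset_ = 0;
};

// Minimal C++ Allocator over the arena. deallocate does nothing.
template <class T>
struct BumpAllocator {
    using value_type = T;

    BumpArena* arena;

    explicit BumpAllocator(BumpArena& a) noexcept : arena(&a) {}

    template <class U>
    BumpAllocator(const BumpAllocator<U>& other) noexcept : arena(other.arena) {}

    T* allocate(std::size_t n) {
        return static_cast<T*>(arena->allocate(n * sizeof(T), alignof(T)));
    }

    // Intentionally a no-op: destructors still run, memory is never reused.
    void deallocate(T*, std::size_t) noexcept {}
};

template <class T, class U>
bool operator==(const BumpAllocator<T>& a, const BumpAllocator<U>& b) noexcept {
    return a.arena == b.arena;
}

template <class T, class U>
bool operator!=(const BumpAllocator<T>& a, const BumpAllocator<U>& b) noexcept {
    return !(a == b);
}

// Hypothetical helper: push n ints, destroy the vector, and report how
// many arena bytes are still consumed afterwards.
std::size_t bytes_used_after_pushes(std::size_t capacity, int n) {
    BumpArena arena(capacity);
    {
        BumpAllocator<int> alloc(arena);
        std::vector<int, BumpAllocator<int>> v(alloc);
        for (int i = 0; i < n; ++i) v.push_back(i);
        assert(v.size() == static_cast<std::size_t>(n));
    }  // vector destroyed here; deallocate is called but frees nothing
    return arena.used();
}
```

Every buffer the vector abandons while growing stays consumed, so the arena's usage only goes up until the arena itself is destroyed. That is the trade being made: no bookkeeping on free, at the cost of never reusing memory.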
In case you weren't aware, this technique is common enough to have a name: bump allocation. It's very useful in short-lived programs with bounded allocations, for exactly the reason you outlined.
This is why composable allocators are fantastic. You can request a block of dynamic memory for a given context, allocate from it as you wish, and treat delete as a no-op. When you're done with the context, you free the whole thing in one go.
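In standard C++17 the closest off-the-shelf version of this is probably `std::pmr::monotonic_buffer_resource`: its deallocate does nothing, and all memory goes back to the upstream resource when the resource is released or destroyed. A minimal sketch, where `process_context`, `kPayload`, and the 64 KiB initial size are assumptions made up for the example:

```cpp
#include <cstddef>
#include <memory_resource>
#include <string>
#include <string_view>
#include <vector>

// Hypothetical payload, long enough that strings need heap storage
// instead of the small-string buffer.
constexpr std::string_view kPayload =
    "a payload long enough to defeat the small-string optimization";

// Hypothetical per-context work: everything allocated here comes from
// one arena that is torn down as a whole when the function returns.
std::size_t process_context(int n_items) {
    // Initial size is a hint; the resource asks its upstream (the default
    // resource here) for more blocks if it runs out.
    std::pmr::monotonic_buffer_resource arena(64 * 1024);

    // The vector passes the arena on to each string it constructs.
    std::pmr::vector<std::pmr::string> items(&arena);
    for (int i = 0; i < n_items; ++i) {
        items.emplace_back(kPayload);
    }

    std::size_t total = 0;
    for (const auto& s : items) total += s.size();
    return total;
    // items is destroyed first (its per-element deallocations are no-ops),
    // then the arena destructor releases every block in one go.
}
```

Because the containers only see a `std::pmr::memory_resource*`, swapping the arena for a pool or the default heap is a one-line change, which is the composability being praised above.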
Not just that: malloc can be faster if it doesn’t have to care about thread safety, support interposing or hooks, or worry about security. It is easy to “beat malloc” in a specific use case if you know beforehand what that use case will be. Doing it in general is the hard part.