I've had some whoppers in Erlang, where the garbage generated ends up costing N-squared even though the operations themselves were just N. In fact, in my experience the garbage collection footprint turns out to be one of the most significant indicators of how functional code will perform. Erlang's per-process heaps help here, because each heap can typically be tiny, but it's still something that needs to be managed as closely as, say, stale pointers in C++.
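A rough sketch of how N operations can leave N-squared garbage. It's written in Python rather than Erlang, using tuple concatenation as a stand-in for building a list with `Acc ++ [X]`, which copies the left-hand list on every step. The function names and the "cells allocated" counter are made up for illustration; this models allocation volume, it doesn't measure a real runtime.

```python
def append_right(n):
    """Model of appending on the right: each step copies the whole accumulator."""
    acc = ()
    cells_allocated = 0
    for x in range(n):
        acc = acc + (x,)            # fresh tuple; the previous one is now garbage
        cells_allocated += len(acc)
    return acc, cells_allocated


def prepend_then_reverse(n):
    """Model of consing onto the front and reversing once at the end."""
    acc = None                      # cons list as nested (head, tail) pairs
    cells_allocated = 0
    for x in range(n):
        acc = (x, acc)              # one new cell that shares the old tail
        cells_allocated += 1
    out = []
    while acc is not None:          # walk the cons list, newest element first
        head, acc = acc
        out.append(head)
    out.reverse()
    cells_allocated += n            # the single reversed copy
    return tuple(out), cells_allocated


if __name__ == "__main__":
    for n in (100, 1000):
        _, quadratic = append_right(n)
        _, linear = prepend_then_reverse(n)
        print(n, quadratic, linear)
```

Both versions do N loop iterations and return the same sequence, but the first allocates N(N+1)/2 cells in total while the second allocates 2N.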
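A copying GC helps with that somewhat. When the young generation gets copied, you pay only for data that is still alive, not for the garbage generated. Erlang does get this benefit: as far as I know, its per-process collector is a generational copying collector rather than mark-and-sweep, with large binaries reference-counted outside the process heaps.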
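To make the "only pay for live data" point concrete, here is a toy model of a copying collection, again in Python. The heap layout (a dict from object id to the ids it references) and the name `minor_collect` are assumptions for the sketch, not how any real runtime lays things out.

```python
def minor_collect(heap, roots):
    """Copy everything reachable from roots into a fresh to-space.

    heap maps an object id to the list of ids it references.
    Returns (to_space, objects_copied). Unreachable objects are never
    visited, so the work done tracks live data, not garbage.
    """
    to_space = {}
    worklist = list(roots)
    while worklist:
        obj = worklist.pop()
        if obj in to_space:         # already copied
            continue
        to_space[obj] = heap[obj]
        worklist.extend(heap[obj])  # follow outgoing references
    return to_space, len(to_space)


if __name__ == "__main__":
    # Three live objects in a chain, plus a large pile of unreachable ones.
    heap = {"a": ["b"], "b": ["c"], "c": []}
    heap.update({f"junk{i}": [] for i in range(10_000)})
    survivors, copied = minor_collect(heap, roots=["a"])
    print(copied)
```

The collection touches three objects whether there are ten garbage objects or ten thousand. Allocating the garbage in the first place still costs something, which is why this only helps somewhat.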