It's basically "cheating" at GC by exploiting a very narrow use case. I saw a trick like this at Smalltalk Solutions in 2000, in a 3D game debugging tool: the "GC" simply threw everything away on each frame tick.
Someone needs to come up with something like a functional language based on a trick like this. Or maybe a meta-language akin to RPython, so people can write domain-specific little languages for things like serving web requests, combined with a domain-specific "cheating" GC that can get away with doing much less work than a full general-purpose GC.
Couldn't a pure functional programming environment be structured to allow for such GC "cheating"?
I work in VBScript inside classic ASP, and the statement "anytime a function's scope terminates, all its memory immediately goes away" is true there. I doubt anyone would suggest that VBScript is a functional language with immutable data structures.
Additionally, all memory used by a page is cleaned up when the page is finished processing. This has more to do with memory being scoped than with immutable data structures.
I think you mean memory usage is lexically scoped, because no pointers to blocks are saved or returned? My off-the-cuff thought: in the same sense that "functional" brings some useful baggage with it, so does lexical scope.
Erlang's GC is the perfect example of 'work smarter, not harder'. It's not a particularly advanced collector: each process has its own small heap that is collected independently, and a process's entire heap is freed wholesale when it dies. It does so little work at a time that it's soft-real-time capable.
The GC is a fallback mechanism to catch reference cycles, but reference counting is still in force, so acyclic objects still get collected.
When the cyclic GC runs, everything gets touched, which dirties the copy-on-write pages shared across forked workers. Without it, refcount updates still hit some shared objects, but the write traffic is reduced enough to be worth it, and the heap doesn't grow without bound.
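A minimal sketch of what that means in CPython (the Node class is just for illustration): with the cyclic collector disabled, acyclic objects are still freed immediately by refcounting, while cycles leak until something collects them explicitly.

    import gc

    gc.disable()  # turn off the cyclic collector; refcounting stays active

    class Node:
        def __init__(self):
            self.ref = None

    # Acyclic object: freed as soon as its refcount drops to zero,
    # even with the cyclic collector disabled.
    n = Node()
    n = None

    # Reference cycle: refcounts never reach zero, so with the
    # collector off these objects leak until the process exits.
    a, b = Node(), Node()
    a.ref, b.ref = b, a
    a = b = None

    print(gc.collect())  # a manual collect still reclaims the cycle (nonzero count)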
I'm not sure why you would call it cheating -- it seems like a very principled engineering decision to me.
Even if you never deploy new code, it's not like Python processes will run forever just because GC is on. Big deployments usually recycle their processes regularly. Why wouldn't you? There's no downside once you have a big enough cluster.
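Process recycling is often just a config knob. A sketch using gunicorn's real max_requests settings (the numbers are arbitrary, and "myapp" below is a hypothetical module):

    # gunicorn.conf.py: recycle each worker after a bounded number of
    # requests, so any leaked memory (e.g. uncollected cycles with the
    # GC disabled) is returned to the OS when the worker exits.
    workers = 8
    max_requests = 10000         # restart a worker after this many requests
    max_requests_jitter = 1000   # stagger restarts so workers don't all recycle at once

Run it with something like: gunicorn -c gunicorn.conf.py myapp:app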
I recall hearing that YouTube also runs with GC off, but I don't have a source for that.
I don't think anything pejorative is intended, but it is a bit like taking out a loan with no intention of repaying it.
Another way of looking at it is that you are optimizing away a feature that Python needs in order to be a general-purpose language, but that is not needed for your purpose.
As long as you have the build infrastructure to support patched versions of Python, and can run multiple ones, I don't see the problem.
When you're running at scale like Instagram, it's inevitable that you need to do stuff like this. You're just pushing the limits of what open source software has been tested for.
I think this solution is a hack in the good sense... in particular because you are taking something away rather than adding crap on top, which is the typical solution. I wouldn't be as impressed if they came up with a fancy new GC that took 18 months to write and only worked for their workload. That would be the wrong way of thinking about the problem.
C++ (std::shared_ptr) and Rust (Rc/Arc) have reference-counted pointers that can't collect cycles. Combine that with an arena allocator and it sounds close to what you're describing. It's more explicit, though, whereas Python does the reference counting for you here.
You could have a general-purpose "heap" interface that controls allocation. This would mean that you could control allocation space (for games this is really useful), and tearing down the heap would tear down its owned objects. You could either hand out weak pointers or statically tie each object's lifetime to the lifetime of the heap, like Rust does.
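A rough sketch of the weak-pointer flavor in Python (the Arena and Entity classes are hypothetical): the arena holds the only strong references, so tearing it down frees everything it owns at once and invalidates anything it handed out.

    import weakref

    class Arena:
        # Hypothetical arena: owns the only strong references to its objects.
        def __init__(self):
            self._objects = []

        def alloc(self, obj):
            self._objects.append(obj)  # arena keeps the strong reference
            return weakref.proxy(obj)  # caller only gets a weak handle

        def teardown(self):
            # Dropping the strong refs frees everything at once via
            # refcounting; outstanding proxies now raise ReferenceError.
            self._objects.clear()

    class Entity:
        def __init__(self, name):
            self.name = name

    arena = Arena()
    player = arena.alloc(Entity("player"))
    print(player.name)  # "player"
    arena.teardown()
    # player.name would now raise ReferenceError: the arena owned the object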
I may be wrong about the details, but I think Go's proposed GC is supposed to do something similar: throw away everything that was created in a goroutine once the goroutine exits.
PHP works very similarly to what they describe in the post. Its memory management is primarily reference counting (with a cycle collector on top), and the whole heap just gets trashed at the end of every request.
It's not functional, and not automatic, but Rust has an arena type that supports very fast allocation and deallocation, with the restriction that deallocation has to happen all at once.