> But one thing moving collectors can do that non-moving ones can't is generational sequestration, that is, keeping the youngest cohort of Lisp values separated from older ones. This allows fast, intermediary cycles which only scan the "nursery" generation (the rationale being old objects are very likely to still be referenced). A non-moving collector must traverse the full set on each cycle since its allocations are interleaved. This is why Emacs bros are as eager to raise collection thresholds as motherboard jockeys are to overclock their BIOS.
That is untrue. TXR Lisp has a generational GC, over non-moving heaps. The way this works is incredible. Are you sitting down? Okay, here is a pen and sign the non-disclosure agreement. Good!
So, when baby objects are allocated, pointers to these objects are recorded in ... a nursery array. Not the objects themselves, just their pointers.
During a fast garbage collection cycle (in the sweep phase) we just have to sweep through this array to find the unreachable babies all in a bunch, rather than looking for them through all the heap blocks.
Indeed. Emacs actually got a preliminary port to the Boehm generational, incremental, mark-sweep collector many years ago, with a complete lack of interest in pursuing it.
> Indeed. Emacs actually got a preliminary port to the Boehm generational, incremental, mark-sweep collector many years ago, with a complete lack of interest in pursuing it.
Ja have a link? Asking because I've interacted with the devs and they are highly focused on getting emacs better. They would not reject it if it offered solid value.
Also you have a link to this Boehm collector you mention as the only one I know of is the conservative one and I'd like to know more. TIA
You may not believe the history, but you weren't there; I don't know how much is in mail archives. There was, for instance, a bizarre campaign to keep the charset `unification' out of Emacs 22. For some reason rms went along with that even when eval showed the argument was bogus.
That looks like the boehm GC, but Boehm is conservative. In no way is the Boehm GC that I'm aware of is generational or incremental (though it is mark-sweep).
"I was poking at alloc.c recently and realized that the existing conservative GC code is somewhat unsafe. In particular,
1) mark_maybe_pointer looks only for exact matches on object start. It's perfectly legal for the compiler to keep an interior object pointer and discard the pointer to the object start.
2) INTERVAL is GCed, but it's not represented in the memory tree: struct interval isn't a real lisp object and it's allocated as MEM_TYPE_NON_LISP. Even a direct pointer to the start of an interval won't protect it from GC. Shouldn't we treat intervals like conses?
We've been getting by on dumb luck and the magnanimity of the compiler."
I don't have time to go over the thread - conservative seems dodgy. I'll have to take your word for it. As for your main point, much to my surprise, you're right although I don't understand how, it doesn't match what I've read of it. No matter, you're right.
Slow in the same way that the conservative stack scanning was bound to leak. I did actually run Emacs with it after experience elsewhere.
Edit: I wonder how Boehm is somehow "most conservative", compared with, say, the Memory Pool System which I'd also look at these days but wasn't an option then.
I think it's an option, but I admit not one I was aware of. AFAIK the MP is a framework, and this is an option not the only way... but still, you're again!
Object mutating assignments are checked for old -> new direction and handled in one of two ways, let's call them A and B.
Under method A, when a old -> new assignment takes place, the new object is added to a "check" array. At the same time, its generation is changed from 0 to -1, so that this is wastefully not done twice. Objects in the check array are processed during the mark phase; they are marked as reachable (and that will promote them to the mature generation).
Under method B, the new object is not involved. An old object has been mutated and we record it in a "mutated" array. (So that this isn't done twice for the same object, we also change its generation to -1; that gets fixed back during GC.) The "mutated" array is subject to marking in the mark phase (even though mature objects are not), in order that we chase the references from that object to any new object. These (necessarily considered reachable) objects are also processed in the sweep phase to return them to gen 1.
When might you use method B? Say there is a bulk assignment operation, like hundreds of elements of a vector object are set to values, some of which may be new objects. In that situation, it's more efficient to put the aggregate object into the mutated array and not deal with any information about the right hand side objects at all.
So this is a choice made by the user? Or the choice is made automatically using some heuristic? Regardless: I don't really see the point of avoiding moving gc if you have to have barriers. Moving gc will be faster; the only reason I can think of for avoiding it is to avoid mutator overhead, but you are paying barriers anyway.
That is untrue. TXR Lisp has a generational GC, over non-moving heaps. The way this works is incredible. Are you sitting down? Okay, here is a pen and sign the non-disclosure agreement. Good!
So, when baby objects are allocated, pointers to these objects are recorded in ... a nursery array. Not the objects themselves, just their pointers.
During a fast garbage collection cycle (in the sweep phase) we just have to sweep through this array to find the unreachable babies all in a bunch, rather than looking for them through all the heap blocks.