> *But one thing moving collectors can do that non-moving ones can't is generati...

gnufx · on June 3, 2022

> That is untrue.

Indeed. Emacs actually got a preliminary port to the Boehm generational, incremental, mark-sweep collector many years ago, with a complete lack of interest in pursuing it.

Also, Emacs conservative stack scanning came from SIOD via SCM https://people.delphiforums.com/gjc//siod.html#garbage

zasdffaa · on June 3, 2022

> Indeed. Emacs actually got a preliminary port to the Boehm generational, incremental, mark-sweep collector many years ago, with a complete lack of interest in pursuing it.

Do you have a link? Asking because I've interacted with the devs and they are highly focused on making Emacs better. They would not reject it if it offered solid value.

Also, do you have a link to this Boehm collector you mention? The only one I know of is the conservative one, and I'd like to know more. TIA

gnufx · on June 3, 2022

That work is actually still in the repo: https://git.savannah.gnu.org/cgit/emacs.git/tree/?h=other-br...

You may not believe the history, but you weren't there; I don't know how much is in mail archives. There was, for instance, a bizarre campaign to keep the charset `unification' out of Emacs 22. For some reason rms went along with that even when eval showed the argument was bogus.

zasdffaa · on June 3, 2022

That looks like the Boehm GC, but Boehm is conservative. The Boehm GC that I'm aware of is in no way generational or incremental (though it is mark-sweep).

Start of long thread:

https://mail.gnu.org/archive/html/emacs-devel/2016-11/msg005...

"I was poking at alloc.c recently and realized that the existing conservative GC code is somewhat unsafe. In particular,

1) mark_maybe_pointer looks only for exact matches on object start. It's perfectly legal for the compiler to keep an interior object pointer and discard the pointer to the object start.

2) INTERVAL is GCed, but it's not represented in the memory tree: struct interval isn't a real lisp object and it's allocated as MEM_TYPE_NON_LISP. Even a direct pointer to the start of an interval won't protect it from GC. Shouldn't we treat intervals like conses?

We've been getting by on dumb luck and the magnanimity of the compiler."
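Point 1 of the quoted message can be illustrated with a small simulation. This is only a sketch, not Emacs's alloc.c: the block table, the addresses, and the names `find_exact` and `find_interior` are made up for illustration. It contrasts a lookup that accepts a word only when it equals an object start with one that also accepts interior pointers.

```python
import bisect

# Hypothetical heap: (start_address, size) of each live block, sorted by start.
BLOCKS = [(0x1000, 32), (0x2000, 64), (0x3000, 16)]
STARTS = [start for start, _ in BLOCKS]

def find_exact(word):
    """Treat `word` as a pointer only if it equals the start of a block."""
    i = bisect.bisect_left(STARTS, word)
    if i < len(STARTS) and STARTS[i] == word:
        return BLOCKS[i]
    return None

def find_interior(word):
    """Treat `word` as a pointer if it falls anywhere inside a block."""
    i = bisect.bisect_right(STARTS, word) - 1
    if i >= 0:
        start, size = BLOCKS[i]
        if start <= word < start + size:
            return BLOCKS[i]
    return None

# A compiler may keep only 0x2010 (an interior pointer) in a register.
print(find_exact(0x2010))     # None: the exact-match scan misses the object
print(find_interior(0x2010))  # (8192, 64): the interior-aware scan keeps it alive
```

With the exact-match lookup, an object whose only surviving reference is an interior pointer looks unreachable to the scanner.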

Boehm was dropped for good reason.

gnufx · on June 3, 2022

> The Boehm GC that I'm aware of is in no way generational or incremental

Boehm disagrees: https://www.hboehm.info/gc/#details

That thread isn't talking about the same thing, which it says no-one volunteered to write.

zasdffaa · on June 3, 2022

I don't have time to go over the thread; conservative seems dodgy. I'll have to take your word for it. As for your main point, much to my surprise, you're right, although I don't understand how; it doesn't match what I've read of it. No matter, you're right.

rurban · on June 3, 2022

The Boehm GC is the most conservative one. And very slow.

gnufx · on June 3, 2022

Slow in the same way that the conservative stack scanning was bound to leak. I did actually run Emacs with it after experience elsewhere.

Edit: I wonder how Boehm is somehow "most conservative", compared with, say, the Memory Pool System which I'd also look at these days but wasn't an option then.

zasdffaa · on June 3, 2022

The MPS wasn't 'conservative' in any way IIRC, it was precise? Are we talking different terminologies perhaps?

https://en.wikipedia.org/wiki/Boehm_garbage_collector

https://en.wikipedia.org/wiki/Tracing_garbage_collection#Pre...

gnufx · on June 3, 2022

Why argue with the people who actually produce the collectors?

"GC uncooperative programs using conservative collection" https://www.ravenbrook.com/project/mps/

I think it's a Bartlett-style "mostly-copying" collector, as in Scheme->C, but you can read the code and documentation.

zasdffaa · on June 3, 2022

I think it's an option, but I admit not one I was aware of. AFAIK the MPS is a framework, and this is one option, not the only way... but still, you're right again!

I'll do some reading, thanks.

moonchild · on June 3, 2022

So you still require a write barrier to deal with old->new pointers? Or you mark the whole heap but sweep only the nursery for small collections?

kazinator · on June 3, 2022

Object mutating assignments are checked for old -> new direction and handled in one of two ways, let's call them A and B.

Under method A, when an old -> new assignment takes place, the new object is added to a "check" array. At the same time, its generation is changed from 0 to -1, so that this is not wastefully done twice. Objects in the check array are processed during the mark phase; they are marked as reachable (and that will promote them to the mature generation).

Under method B, the new object is not involved. An old object has been mutated and we record it in a "mutated" array. (So that this isn't done twice for the same object, we also change its generation to -1; that gets fixed back during GC.) The "mutated" array is subject to marking in the mark phase (even though mature objects are not), in order that we chase the references from that object to any new object. These (necessarily considered reachable) objects are also processed in the sweep phase to return them to gen 1.
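The two methods can be sketched as a toy non-moving generational heap. This is a minimal simulation under assumptions, not the actual implementation being described: the names `Obj`, `Heap`, `store_a`, `store_b` and `minor_gc` are hypothetical, and the generation numbers follow the description (0 = new, 1 = mature, -1 = already recorded by a barrier).

```python
class Obj:
    def __init__(self, name):
        self.name = name
        self.gen = 0          # 0 = new, 1 = mature, -1 = recorded by a barrier
        self.fields = []
        self.marked = False

class Heap:
    def __init__(self):
        self.nursery = []     # objects allocated since the last collection
        self.roots = []
        self.check = []       # method A: new objects stored into old ones
        self.mutated = []     # method B: old objects that were mutated

    def alloc(self, name):
        obj = Obj(name)
        self.nursery.append(obj)
        return obj

    def store_a(self, old, new):
        old.fields.append(new)
        if old.gen == 1 and new.gen == 0:
            new.gen = -1              # so it is not recorded twice
            self.check.append(new)

    def store_b(self, old, new):
        old.fields.append(new)
        if old.gen == 1 and new.gen == 0:
            old.gen = -1              # so it is not recorded twice
            self.mutated.append(old)

    def minor_gc(self):
        # Mark from roots plus both barrier arrays; mature objects are skipped.
        stack = self.roots + self.check + self.mutated
        while stack:
            obj = stack.pop()
            if obj.marked or obj.gen == 1:
                continue
            obj.marked = True
            stack.extend(obj.fields)
        # Sweep only the nursery; survivors are promoted in place (no moving).
        freed = [o.name for o in self.nursery if not o.marked]
        for o in self.nursery + self.mutated:
            if o.marked:
                o.marked, o.gen = False, 1
        self.nursery, self.check, self.mutated = [], [], []
        return freed

heap = Heap()
old_a, old_b = heap.alloc("old_a"), heap.alloc("old_b")
heap.roots += [old_a, old_b]
heap.minor_gc()                 # both survive and become mature

kept_a, kept_b = heap.alloc("kept_a"), heap.alloc("kept_b")
heap.alloc("garbage")           # never referenced
heap.store_a(old_a, kept_a)     # method A records the new object
heap.store_b(old_b, kept_b)     # method B records the old object
freed = heap.minor_gc()
print(freed)                    # ['garbage']
```

Without either barrier, `kept_a` and `kept_b` would be swept, because the mature objects that reference them are never traced during a minor collection.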

When might you use method B? Say there is a bulk assignment operation, like hundreds of elements of a vector object are set to values, some of which may be new objects. In that situation, it's more efficient to put the aggregate object into the mutated array and not deal with any information about the right hand side objects at all.
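A rough way to see the difference in bookkeeping is to count barrier records for one bulk fill. This is a hypothetical sketch: `Cell`, `fill_method_a`, `fill_method_b` and the count of 300 stores are illustrative and not taken from any real implementation.

```python
MATURE, NEW, RECORDED = 1, 0, -1

class Cell:
    def __init__(self, gen):
        self.gen = gen
        self.slots = []

def fill_method_a(vec, values, check):
    """Per-store barrier: inspect and record every new right-hand-side object."""
    for v in values:
        vec.slots.append(v)
        if vec.gen == MATURE and v.gen == NEW:
            v.gen = RECORDED
            check.append(v)

def fill_method_b(vec, values, mutated):
    """Aggregate barrier: record the mutated vector once, ignore the values."""
    if vec.gen == MATURE:
        vec.gen = RECORDED
        mutated.append(vec)
    vec.slots.extend(values)

check, mutated = [], []
fill_method_a(Cell(MATURE), [Cell(NEW) for _ in range(300)], check)
fill_method_b(Cell(MATURE), [Cell(NEW) for _ in range(300)], mutated)
print(len(check), len(mutated))  # 300 1
```

The trade-off is that under method B the collector later has to scan every slot of the recorded vector, including slots that hold mature objects.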

moonchild · on June 4, 2022

So this is a choice made by the user? Or is the choice made automatically using some heuristic? Regardless: I don't really see the point of avoiding moving GC if you have to have barriers. Moving GC will be faster; the only reason I can think of for avoiding it is to avoid mutator overhead, but you are paying for barriers anyway.