but it only works if you don't need to run the defers/Drop's/destructors/etc. for stuff that's on the stack between the current frame and the handler's frame. Which you do, most of the time.
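For illustration, a minimal C++ sketch of that failure mode, assuming a plain counter stands in for a real resource. All names here (`open_resources`, `leaf`, `middle`, `run_with_longjmp`) are hypothetical. The cleanup is written as a manual statement rather than a destructor, because jumping over a non-trivial destructor with `longjmp` is undefined behavior in C++.

```cpp
#include <csetjmp>

// Hypothetical stand-ins: a jump target and a counter that plays the role
// of "resources currently held".
static std::jmp_buf handler;
static int open_resources = 0;

void leaf() {
    // Jump straight back to the setjmp site, bypassing middle()'s epilogue.
    std::longjmp(handler, 1);
}

void middle() {
    ++open_resources;  // acquire
    leaf();
    --open_resources;  // release: never reached, the jump skips this frame
}

int run_with_longjmp() {
    open_resources = 0;
    if (setjmp(handler) == 0) {
        middle();
    }
    // Execution resumes here after the longjmp; the counter was never
    // decremented, so the "resource" has leaked.
    return open_resources;
}
```

Calling `run_with_longjmp()` returns 1 rather than 0: the intermediate frame's cleanup never ran.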
> it only works if you don't need to run the defers/Drop's/destructors/etc
Indeed. The per-frame cleanup is also language-agnostic, which adds overhead, and it must support both SjLj and DWARF frames[1]. It is also done in two phases: destructors run only once an actual catch handler has been found, so an unhandled exception doesn't run destructors and the state is preserved in the core file. This two-phase unwinding slows things down further.
Another big bottleneck that might not be captured in OP's test is that the unwinder has to take (or used to, things got better recently) a global lock to prevent races with dlclose, which greatly limits the scalability of exception handling.
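To make the handled case concrete, here is a minimal C++ sketch, not a model of the unwinder itself. The names `Guard`, `inner`, `outer` and `run_with_exception` are made up for illustration. It only shows the observable ordering when a handler exists: cleanup for every frame between the throw and the catch runs, innermost first, before the handler body. The unhandled case is not shown; the C++ standard leaves it implementation-defined whether the stack is unwound before `std::terminate` is called.

```cpp
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical event log used to record the order of cleanup and handling.
static std::vector<std::string> events;

struct Guard {
    const char* name;
    explicit Guard(const char* n) : name(n) {}
    // Runs during unwinding, once a matching handler has been located.
    ~Guard() { events.push_back(std::string("cleanup ") + name); }
};

void inner() {
    Guard g("inner");
    throw std::runtime_error("boom");
}

void outer() {
    Guard g("outer");
    inner();
}

std::vector<std::string> run_with_exception() {
    events.clear();
    try {
        outer();
    } catch (const std::runtime_error&) {
        // By the time we get here, both Guard destructors have already run.
        events.push_back("handler");
    }
    return events;
}
```

`run_with_exception()` yields the sequence `cleanup inner`, `cleanup outer`, `handler`.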
Still very nice improvements from OP.
[1] although I'm not sure whether you can mix them in the same program or whether it is a platform-wide decision.
> the unwinder has to take (or used to, things got better recently) a global lock to prevent races with dlclose
If someone from another thread decided to unload a library whose code is still being executed in this thread then this thread would normally crash anyhow, and do so irrecoverably, right?
Also, I don't think it's possible to perform any kind of backwards-compatible static analysis that would tell you when it's safe to just JMP. Except perhaps when you have full information about the program (at the very least, with everything already specialized).