r230255 - Only lower __builtin_setjmp / __builtin_longjmp to
Hal Finkel
hfinkel at anl.gov
Tue Mar 3 09:59:53 PST 2015
----- Original Message -----
> From: "Joerg Sonnenberger" <joerg at britannica.bec.de>
> To: cfe-commits at cs.uiuc.edu
> Sent: Tuesday, March 3, 2015 11:47:39 AM
> Subject: Re: r230255 - Only lower __builtin_setjmp / __builtin_longjmp to
>
> On Tue, Mar 03, 2015 at 10:50:03AM -0600, Hal Finkel wrote:
> > > Actually, it is a bit more complicated. A good setjmp/longjmp
> > > implementation in libc saves exactly the same set of registers.
> > > The only reason the GCC implementation can be better is that it
> > > forces register spilling to fixed locations and makes longjmp
> > > depend on that.
> >
> > No, this is not quite right. __builtin_setjmp can save many fewer
> > registers than setjmp, because it does not need to save registers
> > that don't need to be restored at that particular call site (the
> > function might not be using all available registers). Furthermore,
> > those saves, and the restores, can be scheduled to hide their
> > latency (not all clumped together in the longjmp implementation).
>
> I think that's overly optimistic. I believe neither GCC nor Clang does
> CFG analysis between __builtin_setjmp and __builtin_longjmp. As such,
> __builtin_setjmp has to assume that all callee-saved registers will be
> clobbered and __builtin_longjmp has to assume they all have been
> clobbered. The optimisation potential for __builtin_setjmp is
> therefore to avoid saving a register that has been spilled already.
> Functions involving setjmp rarely are complex, so that potential is
> small.
Having implemented this, I assure you that the potential is not small. By eliminating the unnecessary spilling, removing the overhead of the function call, and scheduling the spills/restores better, I've seen 10x speedups (even on modern OOO cores). Please also remember that small functions often don't use all available registers, especially vector registers (which tend to be expensive to save and restore), and so you can just ignore the unused callee-saved registers entirely (you don't need to save them in the prologue or in the setjmp call if you don't use them; it is a pure savings).
-Hal
> Now for __builtin_longjmp, the question is more interesting. If it
> does use EH logic with a landing pad, it can leave most of the
> reloading to the __builtin_setjmp recovery side. This makes for much
> more complex logic in the backend, as I mentioned earlier. The other
> option is to essentially implement the same logic as longjmp with some
> scheduling potential. That __builtin_longjmp part is where the real
> potential is.
>
> Joerg
--
Hal Finkel
Assistant Computational Scientist
Leadership Computing Facility
Argonne National Laboratory