[LLVMdev] X86 FMA4

dag at cray.com
Mon Jul 30 11:12:15 PDT 2012


Michael Gottesman <mgottesman at apple.com> writes:

> But if you don't believe me, time the instructions yourself (it's an
> important thing to have in your toolbox anyways since sometimes Intel's
> documentation can be non-specific). I have a small instruction timing
> project lying around somewhere, if you want it I can send it to you
> privately.

Note that timings aren't everything.  Second-order effects can be
meaningful; I've seen it many times.  I would not be surprised at all
if a scalar spill/reload is much faster than a vector one due to the
lower impact on cache bandwidth.  We know that scalar spill/reload is
much better in practice.  We have seen it in many real codes.
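
As a rough illustration of the bandwidth point, here is a
back-of-the-envelope sketch of how many bytes spill code moves through
the cache.  The register widths are the usual x86 sizes; the helper
name spill_traffic_bytes and the count of 1000 spill/reload pairs are
made up for illustration and are not measurements.

```python
# Back-of-the-envelope model: bytes moved through the cache by spill code.
# The widths are x86 register sizes; the spill count used below is a
# hypothetical number chosen only for illustration.

REGISTER_BYTES = {
    "scalar double": 8,   # one 64-bit value spilled on its own
    "128-bit XMM": 16,    # full SSE vector register
    "256-bit YMM": 32,    # full AVX vector register
}


def spill_traffic_bytes(width_bytes, spill_reload_pairs):
    """Bytes stored plus bytes loaded for the given number of spill/reload pairs."""
    return 2 * width_bytes * spill_reload_pairs


if __name__ == "__main__":
    pairs = 1000  # hypothetical number of spill/reload pairs in a hot loop
    for name, width in REGISTER_BYTES.items():
        print(f"{name:>13}: {spill_traffic_bytes(width, pairs)} bytes")
```

This only counts bytes moved.  It says nothing about latency,
alignment, or what else is competing for the cache, so it is no
substitute for measuring real codes.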

Single instruction timings and small kernels can be very, very
misleading.

                            -Dave


