[llvm] r223360 - [X86] Improve a dag-combine that handles a vector extract -> zext sequence.

Thu Dec 4 09:19:48 PST 2014

On Thu, Dec 4, 2014 at 9:05 AM, Kuperstein, Michael M <
michael.m.kuperstein at intel.com> wrote:

> It looks like in the cited PR it was the best sequence, but I agree with
> you, it may not be the case globally.
>
> Which stalls are you talking about? I think domain crossing shouldn’t be a
> problem in this case, as the zexts would imply you want to be in the
> integer domain.
>

The domain crossing as I understand it (and feel free to shed more detailed
light on this aspect of Intel chips if you can, but I've failed to get any
better clarification from Intel folks in the past) is more problematic than
that.

It stems from separate execution units of some form (which form, and
whether the "ports" as described in modern Intel manuals attach to them or
are fixed to them isn't really important). Moving data in a register from
one unit to the other unit stalls. This is just as true (if not more true)
moving data from an integer xmm register into a gpr as it is moving data
produced in the floating point vector unit to an input of an integer vector
unit instruction.

Previously, the *primary* cause of vector shuffle performance problems in
the x86 backend was that it relied heavily on pextr and pinsr sequences
to manually extract and insert the elements into the desired positions. But
the slowdowns were vastly out of proportion to the difference in
instruction count. The best explanation, and one supported by various timing
indications in Agner Fog's manuals and elsewhere, is that there is a rather
massive penalty incurred in sequences of these instructions. In my
benchmarking, I routinely saw this penalty be much higher than that of domain
crossing between integer and floating point units on Intel chips. On AMD
chips, the two penalties were closer to each other, but both were also
significantly higher than on Intel chips.
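
To make the two shapes concrete, here is a rough sketch (not taken from the
backend, and not a benchmark) of a lane reversal done by bouncing every element
through a scalar register versus doing it with one in-register shuffle. The
function names are made up for illustration. The 32-bit pextr/pinsr forms need
SSE4.1, so this sketch assumes plain SSE2 and uses shift-plus-movd as a
stand-in for the extracts; on non-SSE2 targets it falls back to scalar code. An
optimizing compiler may well turn the first version into the second, so look at
the generated assembly before drawing any conclusions.

```c
#include <stdint.h>
#if defined(__SSE2__)
#include <emmintrin.h>
#endif

/* Hypothetical helper: reverse four 32-bit lanes by extracting each element
   into a scalar register and then rebuilding the vector from the scalars. */
static void reverse_via_scalars(const uint32_t in[4], uint32_t out[4]) {
#if defined(__SSE2__)
    __m128i v = _mm_loadu_si128((const __m128i *)in);
    /* Each extract moves data from the vector unit to a GPR. */
    uint32_t e0 = (uint32_t)_mm_cvtsi128_si32(v);
    uint32_t e1 = (uint32_t)_mm_cvtsi128_si32(_mm_srli_si128(v, 4));
    uint32_t e2 = (uint32_t)_mm_cvtsi128_si32(_mm_srli_si128(v, 8));
    uint32_t e3 = (uint32_t)_mm_cvtsi128_si32(_mm_srli_si128(v, 12));
    /* _mm_set_epi32 takes the highest lane first, so lane 0 becomes e3. */
    __m128i r = _mm_set_epi32((int)e0, (int)e1, (int)e2, (int)e3);
    _mm_storeu_si128((__m128i *)out, r);
#else
    for (int i = 0; i < 4; ++i)
        out[i] = in[3 - i];
#endif
}

/* Hypothetical helper: the same reversal as a single shuffle that never
   leaves the integer vector domain. */
static void reverse_via_shuffle(const uint32_t in[4], uint32_t out[4]) {
#if defined(__SSE2__)
    __m128i v = _mm_loadu_si128((const __m128i *)in);
    __m128i r = _mm_shuffle_epi32(v, _MM_SHUFFLE(0, 1, 2, 3));
    _mm_storeu_si128((__m128i *)out, r);
#else
    for (int i = 0; i < 4; ++i)
        out[i] = in[3 - i];
#endif
}
```

Both functions produce the same result; the difference is only in how many
times the data crosses between the vector registers and the GPRs.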

>
>
> Regarding systematic testing – no, since this is a fairly specific
> pattern.
>
> Do you have any examples in mind that will match this, but be negatively
> impacted?
>

I would start off with checking LNT, maybe SPEC (although I'm loath to
trust SPEC numbers for this kind of change).

>
>
> Regarding patterns impacted by this - if I understand correctly, the
> pattern that this was introduced to catch was precisely the one the LIT
> test checks – 64-bit GEPs that use indexes extracted from a 4xi32 vector.
> There’s a rdar linked to the test.  Quentin, do you think it’s worth
> checking what the impact of this is on the original issue?
>

This also might be uncovered by checking the LNT results.
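
For anyone following along, the pattern in question looks roughly like the
sketch below at the source level. This is an illustration only: the type and
function names are made up, it relies on the GCC/Clang vector_size extension,
and I have not checked whether the combine in r223360 actually fires on it.

```c
#include <stdint.h>

/* Four 32-bit unsigned lanes, using the GCC/Clang vector extension. */
typedef uint32_t v4u32 __attribute__((vector_size(16)));

/* Hypothetical example: sum the table entries selected by four 32-bit
   indices that live in a single vector value. */
static uint64_t gather_sum(const uint64_t *table, v4u32 idx) {
    uint64_t sum = 0;
    for (int i = 0; i < 4; ++i) {
        /* Extract one lane and zero-extend it to 64 bits. */
        uint64_t wide = (uint64_t)idx[i];
        /* Use the widened value as a 64-bit index into the table. */
        sum += table[wide];
    }
    return sum;
}
```

Each iteration is one vector extract, one zext, and one 64-bit address
computation, which is the sequence the LIT test is meant to cover.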

All this said, I'm not certain of anything here. Maybe this is a strict
win. I just think it needs broader measurements than the PR shows.