[llvm] r223360 - [X86] Improve a dag-combine that handles a vector extract -> zext sequence.

Kuperstein, Michael M michael.m.kuperstein at intel.com
Mon Dec 8 00:10:47 PST 2014


Following up on this - I don't see anything statistically significant (in either direction) from LNT.

From: llvm-commits-bounces at cs.uiuc.edu [mailto:llvm-commits-bounces at cs.uiuc.edu] On Behalf Of Kuperstein, Michael M
Sent: Thursday, December 04, 2014 22:48
To: Quentin Colombet
Cc: Commit Messages and Patches for LLVM
Subject: RE: [llvm] r223360 - [X86] Improve a dag-combine that handles a vector extract -> zext sequence.

According to IACA, there shouldn't be a problem with the timings, but, you're both right, of course, real-life results are a better indication.
I'll try LNT to see whether there's anything that takes a hit from this. We already have one data-point where it wins (the micro-benchmark).

Not closing the original PR in the meantime.

From: Quentin Colombet [mailto:qcolombet at apple.com]
Sent: Thursday, December 04, 2014 21:10
To: Kuperstein, Michael M
Cc: Commit Messages and Patches for LLVM; Chandler Carruth
Subject: Re: [llvm] r223360 - [X86] Improve a dag-combine that handles a vector extract -> zext sequence.

Hi,

On Dec 4, 2014, at 9:19 AM, Chandler Carruth <chandlerc at google.com> wrote:


On Thu, Dec 4, 2014 at 9:05 AM, Kuperstein, Michael M <michael.m.kuperstein at intel.com> wrote:
It looks like in the cited PR it was the best sequence, but I agree with you, it may not be the case globally.
Which stalls are you talking about? I think domain crossing shouldn't be a problem in this case, as the zexts would imply you want to be in the integer domain.

The domain cross as I understand it (and feel free to shed more detailed light on this aspect of Intel chips if you can, but I've failed to get any better clarification from Intel folks in the past) is more problematic than that.

It stems from separate execution units of some form (which form, and whether the "ports" as described in modern Intel manuals attach to them or are fixed to them isn't really important). Moving data in a register from one unit to the other unit stalls. This is just as true (if not more true) moving data from an integer xmm register into a gpr as it is moving data produced in the floating point vector unit to an input of an integer vector unit instruction.

Previously, the *primary* cause of vector shuffle performance problems in the x86 backend was that it relied heavily on pextr and pinsr sequences to manually extract and insert the elements into the desired positions. But the slowdowns were vastly out of proportion to the difference in instruction count. The best explanation, and one supported by various timing indications in Agner's tables and elsewhere, is that there is a rather massive penalty incurred in sequences of these instructions. In my benchmarking, I routinely saw this penalty be much higher than that of domain crossing between integer and floating point units on Intel chips. On AMD chips, the penalties were more even, but were also both significantly higher than on Intel chips.
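
For readers who want the extract/insert pattern spelled out, here is a scalar model of what a shuffle built from per-element extracts and inserts does. It is only a sketch: the names V4i32, extractLane, insertLane and shuffleViaExtractInsert are hypothetical and are not LLVM code. extractLane stands in for a pextr-style move of one lane into a scalar, and insertLane for a pinsr-style move of a scalar back into a lane. It models semantics only and says nothing about timing.

```cpp
#include <array>
#include <cstddef>
#include <cstdint>

// Hypothetical stand-in for a <4 x i32> vector value.
using V4i32 = std::array<std::uint32_t, 4>;

// Models a pextr-style operation: copy one lane out into a scalar.
std::uint32_t extractLane(const V4i32 &v, std::size_t lane) {
    return v[lane];
}

// Models a pinsr-style operation: return the vector with one lane
// overwritten by a scalar.
V4i32 insertLane(V4i32 v, std::size_t lane, std::uint32_t x) {
    v[lane] = x;
    return v;
}

// Builds result[i] = src[mask[i]] one lane at a time, so every element
// makes a round trip through a scalar: four extracts plus four inserts.
V4i32 shuffleViaExtractInsert(const V4i32 &src,
                              const std::array<std::size_t, 4> &mask) {
    V4i32 dst{};
    for (std::size_t i = 0; i < 4; ++i)
        dst = insertLane(dst, i, extractLane(src, mask[i]));
    return dst;
}
```

For a single-source permutation of four 32-bit lanes like this one, a single shuffle instruction such as pshufd can express the same result without any element leaving the vector register, which is the contrast the paragraph above is drawing.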


Regarding systematic testing - no, since this is a fairly specific pattern.
Do you have any examples in mind that will match this, but be negatively impacted?

I would start off with checking LNT, maybe SPEC (although I'm loath to trust SPEC numbers for this kind of change).


Regarding patterns impacted by this - if I understand correctly, the pattern that this was introduced to catch was precisely the one the LIT test checks - 64-bit GEPs that use indexes extracted from a 4xi32 vector. There's an rdar linked to the test. Quentin, do you think it's worth checking what the impact of this is on the original issue?
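
As a rough source-level illustration of that pattern (a sketch only, not the LIT test itself; gather4 and V4i32 are hypothetical names), each 32-bit lane is extracted, zero-extended to 64 bits, and then used as an index into memory. Whether a given compiler actually keeps the indexes in a vector register and produces the extract -> zext sequence for this source is not guaranteed.

```cpp
#include <array>
#include <cstddef>
#include <cstdint>

// Hypothetical stand-in for a <4 x i32> vector of indexes.
using V4i32 = std::array<std::uint32_t, 4>;

// Loads base[idx[lane]] for each lane. The unsigned 32-bit index is
// widened to 64 bits before the address computation, which corresponds
// to "extract element, zext to i64, feed a 64-bit GEP" at the IR level.
std::array<int, 4> gather4(const int *base, const V4i32 &idx) {
    std::array<int, 4> out{};
    for (std::size_t lane = 0; lane < 4; ++lane) {
        std::uint64_t wide = static_cast<std::uint64_t>(idx[lane]); // zext
        out[lane] = base[wide];
    }
    return out;
}
```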


I'll have a look at the original radar and I'll let you know.


This also might be uncovered by checking the LNT results.


All this said, I'm not certain of anything here. Maybe this is a strict win. I just think it needs broader measurements than the PR shows.

I agree with Chandler, and in fact I thought that had been done. Therefore, by all means, please do performance measurements.

Thanks,
Q.

---------------------------------------------------------------------
Intel Israel (74) Limited

This e-mail and any attachments may contain confidential material for
the sole use of the intended recipient(s). Any review or distribution
by others is strictly prohibited. If you are not the intended
recipient, please contact the sender and delete all copies.
---------------------------------------------------------------------