[PATCH] D12561: [X86][SSE] Match zero/any extension shuffles that don't start from the first element

Simon Pilgrim via llvm-commits llvm-commits at lists.llvm.org
Fri Sep 18 09:01:16 PDT 2015


RKSimon marked 5 inline comments as done.
RKSimon added a comment.

Regarding the tests: overuse of PSHUFB can be a performance issue. You often have to load the shuffle mask (>5cy), and many targets (both AMD and Intel) execute it with poor latency/throughput (>3cy). In comparison, PSHUFD/PSRLDQ and PMOVZX are pretty light instructions (1-2cy each).


================
Comment at: lib/Target/X86/X86ISelLowering.cpp:7359
@@ +7358,3 @@
+  auto SafeOffset = [&](int Idx) {
+    return OffsetLane == (Idx / NumEltsPerLane);
+  };
----------------
I've pulled the 'OffsetLane' constant out. I'd prefer to keep the lambda though, as it's used in quite a few places and avoids cluttering the code.
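
For readers without the full patch to hand, here is a rough, standalone sketch of the idea: compute the lane once, then let a small lambda answer "is this index in that lane?" at each use site. The function name allInOffsetLane, the way OffsetLane is derived from Offset, and the treatment of negative indices as "no constraint" are all assumptions made for illustration; only the lambda body is taken from the hunk above.

```cpp
#include <cassert>
#include <vector>

// Hypothetical helper (not the code from the patch): returns true when every
// defined shuffle-mask index falls in the same lane as Offset.
bool allInOffsetLane(const std::vector<int> &Mask, int Offset,
                     int NumEltsPerLane) {
  assert(NumEltsPerLane > 0 && "a lane must hold at least one element");

  // Computed once, outside the lambda. This definition is an assumption.
  const int OffsetLane = Offset / NumEltsPerLane;

  // Same shape as the lambda quoted in the review hunk.
  auto SafeOffset = [&](int Idx) {
    return OffsetLane == (Idx / NumEltsPerLane);
  };

  for (int Idx : Mask) {
    if (Idx < 0)
      continue; // negative = undefined entry in this sketch, so skip it
    if (!SafeOffset(Idx))
      return false;
  }
  return true;
}
```

With eight elements per lane, a mask of {8, 9, 10, 11} and an offset of 8 stays inside lane 1, whereas {6, 7, 8, 9} with an offset of 6 crosses from lane 0 into lane 1 and is rejected.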

================
Comment at: lib/Target/X86/X86ISelLowering.cpp:7378
@@ -7350,1 +7377,3 @@
   if (Subtarget->hasSSE41()) {
+    // Not worth offseting 128-bit vectors if scale == 2, a pattern using
+    // PUNPCK will catch this in a later shuffle match.
----------------
This is definitely a borderline case: without the early-out, most of the time we are just replacing an XOR (zero) + PUNPCKH with a PSHUFD + PMOVZX. There's next to nothing in it, so I went with avoiding a change.
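
To make the equivalence concrete, here is a scalar model of the two sequences for one scale == 2 case: zero-extending the upper four 16-bit elements of a 128-bit vector to 32 bits. It is only a sketch. The element widths, the type aliases and the function names are assumptions chosen for illustration, and plain loops stand in for the actual instructions so that it runs on any host.

```cpp
#include <array>
#include <cstdint>

using V8i16 = std::array<uint16_t, 8>;
using V4i32 = std::array<uint32_t, 4>;

// Models PXOR (zero) + PUNPCKHWD: interleave the high four words of V with
// words from a zero vector, then read each word pair as one 32-bit element.
V4i32 zextHighViaUnpack(const V8i16 &V) {
  const V8i16 Zero{}; // the XOR-generated zero register
  V4i32 Result{};
  for (int I = 0; I < 4; ++I) {
    uint32_t Lo = V[4 + I];    // low half of the pair comes from V
    uint32_t Hi = Zero[4 + I]; // high half of the pair comes from zero
    Result[I] = Lo | (Hi << 16);
  }
  return Result;
}

// Models PSHUFD + PMOVZXWD: move the upper 64 bits down to the lower 64 bits,
// then zero-extend the low four words to 32 bits.
V4i32 zextHighViaShuffle(const V8i16 &V) {
  V8i16 Shuffled{};
  for (int I = 0; I < 4; ++I)
    Shuffled[I] = V[4 + I]; // upper half is ignored by the extend step
  V4i32 Result{};
  for (int I = 0; I < 4; ++I)
    Result[I] = Shuffled[I]; // widening a uint16_t zero-extends
  return Result;
}
```

Both functions return the same vector for any input, which is why the choice between the two sequences comes down to instruction cost rather than correctness.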


Repository:
  rL LLVM

http://reviews.llvm.org/D12561




