[PATCH] D12561: [X86][SSE] Match zero/any extension shuffles that don't start from the first element

Thu Sep 17 10:51:37 PDT 2015

qcolombet added a comment.

Hi Simon,

This looks mostly good to me, though it seems a couple of tests are regressing, aren’t they?

I have annotated them, could you comment on why this is beneficial or at least neutral?

Thanks,
-Quentin


================
Comment at: lib/Target/X86/X86ISelLowering.cpp:7359
@@ -7359,1 +7358,3 @@
+  int NumElements = VT.getVectorNumElements();
+  int NumLaneElements = 128 / EltBits;
   assert((EltBits == 8 || EltBits == 16 || EltBits == 32) &&
----------------
NumEltsPerLane?

================
Comment at: lib/Target/X86/X86ISelLowering.cpp:7365
@@ +7364,3 @@
+  assert((Offset < NumLaneElements || Offset % NumLaneElements == 0) &&
+         "Extension offset must be in the first lane or start an upper lane.");
+
----------------
Document those preconditions on the function prototype as well.

================
Comment at: lib/Target/X86/X86ISelLowering.cpp:7369
@@ +7368,3 @@
+  auto SafeOffset = [&](int Idx) {
+    int OffsetLane = Offset / NumLaneElements;
+    int IdxLane = Idx / NumLaneElements;
----------------
I wonder if this lambda makes sense.
Unless I am mistaken, this value is an invariant of the lambda and can be hoisted outside of the only loop where it is used.
I.e., we can just make the test: OffsetLane == Idx/NumLaneElements at the right place.

================
Comment at: lib/Target/X86/X86ISelLowering.cpp:7390
@@ -7365,1 +7389,3 @@
   if (Subtarget->hasSSE41()) {
+    // Not worth offseting 128-bit vectors if scale == 2.
+    if (Offset && Scale == 2 && VT.getSizeInBits() == 128)
----------------
 Could you explain why in the comment?

================
Comment at: lib/Target/X86/X86ISelLowering.cpp:7555
@@ +7554,3 @@
+      // upper lane.
+      // FIXME: Is it ever worth allowing a -ve base offset?
+      if (!((0 <= Offset && Offset < NumLaneElements) ||
----------------
What is “-ve"?

================
Comment at: lib/Target/X86/X86ISelLowering.cpp:7576
@@ -7502,1 +7575,3 @@
 
+    // If we are offsetting, don't extend if we only match a single input.
+    if (Offset != 0 && Matches < 2)
----------------
Explain why, I am sure it won’t be obvious when we look at that line in a couple of months :).

================
Comment at: test/CodeGen/X86/vec_cast2.ll:61
@@ +60,3 @@
+; CHECK-WIDE-NEXT:    vpshufd {{.*#+}} xmm0 = xmm0[1,1,2,3]
+; CHECK-WIDE-NEXT:    vpmovzxbd {{.*#+}} xmm0 = xmm0[0],zero,zero,zero,xmm0[1],zero,zero,zero,xmm0[2],zero,zero,zero,xmm0[3],zero,zero,zero
+; CHECK-WIDE-NEXT:    vinsertf128 $1, %xmm0, %ymm1, %ymm0
----------------
Isn’t the new sequence less efficient than the previous one?

================
Comment at: test/CodeGen/X86/vector-shuffle-256-v32.ll:1759
@@ +1758,3 @@
+; AVX1-NEXT:    vpsrld $16, %xmm0, %xmm0
+; AVX1-NEXT:    vpmovzxbq {{.*#+}} xmm0 = xmm0[0],zero,zero,zero,zero,zero,zero,zero,xmm0[1],zero,zero,zero,zero,zero,zero,zero
+; AVX1-NEXT:    vinsertf128 $1, %xmm0, %ymm1, %ymm0
----------------
Ditto?

================
Comment at: test/CodeGen/X86/vector-shuffle-256-v32.ll:1777
@@ +1776,3 @@
+; AVX1-NEXT:    vpshufd {{.*#+}} xmm0 = xmm0[1,1,2,3]
+; AVX1-NEXT:    vpmovzxbd {{.*#+}} xmm0 = xmm0[0],zero,zero,zero,xmm0[1],zero,zero,zero,xmm0[2],zero,zero,zero,xmm0[3],zero,zero,zero
+; AVX1-NEXT:    vinsertf128 $1, %xmm0, %ymm1, %ymm0
----------------
Ditto?

================
Comment at: test/CodeGen/X86/vector-zext.ll:134
@@ -134,1 +133,3 @@
+; SSE41-NEXT:    pmovzxbd {{.*#+}} xmm1 = xmm0[0],zero,zero,zero,xmm0[1],zero,zero,zero,xmm0[2],zero,zero,zero,xmm0[3],zero,zero,zero
+; SSE41-NEXT:    movdqa %xmm2, %xmm0
 ; SSE41-NEXT:    retq
----------------
Ditto?

================
Comment at: test/CodeGen/X86/vector-zext.ll:212
@@ -215,1 +211,3 @@
+; SSE41-NEXT:    pmovzxbq {{.*#+}} xmm1 = xmm0[0],zero,zero,zero,zero,zero,zero,zero,xmm0[1],zero,zero,zero,zero,zero,zero,zero
+; SSE41-NEXT:    movdqa %xmm2, %xmm0
 ; SSE41-NEXT:    retq
----------------
Ditto?


Repository:
  rL LLVM

http://reviews.llvm.org/D12561