[PATCH] D12561: [X86][SSE] Match zero/any extension shuffles that don't start from the first element
Quentin Colombet via llvm-commits
llvm-commits at lists.llvm.org
Thu Sep 17 10:51:37 PDT 2015
qcolombet added a comment.
Hi Simon,
This looks mostly good to me, though it seems a couple of tests are regressing, aren’t they?
I have annotated them, could you comment on why this is beneficial or at least neutral?
Thanks,
-Quentin
================
Comment at: lib/Target/X86/X86ISelLowering.cpp:7359
@@ -7359,1 +7358,3 @@
+ int NumElements = VT.getVectorNumElements();
+ int NumLaneElements = 128 / EltBits;
assert((EltBits == 8 || EltBits == 16 || EltBits == 32) &&
----------------
NumEltsPerLane?
================
Comment at: lib/Target/X86/X86ISelLowering.cpp:7365
@@ +7364,3 @@
+ assert((Offset < NumLaneElements || Offset % NumLaneElements == 0) &&
+ "Extension offset must be in the first lane or start an upper lane.");
+
----------------
Document those preconditions on the function prototype as well.
================
Comment at: lib/Target/X86/X86ISelLowering.cpp:7369
@@ +7368,3 @@
+ auto SafeOffset = [&](int Idx) {
+ int OffsetLane = Offset / NumLaneElements;
+ int IdxLane = Idx / NumLaneElements;
----------------
I wonder if this lambda makes sense.
Unless I am mistaken, this value is an invariant of the lambda and can be hoisted outside of the only loop where it is used.
I.e., we can just make the test: OffsetLane == Idx/NumLaneElements at the right place.
================
Comment at: lib/Target/X86/X86ISelLowering.cpp:7390
@@ -7365,1 +7389,3 @@
if (Subtarget->hasSSE41()) {
+ // Not worth offseting 128-bit vectors if scale == 2.
+ if (Offset && Scale == 2 && VT.getSizeInBits() == 128)
----------------
Could you explain why in the comment?
================
Comment at: lib/Target/X86/X86ISelLowering.cpp:7555
@@ +7554,3 @@
+ // upper lane.
+ // FIXME: Is it ever worth allowing a -ve base offset?
+ if (!((0 <= Offset && Offset < NumLaneElements) ||
----------------
What is “-ve"?
================
Comment at: lib/Target/X86/X86ISelLowering.cpp:7576
@@ -7502,1 +7575,3 @@
+ // If we are offsetting, don't extend if we only match a single input.
+ if (Offset != 0 && Matches < 2)
----------------
Explain why, I am sure it won’t be obvious when we look at that line in a couple of months :).
================
Comment at: test/CodeGen/X86/vec_cast2.ll:61
@@ +60,3 @@
+; CHECK-WIDE-NEXT: vpshufd {{.*#+}} xmm0 = xmm0[1,1,2,3]
+; CHECK-WIDE-NEXT: vpmovzxbd {{.*#+}} xmm0 = xmm0[0],zero,zero,zero,xmm0[1],zero,zero,zero,xmm0[2],zero,zero,zero,xmm0[3],zero,zero,zero
+; CHECK-WIDE-NEXT: vinsertf128 $1, %xmm0, %ymm1, %ymm0
----------------
Isn’t the new sequence less efficient than the previous one?
================
Comment at: test/CodeGen/X86/vector-shuffle-256-v32.ll:1759
@@ +1758,3 @@
+; AVX1-NEXT: vpsrld $16, %xmm0, %xmm0
+; AVX1-NEXT: vpmovzxbq {{.*#+}} xmm0 = xmm0[0],zero,zero,zero,zero,zero,zero,zero,xmm0[1],zero,zero,zero,zero,zero,zero,zero
+; AVX1-NEXT: vinsertf128 $1, %xmm0, %ymm1, %ymm0
----------------
Ditto?
================
Comment at: test/CodeGen/X86/vector-shuffle-256-v32.ll:1777
@@ +1776,3 @@
+; AVX1-NEXT: vpshufd {{.*#+}} xmm0 = xmm0[1,1,2,3]
+; AVX1-NEXT: vpmovzxbd {{.*#+}} xmm0 = xmm0[0],zero,zero,zero,xmm0[1],zero,zero,zero,xmm0[2],zero,zero,zero,xmm0[3],zero,zero,zero
+; AVX1-NEXT: vinsertf128 $1, %xmm0, %ymm1, %ymm0
----------------
Ditto?
================
Comment at: test/CodeGen/X86/vector-zext.ll:134
@@ -134,1 +133,3 @@
+; SSE41-NEXT: pmovzxbd {{.*#+}} xmm1 = xmm0[0],zero,zero,zero,xmm0[1],zero,zero,zero,xmm0[2],zero,zero,zero,xmm0[3],zero,zero,zero
+; SSE41-NEXT: movdqa %xmm2, %xmm0
; SSE41-NEXT: retq
----------------
Ditto?
================
Comment at: test/CodeGen/X86/vector-zext.ll:212
@@ -215,1 +211,3 @@
+; SSE41-NEXT: pmovzxbq {{.*#+}} xmm1 = xmm0[0],zero,zero,zero,zero,zero,zero,zero,xmm0[1],zero,zero,zero,zero,zero,zero,zero
+; SSE41-NEXT: movdqa %xmm2, %xmm0
; SSE41-NEXT: retq
----------------
Ditto?
Repository:
rL LLVM
http://reviews.llvm.org/D12561
More information about the llvm-commits
mailing list