[PATCH] D17490: [InstCombine][SSE] Demanded vector elements for scalar intrinsics

Sanjay Patel via llvm-commits llvm-commits at lists.llvm.org
Tue Apr 19 09:09:35 PDT 2016


spatel accepted this revision.
spatel added a comment.
This revision is now accepted and ready to land.

LGTM - see inline for 1 minor change to the patch itself.

The rest of the comments are just 'thinking out loud' sort of questions so we have a paper trail for how we got here.
On that note, if you know of bug reports that track any of this, please do include them in the commit message.
https://llvm.org/bugs/show_bug.cgi?id=22206 is one for the sqrt intrinsic for whenever that part of the patch set lands.


================
Comment at: lib/Transforms/InstCombine/InstCombineCalls.cpp:1374-1389
@@ -1339,1 +1373,18 @@
 
+  case Intrinsic::x86_sse41_round_ss:
+  case Intrinsic::x86_sse41_round_sd: {
+    // These intrinsics demand the upper elements of the first input vector and
+    // the lowest element of the second input vector.
+    bool MadeChange = false;
+    Value *Arg0 = II->getArgOperand(0);
+    Value *Arg1 = II->getArgOperand(1);
+    unsigned VWidth = Arg0->getType()->getVectorNumElements();
+    if (Value *V = SimplifyDemandedVectorEltsHigh(Arg0, VWidth, VWidth - 1)) {
+      II->setArgOperand(0, V);
+      MadeChange = true;
+    }
+    if (Value *V = SimplifyDemandedVectorEltsLow(Arg1, VWidth, 1)) {
+      II->setArgOperand(1, V);
+      MadeChange = true;
+    }
+    if (MadeChange)
----------------
RKSimon wrote:
> For the binary scalar intrinsics we need all the elements of the first input: the lowest is used in the operation and the remaining are all 'passed through' to the result.
Got it - it was the code comment that threw me off. Please change to something like:
"These intrinsics demand all elements of the first input because the high elements of that input are passed through. The low elements of both inputs are used for the actual binary op."

================
Comment at: test/Transforms/InstCombine/x86-sse.ll:134-136
@@ -136,5 +133,5 @@
 ; CHECK-NEXT:    [[TMP1:%.*]] = insertelement <4 x float> undef, float %a, i32 0
 ; CHECK-NEXT:    [[TMP2:%.*]] = insertelement <4 x float> [[TMP1]], float 1.000000e+00, i32 1
 ; CHECK-NEXT:    [[TMP3:%.*]] = insertelement <4 x float> [[TMP2]], float 2.000000e+00, i32 2
 ; CHECK-NEXT:    [[TMP4:%.*]] = insertelement <4 x float> [[TMP3]], float 3.000000e+00, i32 3
 ; CHECK-NEXT:    [[TMP5:%.*]] = insertelement <4 x float> undef, float %b, i32 0
----------------
The next patch will delete these extra inserts?

================
Comment at: test/Transforms/InstCombine/x86-sse2.ll:72-78
@@ -74,9 +71,9 @@
 ;
   %1 = insertelement <2 x double> undef, double %a, i32 0
   %2 = insertelement <2 x double> %1, double 1.000000e+00, i32 1
   %3 = insertelement <2 x double> undef, double %b, i32 0
   %4 = insertelement <2 x double> %3, double 2.000000e+00, i32 1
   %5 = tail call <2 x double> @llvm.x86.sse2.add.sd(<2 x double> %2, <2 x double> %4)
   %6 = extractelement <2 x double> %5, i32 1
   ret double %6
 }
----------------
This case is, "If a tree falls in the IEEE754 woods..." right? Ie, with fast-math, we zap the whole thing, but I don't think we can under default settings.

================
Comment at: test/Transforms/InstCombine/x86-sse2.ll:105-118
@@ -108,16 +104,16 @@
 
 define double @test_sub_sd_1(double %a, double %b) {
 ; CHECK-LABEL: @test_sub_sd_1(
-; CHECK-NEXT:    [[TMP1:%.*]] = tail call <2 x double> @llvm.x86.sse2.sub.sd(<2 x double> <double undef, double 1.000000e+00>, <2 x double> <double undef, double 2.000000e+00>)
+; CHECK-NEXT:    [[TMP1:%.*]] = tail call <2 x double> @llvm.x86.sse2.sub.sd(<2 x double> <double undef, double 1.000000e+00>, <2 x double> undef)
 ; CHECK-NEXT:    [[TMP2:%.*]] = extractelement <2 x double> [[TMP1]], i32 1
 ; CHECK-NEXT:    ret double [[TMP2]]
 ;
   %1 = insertelement <2 x double> undef, double %a, i32 0
   %2 = insertelement <2 x double> %1, double 1.000000e+00, i32 1
   %3 = insertelement <2 x double> undef, double %b, i32 0
   %4 = insertelement <2 x double> %3, double 2.000000e+00, i32 1
   %5 = tail call <2 x double> @llvm.x86.sse2.sub.sd(<2 x double> %2, <2 x double> %4)
   %6 = extractelement <2 x double> %5, i32 1
   ret double %6
 }
 
----------------
This isn't part of this patch of course, but this isn't always right, is it? See previous comment about default behavior under IEEE754. To maintain correct exception behavior, we should have subtracted %b from %a. But I suppose this is ok because Clang/LLVM don't support changing the FP env?

The patch that converts the intrinsic to regular IR will allow this to become "ret double 1.0"?


Repository:
  rL LLVM

http://reviews.llvm.org/D17490





More information about the llvm-commits mailing list