[PATCH] D48067: [InstCombine] Replacing X86-specific rounding intrinsics with generic floor-ceil

Craig Topper via Phabricator via llvm-commits llvm-commits at lists.llvm.org
Tue Jun 12 10:27:28 PDT 2018


craig.topper added inline comments.


================
Comment at: llvm/lib/Transforms/InstCombine/InstCombineCalls.cpp:582
+                               InstCombiner::BuilderTy &Builder) {
+  int RoundControl;
+  Intrinsic::ID IntrinsicID = II.getIntrinsicID();
----------------
Why int and getSExtValue? Feels like it should be unsigned and getZExtValue.


================
Comment at: llvm/lib/Transforms/InstCombine/InstCombineCalls.cpp:586
+      IntrinsicID == Intrinsic::x86_sse41_round_sd)
+    RoundControl = cast<ConstantInt>(II.getArgOperand(2))->getSExtValue();
+  else if (IntrinsicID == Intrinsic::x86_avx512_mask_rndscale_ss ||
----------------
Not sure if we should assume the rounding mode or SAE is a constant int. The clang frontend guarantees it, but handcrafted IR tests could break it.


================
Comment at: llvm/lib/Transforms/InstCombine/InstCombineCalls.cpp:619
+      Value *Zero = Constant::getNullValue(I32Ty);
+      Mask = (MaskTy == I32Ty) ? II.getArgOperand(3)
+                               : Builder.CreateZExt(II.getArgOperand(3), I32Ty);
----------------
Why would MaskTy vary here? It's fixed by the intrinsic isn't it?


================
Comment at: llvm/lib/Transforms/InstCombine/InstCombineCalls.cpp:621
+                               : Builder.CreateZExt(II.getArgOperand(3), I32Ty);
+      Mask = Builder.CreateAnd(Mask, One);
+      Mask = Builder.CreateICmp(ICmpInst::ICMP_NE, Mask, Zero);
----------------
There's an overload of CreateAnd that accepts a uint64_t for RHS, so you can probably just pass 1 here without creating the ConstantInt yourself.


================
Comment at: llvm/lib/Transforms/InstCombine/InstCombineCalls.cpp:652
+    if (Width < 8) {
+      auto *MaskIntTy = Builder.getIntNTy(Width);
+      Mask = Builder.CreateTrunc(Mask, MaskIntTy);
----------------
For less than 8, you should cast i8 to v8i1 first then extract the subvector. We don't want i4 and i2 types floating around.


================
Comment at: llvm/lib/Transforms/InstCombine/InstCombineCalls.cpp:2575
+      }
+      V = simplifyX86round(*II, Builder);
+      if (V)
----------------
Why are we calling simplifyX86Round nested under SimplifyDemandedVectorElts? Simplifying should be completely orthogonal. If a simplification happened and the round intrinsic still exists, InstCombine will revisit here and SimplifyDemandedVectorElts will return nullptr. No need to try to handle every case in one shot.


Repository:
  rL LLVM

https://reviews.llvm.org/D48067





More information about the llvm-commits mailing list