[PATCH] D61160: [SimplifyCFG] Implement the suggested ctlz transform

Fri Apr 26 01:44:26 PDT 2019

jmolloy requested changes to this revision.
jmolloy added a comment.
This revision now requires changes to proceed.

Thanks for implementing this! It's really cool.

================
Comment at: lib/Transforms/Utils/SimplifyCFG.cpp:5492

+/// Another cheap transform: investigate using ctlz as a key to the switch
+/// TODO: this transform would benifit from proper range analysis in SwitchToLookupTable
----------------
s/Another/A. "Another" requires the user to know some context which has now gone.

================
Comment at: lib/Transforms/Utils/SimplifyCFG.cpp:5493
+/// Another cheap transform: investigate using ctlz as a key to the switch
+/// TODO: this transform would benifit from proper range analysis in SwitchToLookupTable
+static bool ReduceSwitchRange_ctlz(SwitchInst *SI,
----------------
TODO -> FIXME

================
Comment at: lib/Transforms/Utils/SimplifyCFG.cpp:5494
+/// TODO: this transform would benifit from proper range analysis in SwitchToLookupTable
+static bool ReduceSwitchRange_ctlz(SwitchInst *SI,
+                              const DataLayout &DL,
----------------
No underscores in function names. ReduceSwitchRangeWithCtlz?

================
Comment at: lib/Transforms/Utils/SimplifyCFG.cpp:5495
+static bool ReduceSwitchRange_ctlz(SwitchInst *SI,
+                              const DataLayout &DL,
+                              const TargetTransformInfo &TTI,
----------------
It looks to me like this isn't clang-formatted. Can you git-clang-format all your diffs?

================
Comment at: lib/Transforms/Utils/SimplifyCFG.cpp:5496
+                              const DataLayout &DL,
+                              const TargetTransformInfo &TTI,
+                              std::vector<uint64_t> *Values) {
----------------
You don't need DL or TTI here, please remove them.

================
Comment at: lib/Transforms/Utils/SimplifyCFG.cpp:5497
+                              const TargetTransformInfo &TTI,
+                              std::vector<uint64_t> *Values) {
+  // There is a big gotcha reagarding ctlz/cttz: when input 0, they
----------------
I'd use a reference here. It's what most of the rest of LLVM uses.

================
Comment at: lib/Transforms/Utils/SimplifyCFG.cpp:5498
+                              std::vector<uint64_t> *Values) {
+  // There is a big gotcha reagarding ctlz/cttz: when input 0, they
+  // emit bit width. If not handled this can result in the transform
----------------
s/reagarding/regarding.

"when given input 0, they return the bitwidth" would be a more natural phrasing.

================
Comment at: lib/Transforms/Utils/SimplifyCFG.cpp:5500
+  // emit bit width. If not handled this can result in the transform
+  // being applied multiple times. We can either rotate bit width back to zero,
+  // by doing an xor or the most significant bit, followed by a rotate-left 1,
----------------
I would avoid talking about the solution you didn't implement. Checking for zero input (which your popcount does) is a totally reasonable and understandable thing (and much more understandable than trying to grok the xor/rotate explanation to then find it isn't used :D )

================
Comment at: lib/Transforms/Utils/SimplifyCFG.cpp:5503
+  // which is expensive, or we can just avoid situation, by checking the range.
+  unsigned BitWidth = cast<IntegerType>(SI->getCondition()->getType())->getIntegerBitWidth();
+  if (Values->back() <= BitWidth)
----------------
getIntegerBitWidth() is defined on Type. You can avoid the cast.

  unsigned BitWidth = SI->getCondition()->getType()->getIntegerBitWidth()

================
Comment at: lib/Transforms/Utils/SimplifyCFG.cpp:5515
+    return false;
+  // TODO a single instruction can be saved on x86 by using cttz over ctlz
+  // when only ctlz results in a subtraction instruction being added
----------------
This would be a pattern for instcombine to pick up. We shouldn't have to worry about this here.

================
Comment at: lib/Transforms/Utils/SimplifyCFG.cpp:5519
+    V = APInt(BitWidth, V).countLeadingZeros();
+  // 0 will suddenly become the largest (BitWidth), so we need to sort again.
+  // Sorting is OK because we either make a transform on all numbers, or do
----------------
Isn't the order exactly reversed?

  llvm::reverse(Values);

================
Comment at: lib/Transforms/Utils/SimplifyCFG.cpp:5520
+  // 0 will suddenly become the largest (BitWidth), so we need to sort again.
+  // Sorting is OK because we either make a transform on all numbers, or do
+  // not make a transform at all.
----------------
And if you do need to sort, please use 

  llvm::sort(*Values)

================
Comment at: lib/Transforms/Utils/SimplifyCFG.cpp:5553
   // We organize the range to start from 0, if it is not already close.
-  SmallVector<uint64_t,4> Values;
+  std::vector<uint64_t> Values;
   for (auto &C : SI->cases())
----------------
Please keep using SmallVector.

================
Comment at: lib/Transforms/Utils/SimplifyCFG.cpp:5556
     Values.push_back(C.getCaseValue()->getValue().getLimitedValue());
-  llvm::sort(Values);
+  std::sort(Values.begin(), Values.end());

----------------
Please keep using llvm::sort.

================
Comment at: lib/Transforms/Utils/SimplifyCFG.cpp:5620
   auto *Ty = cast<IntegerType>(SI->getCondition()->getType());
-  Builder.SetInsertPoint(SI);
-  auto *Sub = Builder.CreateSub(SI->getCondition(), ConstantInt::get(Ty, Base));
-  Value *Key;
-  if (Shift) {
-    auto *ShiftC = ConstantInt::get(Ty, Shift);
-    Function *Fshr = Intrinsic::getDeclaration(SI->getModule(), Intrinsic::fshr, Ty);
-    Key = Builder.CreateCall(Fshr, {Sub, Sub, ShiftC});
-  } else
-    Key = Sub;
-  SI->replaceUsesOfWith(SI->getCondition(), Key);
+  {
+    Builder.SetInsertPoint(SI);
----------------
Please remove the extra scope. There's no RAII objects here, so they aren't required.

================
Comment at: lib/Transforms/Utils/SimplifyCFG.cpp:5624
+    if (UseCtlz) {
+      auto *Zero = ConstantInt::get(IntegerType::get(Ty->getContext(), 1), 0);
+      Function *Ctlz = Intrinsic::getDeclaration(SI->getModule(), Intrinsic::ctlz, Ty);
----------------
> IntegerType::get(Ty->getContext(), 1) 

Change to

  Builder.getInt1(false);

And please change "Zero" to a more descriptive name for the second parameter to ctlz.

================
Comment at: lib/Transforms/Utils/SimplifyCFG.cpp:5632
+      auto *ShiftC = ConstantInt::get(Ty, Shift);
+      Function *Fshr = Intrinsic::getDeclaration(SI->getModule(), Intrinsic::fshr, Ty);
+      Key = Builder.CreateCall(Fshr, {Key, Key, ShiftC});
----------------
See comments on diffbase about fshr. I don't think you need it, the rotate sequence is good enough. Is there a good reason why you think fshr is better?

================
Comment at: lib/Transforms/Utils/SimplifyCFG.cpp:5645
+      Zeros = Orig->getValue().getLimitedValue();
+    auto Sub = (APInt(BitWidth, Zeros) - Base).getLimitedValue();
     Case.setValue(
----------------
Why are you converting from APInt to uint64_t to APInt again? Just remove .getLimitedValue() and keep Sub.lshr() in the next line?

================
Comment at: test/Transforms/SimplifyCFG/rangereduce.ll:85
 ; CHECK-NEXT:    [[TMP1:%.*]] = sub i32 [[A:%.*]], 97
-; CHECK-NEXT:    [[TMP2:%.*]] = call i32 @llvm.fshr.i32(i32 [[TMP1]], i32 [[TMP1]], i32 0)
-; CHECK-NEXT:    switch i32 [[TMP2]], label [[DEF:%.*]] [
----------------
Uh, something's fishy here - why was this fshr here in the first place? This switch just needs a subtract.

================
Comment at: test/Transforms/SimplifyCFG/rangereduce.ll:121
 ; CHECK-NEXT:    [[TMP1:%.*]] = sub i32 [[A:%.*]], 97
-; CHECK-NEXT:    [[TMP2:%.*]] = call i32 @llvm.fshr.i32(i32 [[TMP1]], i32 [[TMP1]], i32 0)
-; CHECK-NEXT:    switch i32 [[TMP2]], label [[DEF:%.*]] [
----------------
Your previous patch is clearly bogus; this is a shift by zero.

Repository:
  rL LLVM

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D61160/new/

https://reviews.llvm.org/D61160