[PATCH] D109676: [HardwareLoops] put +1 for loop count before zero extension

ChenZheng via Phabricator via llvm-commits llvm-commits at lists.llvm.org
Fri Oct 15 07:50:19 PDT 2021


shchenz added inline comments.


================
Comment at: llvm/lib/Analysis/TargetTransformInfo.cpp:188
+  // as not overflow. Hareware count register will handle the count 0 case well.
+  // For example, on PowerPC, if the value in the Count Register is 0 before
+  // being decremented, it is -1 afterward.
----------------
samparker wrote:
> This comment is making me uneasy... I'm don't think target specific stuff should be referenced in this generic layer. But either way, why would a count register ever have a value of -1, especially when we're expending quite some effort to test against zero in this transform?
PowerPC don't have test and set form. On PowerPC, if the count register is 0 before being decremented, it will be wrapped to -1(all bits are 1) afterward, so we can handle the overflow case when `ExitCount` type is same with `CountType`. The total loop count is u64_MAX + 1

But on other targets like ARM, the hardware loops pass handles the overflow case for `ExitCount = SE.getAddExpr(ExitCount, SE.getOne(CountType));` when `ExitCount` type is same with `CountType` with test and set form? If this is the case, yes, the comments is not right, we need to update it for different targets.

But for all targets, making ` SE.getAddExpr(ExitCount, SE.getOne(CountType));` as not overflow when `SE.getTypeSizeInBits(ExitCount->getType()) == CountType->getBitWidth()` should be OK. Right?


================
Comment at: llvm/lib/CodeGen/HardwareLoops.cpp:417
   // different. It means there will be instruction(s) in a block that possibly
   // aren't needed. The isLoopEntryGuardedByCond is trying to avoid this issue,
   // but it's doesn't appear to work in all cases.
----------------
samparker wrote:
> Let's now remove this comment about isLoopEntryGuardedByCond.
Thanks, will do this later.


================
Comment at: llvm/test/Transforms/HardwareLoops/scalar-while.ll:311
 ; CHECK-PHIGUARD-NEXT:  entry:
 ; CHECK-PHIGUARD-NEXT:    [[CMP4:%.*]] = icmp slt i32 [[I:%.*]], [[N:%.*]]
 ; CHECK-PHIGUARD-NEXT:    [[TMP0:%.*]] = add i32 [[I]], 1
----------------
samparker wrote:
> shchenz wrote:
> > The test changes is due to `isLoopEntryGuardedByCond` can not determine that SCEV `(1 + (-1 * %N) + %i)` is at least 1 before. But now, we assume that the TripCount is at least 1 if we enter the loop. And we also check that TripCount will not overflow when we set TripCount.
> These are minor regressions, but they're inline with the other previous sub-optimal cases so I think it's okay.
Thanks for confirmation.


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D109676/new/

https://reviews.llvm.org/D109676



More information about the llvm-commits mailing list