[PATCH] D109676: [HardwareLoops] put +1 for loop count before zero extension
ChenZheng via Phabricator via llvm-commits
llvm-commits at lists.llvm.org
Fri Oct 15 07:50:19 PDT 2021
shchenz added inline comments.
================
Comment at: llvm/lib/Analysis/TargetTransformInfo.cpp:188
+ // as not overflow. Hareware count register will handle the count 0 case well.
+ // For example, on PowerPC, if the value in the Count Register is 0 before
+ // being decremented, it is -1 afterward.
----------------
samparker wrote:
> This comment is making me uneasy... I'm don't think target specific stuff should be referenced in this generic layer. But either way, why would a count register ever have a value of -1, especially when we're expending quite some effort to test against zero in this transform?
PowerPC don't have test and set form. On PowerPC, if the count register is 0 before being decremented, it will be wrapped to -1(all bits are 1) afterward, so we can handle the overflow case when `ExitCount` type is same with `CountType`. The total loop count is u64_MAX + 1
But on other targets like ARM, the hardware loops pass handles the overflow case for `ExitCount = SE.getAddExpr(ExitCount, SE.getOne(CountType));` when `ExitCount` type is same with `CountType` with test and set form? If this is the case, yes, the comments is not right, we need to update it for different targets.
But for all targets, making ` SE.getAddExpr(ExitCount, SE.getOne(CountType));` as not overflow when `SE.getTypeSizeInBits(ExitCount->getType()) == CountType->getBitWidth()` should be OK. Right?
================
Comment at: llvm/lib/CodeGen/HardwareLoops.cpp:417
// different. It means there will be instruction(s) in a block that possibly
// aren't needed. The isLoopEntryGuardedByCond is trying to avoid this issue,
// but it's doesn't appear to work in all cases.
----------------
samparker wrote:
> Let's now remove this comment about isLoopEntryGuardedByCond.
Thanks, will do this later.
================
Comment at: llvm/test/Transforms/HardwareLoops/scalar-while.ll:311
; CHECK-PHIGUARD-NEXT: entry:
; CHECK-PHIGUARD-NEXT: [[CMP4:%.*]] = icmp slt i32 [[I:%.*]], [[N:%.*]]
; CHECK-PHIGUARD-NEXT: [[TMP0:%.*]] = add i32 [[I]], 1
----------------
samparker wrote:
> shchenz wrote:
> > The test changes is due to `isLoopEntryGuardedByCond` can not determine that SCEV `(1 + (-1 * %N) + %i)` is at least 1 before. But now, we assume that the TripCount is at least 1 if we enter the loop. And we also check that TripCount will not overflow when we set TripCount.
> These are minor regressions, but they're inline with the other previous sub-optimal cases so I think it's okay.
Thanks for confirmation.
Repository:
rG LLVM Github Monorepo
CHANGES SINCE LAST ACTION
https://reviews.llvm.org/D109676/new/
https://reviews.llvm.org/D109676
More information about the llvm-commits
mailing list