[PATCH] D109676: [HardwareLoops] put +1 for loop count before zero extension

ChenZheng via Phabricator via llvm-commits llvm-commits at lists.llvm.org
Mon Oct 25 06:49:10 PDT 2021


shchenz marked an inline comment as done.
shchenz added a comment.

In D109676#3083540 <https://reviews.llvm.org/D109676#3083540>, @dmgreen wrote:

> Everything tends to be i32's in Arm, so we only rarely see extends in the induction variables in the Hardware loop pass. Even if the IV was previously in i8/i16 it will likely already have been extended to a i32 by indvarsimplify.

Thanks, that is the possible reason why we don't see any regressions after patch D91724 <https://reviews.llvm.org/D91724> on ARM.

> It's true we don't need to add 1 to the loop counter (if I'm understanding this correctly). When running our downstream benchmarks the only changes I saw from this patch is instructions that were in the preheader are now in the pre-preheader (the guard block). That makes things better or worse by ~1%, depending on the benchmark.

Thanks for your test on ARM. The performance impact should be small as it only affects the loop count expansion outside of the loop.

So for this patch, we need to wait for ARM experts find out the solution for overflow issue when `ExitCount` type is same with `CountType`?



================
Comment at: llvm/test/Transforms/HardwareLoops/scalar-while.ll:311
 ; CHECK-PHIGUARD-NEXT:  entry:
 ; CHECK-PHIGUARD-NEXT:    [[CMP4:%.*]] = icmp slt i32 [[I:%.*]], [[N:%.*]]
 ; CHECK-PHIGUARD-NEXT:    [[TMP0:%.*]] = add i32 [[I]], 1
----------------
dmgreen wrote:
> shchenz wrote:
> > samparker wrote:
> > > shchenz wrote:
> > > > The test changes is due to `isLoopEntryGuardedByCond` can not determine that SCEV `(1 + (-1 * %N) + %i)` is at least 1 before. But now, we assume that the TripCount is at least 1 if we enter the loop. And we also check that TripCount will not overflow when we set TripCount.
> > > These are minor regressions, but they're inline with the other previous sub-optimal cases so I think it's okay.
> > Thanks for confirmation.
> It's only small, but it does mean we execute more instructions if the branch is taken..
Putting the loop count in pre-preheader is required by `UseLoopGuard = true`? If so, this should be positive, because now we use more loop guards?


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D109676/new/

https://reviews.llvm.org/D109676



More information about the llvm-commits mailing list