[PATCH] D109676: [HardwareLoops] put +1 for loop count before zero extension

Dave Green via Phabricator via llvm-commits llvm-commits at lists.llvm.org
Mon Oct 25 02:22:57 PDT 2021


dmgreen added a comment.

Everything tends to be i32 on Arm, so we only rarely see extends on the induction variables in the HardwareLoops pass. Even if the IV was previously an i8/i16, it will likely already have been extended to an i32 by indvarsimplify.
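
For anyone following along from the patch title, here is a rough sketch of why the position of the +1 relative to the zero extension matters for a narrow IV. The i8 and i32 widths and the function names are made up for illustration; this models plain wrapping arithmetic, not the actual HardwareLoops code.

```python
def zext_then_add(btc, narrow_bits=8, wide_bits=32):
    """Zero-extend the narrow backedge-taken count, then add 1 in the wide type."""
    narrow_mask = (1 << narrow_bits) - 1
    wide_mask = (1 << wide_bits) - 1
    widened = btc & narrow_mask          # zext keeps the unsigned value
    return (widened + 1) & wide_mask     # +1 cannot wrap here for narrow inputs


def add_then_zext(btc, narrow_bits=8, wide_bits=32):
    """Add 1 in the narrow type (which may wrap to 0), then zero-extend."""
    narrow_mask = (1 << narrow_bits) - 1
    incremented = ((btc & narrow_mask) + 1) & narrow_mask  # wraps at 2**narrow_bits
    return incremented                   # zext keeps the (possibly wrapped) value


if __name__ == "__main__":
    for btc in (9, 254, 255):
        print(btc, zext_then_add(btc), add_then_zext(btc))
```

The two orders agree unless the narrow count is all ones: for a backedge-taken count of 255 in i8, adding after the extend gives 256, while adding before the extend wraps to 0. So putting the +1 before the zext is only sound if something else guarantees the trip count does not overflow in the narrow type.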

In D109676#3077185 <https://reviews.llvm.org/D109676#3077185>, @samparker wrote:

> I would like to start a top-level discussion so everything isn't lost within the review comments, and pull in the Arm people who still work on this... @dmgreen @samtebbs
>
> It appears as though PPC and Arm have different semantics for their loop backedge control. To summarize @shchenz, PPC will 'Decrement CTR and branch if it is still nonzero'. On Arm, AFAICT, we don't perform the decrement when the counter (LR) is <= 1.
>
> So, do we need to tighten the semantics of the loop intrinsics for their overflow behaviour? Arm and PPC use different intrinsics so I can't see why defining them slightly differently would matter to anyone.

It's true we don't need to add 1 to the loop counter (if I'm understanding this correctly). When running our downstream benchmarks, the only change I saw from this patch was that instructions that were in the preheader are now in the pre-preheader (the guard block). That makes things better or worse by ~1%, depending on the benchmark.
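
To make the difference in backedge semantics concrete, here is a small model of the two behaviours as they are summarized in the quoted comment. It is only a sketch of that summary, not an authoritative description of either ISA. The function names and the 8-bit counter width are assumptions chosen to keep the wrap-around case quick to run.

```python
def ppc_backedge(ctr, bits=8):
    """PPC-style as summarized above: decrement CTR, branch if it is still nonzero."""
    ctr = (ctr - 1) & ((1 << bits) - 1)  # the decrement wraps on a zero counter
    return ctr, ctr != 0


def arm_backedge(lr, bits=8):
    """Arm-style as summarized above: no decrement (and no branch) when LR <= 1."""
    if lr <= 1:
        return lr, False
    return lr - 1, True


def iterations(backedge, count, bits=8):
    """Count how many times a do-while style loop body runs for a start count."""
    executed = 0
    while True:
        executed += 1                    # loop body
        count, taken = backedge(count, bits)
        if not taken:
            return executed


if __name__ == "__main__":
    for start in (3, 1, 0):
        print(start, iterations(ppc_backedge, start), iterations(arm_backedge, start))
```

In this model both styles run the body 3 times for a start count of 3 and once for a start count of 1. They differ only when the counter starts at 0: the PPC-style decrement wraps and the body runs 256 times with the 8-bit counter used here, while the Arm-style check exits after a single iteration.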



================
Comment at: llvm/test/Transforms/HardwareLoops/scalar-while.ll:311
 ; CHECK-PHIGUARD-NEXT:  entry:
 ; CHECK-PHIGUARD-NEXT:    [[CMP4:%.*]] = icmp slt i32 [[I:%.*]], [[N:%.*]]
 ; CHECK-PHIGUARD-NEXT:    [[TMP0:%.*]] = add i32 [[I]], 1
----------------
shchenz wrote:
> samparker wrote:
> > shchenz wrote:
> > > The test changes are because `isLoopEntryGuardedByCond` previously could not determine that the SCEV `(1 + (-1 * %N) + %i)` is at least 1. But now we assume that the TripCount is at least 1 if we enter the loop. And we also check that the TripCount will not overflow when we set it.
> > These are minor regressions, but they're in line with the other previous sub-optimal cases, so I think it's okay.
> Thanks for the confirmation.
It's only small, but it does mean we execute more instructions if the branch is taken.


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D109676/new/

https://reviews.llvm.org/D109676


