[PATCH] D109676: [HardwareLoops] put +1 for loop count before zero extension
Dave Green via Phabricator via llvm-commits
llvm-commits at lists.llvm.org
Mon Oct 25 02:22:57 PDT 2021
dmgreen added a comment.
Everything tends to be i32 on Arm, so we only rarely see extends on the induction variables in the hardware loop pass. Even if the IV was previously i8/i16, it will likely already have been extended to an i32 by indvarsimplify.
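For readers less familiar with the issue in the patch title, here is a rough sketch of why the position of the +1 relative to the zero extension matters when the IV is narrower than the counter. The helper names are made up for illustration and none of this is code from the patch; it only models fixed-width wrapping arithmetic.

```python
def wrap(value, bits):
    """Truncate value to an unsigned integer of the given bit width."""
    return value & ((1 << bits) - 1)


def count_add_then_zext(backedge_count, narrow_bits=8):
    """Add 1 in the narrow type (may wrap), then zero-extend the result."""
    narrow_sum = wrap(backedge_count + 1, narrow_bits)
    return narrow_sum  # zero extension of an unsigned value keeps its value


def count_zext_then_add(backedge_count, narrow_bits=8, wide_bits=32):
    """Zero-extend the narrow value first, then add 1 in the wide type."""
    widened = wrap(backedge_count, narrow_bits)
    return wrap(widened + 1, wide_bits)


# For small counts the two orderings agree.
print(count_add_then_zext(3), count_zext_then_add(3))
# At the i8 maximum they diverge: the narrow add wraps to 0, the wide add does not.
print(count_add_then_zext(255), count_zext_then_add(255))
```

The two orderings only differ when the narrow value is at its maximum, which is why the choice between them comes down to how the target treats a wrapped counter.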
In D109676#3077185 <https://reviews.llvm.org/D109676#3077185>, @samparker wrote:
> I would like to start a top-level discussion so everything isn't lost within the review comments, and pull in the Arm people who still work on this... @dmgreen @samtebbs
>
> It appears as though PPC and Arm have different semantics for their loop backedge control. To summarize @shchenz, PPC will 'Decrement CTR and branch if it is still nonzero'. On Arm, AFAICT, we don't perform the decrement when the counter (LR) is <= 1.
>
> So, do we need to tighten the semantics of the loop intrinsics for their overflow behaviour? Arm and PPC use different intrinsics so I can't see why defining them slightly differently would bother anyone.
It's true that we don't need to add 1 to the loop counter (if I'm understanding this correctly). When running our downstream benchmarks, the only change I saw from this patch is that instructions that were in the preheader are now in the pre-preheader (the guard block). That makes things better or worse by ~1%, depending on the benchmark.
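To make the difference in backedge semantics quoted above concrete, here is a minimal simulation. It assumes exactly the behaviour described in that summary (PPC decrements and branches while the counter is nonzero; Arm exits without decrementing when the counter is <= 1) and is not derived from either architecture manual. The function names and the `limit` cap are hypothetical, added only so the wrapped case terminates.

```python
MASK32 = 0xFFFFFFFF  # model a 32-bit unsigned counter register


def ppc_style_iterations(counter, limit=10):
    """Run the body, decrement, branch back while the counter is nonzero."""
    iterations = 0
    while True:
        iterations += 1                   # loop body executes once
        counter = (counter - 1) & MASK32  # decrement always happens, may wrap
        if counter == 0 or iterations >= limit:
            return iterations


def arm_style_iterations(counter, limit=10):
    """Run the body, exit without decrementing when the counter is <= 1."""
    iterations = 0
    while True:
        iterations += 1                   # loop body executes once
        if counter <= 1 or iterations >= limit:
            return iterations
        counter -= 1


# Both models agree for an ordinary trip count.
print(ppc_style_iterations(3), arm_style_iterations(3))
# With a counter that wrapped to 0, the PPC-style model keeps looping
# (stopped here only by the cap) while the Arm-style model exits at once.
print(ppc_style_iterations(0), arm_style_iterations(0))
```

Under these assumptions the two models only disagree when the counter has wrapped to zero, which is the overflow case the intrinsic semantics would need to pin down.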
================
Comment at: llvm/test/Transforms/HardwareLoops/scalar-while.ll:311
; CHECK-PHIGUARD-NEXT: entry:
; CHECK-PHIGUARD-NEXT: [[CMP4:%.*]] = icmp slt i32 [[I:%.*]], [[N:%.*]]
; CHECK-PHIGUARD-NEXT: [[TMP0:%.*]] = add i32 [[I]], 1
----------------
shchenz wrote:
> samparker wrote:
> > shchenz wrote:
> > > The test changes are because `isLoopEntryGuardedByCond` previously could not determine that SCEV `(1 + (-1 * %N) + %i)` is at least 1. But now we assume that the TripCount is at least 1 if we enter the loop, and we also check that TripCount will not overflow when we set it.
> > These are minor regressions, but they're in line with the other previous sub-optimal cases so I think it's okay.
> Thanks for confirmation.
It's only small, but it does mean we execute more instructions if the branch is taken.
Repository:
rG LLVM Github Monorepo
CHANGES SINCE LAST ACTION
https://reviews.llvm.org/D109676/new/
https://reviews.llvm.org/D109676