[PATCH] D63477: [PowerPC] exclude ICmpZero Use in LSR if icmp can be replaced inside hardware loop.

Hal Finkel via Phabricator via llvm-commits llvm-commits at lists.llvm.org
Tue Jun 25 19:17:09 PDT 2019


hfinkel added inline comments.


================
Comment at: llvm/lib/Target/PowerPC/PPCTargetTransformInfo.cpp:893
+
+  // Bail out if the loop has irreducible control flow.
+  LoopBlocksRPO RPOT(L);
----------------
Is this duplicating applicability logic from the HardwareLoops pass? Can we move this into a utility function, maybe something like HardwareLoopInfo::canAnalyze(Loop *L)?


================
Comment at: llvm/lib/Transforms/Scalar/LoopStrengthReduce.cpp:3271
+        // in PowerPC, no need to generate initial formulae for it.
+        bool saveCmp = false;
+        if (!ExitBranch)
----------------
saveCmp -> SaveCmp


================
Comment at: llvm/test/CodeGen/PowerPC/negctr.ll:38
 
+; FIXME: This should be a hardware loop.
 ; CHECK: @main1
----------------
shchenz wrote:
> This is a known degradation. With this patch,
> Before `Codegen Prepare` Pass:
> ```
> for.body:                                         ; preds = %for.body.preheader, %for.body
>   %indvars.iv = phi i64 [ %indvars.iv.next, %for.body ], [ 1, %for.body.preheader ]
>   %indvars.iv.next = add i64 %indvars.iv, 1
>   %exitcond = icmp eq i64 %indvars.iv.next, 0
>   br i1 %exitcond, label %for.end.loopexit, label %for.body
> ```
> After  `Codegen Prepare` Pass:
> ```
> for.body:                                         ; preds = %for.body.preheader, %for.body
>   %indvars.iv = phi i64 [ %math, %for.body ], [ 1, %for.body.preheader ]
>   %0 = call { i64, i1 } @llvm.uadd.with.overflow.i64(i64 %indvars.iv, i64 1)
>   %math = extractvalue { i64, i1 } %0, 0
>   %ov = extractvalue { i64, i1 } %0, 1
>   br i1 %ov, label %for.end, label %for.body
> ```
> 
> The `HardwareLoops` pass currently cannot recognize the `uadd` intrinsic + `extractvalue` pattern as a hardware loop (it cannot compute the loop exit count). Maybe we need to modify the logic in the `Codegen Prepare` pass: if the loop is a hardware loop, we should not optimize the cmp into a uadd/usub intrinsic?
Yes, or teach SCEV about the uadd/usub pattern? I lean toward making SCEV smarter.
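
To illustrate why SCEV could in principle handle this, here is a minimal standalone sketch (plain C++, not LLVM code) that models both loop forms at 8 bits instead of 64 so the wrap-around is cheap to enumerate. The function names and the `UAddResult` struct are hypothetical; they only mimic the IR quoted above. The point is that, for an increment of 1, the overflow bit is set exactly when the incremented value wraps to 0, so both exits give the same trip count.

```cpp
#include <cstdint>

// Hypothetical model of { iN, i1 } @llvm.uadd.with.overflow at N = 8.
struct UAddResult {
  uint8_t Math; // wrapped sum
  bool Ov;      // true if the unsigned add overflowed
};

UAddResult uaddWithOverflow8(uint8_t A, uint8_t B) {
  uint8_t Sum = static_cast<uint8_t>(A + B);
  // An unsigned add wrapped if and only if the result is smaller than an operand.
  return {Sum, Sum < A};
}

// Loop shape before CodeGenPrepare: exit when iv.next == 0.
unsigned tripCountIcmp(uint8_t Start) {
  unsigned Trips = 0;
  uint8_t IV = Start;
  while (true) {
    ++Trips;
    uint8_t Next = static_cast<uint8_t>(IV + 1);
    if (Next == 0)
      break;
    IV = Next;
  }
  return Trips;
}

// Loop shape after CodeGenPrepare: exit on the overflow bit.
unsigned tripCountUAdd(uint8_t Start) {
  unsigned Trips = 0;
  uint8_t IV = Start;
  while (true) {
    ++Trips;
    UAddResult R = uaddWithOverflow8(IV, 1);
    if (R.Ov)
      break;
    IV = R.Math;
  }
  return Trips;
}
```

In this model both functions return 2^8 - Start for every start value, which is the kind of closed-form exit count an overflow-aware SCEV would have to derive for the `%ov` branch.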


CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D63477/new/

https://reviews.llvm.org/D63477




