[PATCH] D109676: [HardwareLoops] put +1 for loop count before zero extension

Philip Reames via Phabricator via llvm-commits llvm-commits at lists.llvm.org
Thu Sep 16 09:48:02 PDT 2021


reames requested changes to this revision.
reames added inline comments.
This revision now requires changes to proceed.


================
Comment at: llvm/lib/Analysis/TargetTransformInfo.cpp:185
+          L, ICmpInst::ICMP_NE,
+          SE.getAddExpr(TripCount, SE.getOne(TripCount->getType())),
+          SE.getZero(TripCount->getType())))
----------------
shchenz wrote:
> reames wrote:
> > Please use a comparison of the exit count against maximum unsigned value instead.  The optimizer will probably turn it into the same, but the form you've written has potential interaction with SCEVs flag inference which is best avoided.  
> This was my first thought too. We can get the maximum num for `ExitCount` and if it is less than the maximum unsigned value for that type, we can safely do the +1. Unfortunately, for the case changes, they are all not such cases.
> 
> One case I was looking at is:
> ```
> define dso_local signext i32 @testNestedPHI(i32 signext %cond, i32 signext %count, <512 x i1>* nocapture %ptr, <16 x i8> %vc) #1 {                     
> 
> for.body:                                         ; preds = %for.body.preheader, %for.body
>   %i.011 = phi i32 [ %inc, %for.body ], [ 0, %for.body.preheader ]              
>   %vq.110 = phi <512 x i1> [ %1, %for.body ], [ %vq.0, %for.body.preheader ]    
>   %1 = tail call <512 x i1> @llvm.ppc.mma.xvf32gernp(<512 x i1> %vq.110, <16 x i8> %vc, <16 x i8> %vc)
>   %inc = add nuw nsw i32 %i.011, 1                                              
>   %exitcond.not = icmp eq i32 %inc, %count                                      
>   br i1 %exitcond.not, label %for.cond.cleanup, label %for.body                 
> }
> ```
> 
> SCEV for the `ExitCount` is `%count - 1`, bacause there is no wrap flag, so `getRangeRef` will return full-set as the range for `ExitCount`.
> 
> For `ExitCount`, I think we should add `nuw` flag for the above SCEV expression? But that should be out of this patch's scope?
To make sure I'm following you, you're saying that we manage to prove ExitCount != 0 in this example?  That's interesting, I actually won't expect us to get that.  

I'll drop the point about not checking for zero.  Just be aware there is some risk there.  The good news is that we're rapidly reducing that risk via work on pr51817.  


================
Comment at: llvm/lib/Analysis/TargetTransformInfo.cpp:179
+  if (ExitCount->getType()->isPointerTy()) {
+    TripCount = SE.getAddExpr(ExitCount, SE.getOne(CountType));
+    return true;
----------------
This is wrong.  Please return false here instead, pointer typed trip counts should never happen.


================
Comment at: llvm/lib/CodeGen/HardwareLoops.cpp:187
     HardwareLoop(HardwareLoopInfo &Info, ScalarEvolution &SE,
-                 const DataLayout &DL,
-                 OptimizationRemarkEmitter *ORE) :
-      SE(SE), DL(DL), ORE(ORE), L(Info.L), M(L->getHeader()->getModule()),
-      ExitCount(Info.ExitCount),
-      CountType(Info.CountType),
-      ExitBranch(Info.ExitBranch),
-      LoopDecrement(Info.LoopDecrement),
-      UsePHICounter(Info.CounterInReg),
-      UseLoopGuard(Info.PerformEntryTest) { }
+                 const DataLayout &DL, OptimizationRemarkEmitter *ORE)
+        : SE(SE), DL(DL), ORE(ORE), L(Info.L), M(L->getHeader()->getModule()),
----------------
Please remove the white space diff.  If you want to reflow the code, do so in a separate change.  


================
Comment at: llvm/lib/CodeGen/HardwareLoops.cpp:200
     OptimizationRemarkEmitter *ORE = nullptr;
-    Loop *L                 = nullptr;
-    Module *M               = nullptr;
-    const SCEV *ExitCount   = nullptr;
-    Type *CountType         = nullptr;
-    BranchInst *ExitBranch  = nullptr;
-    Value *LoopDecrement    = nullptr;
-    bool UsePHICounter      = false;
-    bool UseLoopGuard       = false;
-    BasicBlock *BeginBB     = nullptr;
+    Loop *L = nullptr;
+    Module *M = nullptr;
----------------
Same.


================
Comment at: llvm/lib/CodeGen/HardwareLoops.cpp:386
   // the loop header, so also check if it has a single predecessor.
-  if (SE.isLoopEntryGuardedByCond(L, ICmpInst::ICMP_NE, ExitCount,
-                                  SE.getZero(ExitCount->getType()))) {
+  if (SE.isLoopEntryGuardedByCond(L, ICmpInst::ICMP_NE, TripCount,
+                                  SE.getZero(TripCount->getType()))) {
----------------
Hm, hadn't noticed this before.

An actual trip count is never zero.  This appears to have been trying to handle the overflow in BTC to TC conversion.  I haven't looked at this to see what it's doing and why, but this is likely dead code now and you should understand why this was insufficient previously.  


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D109676/new/

https://reviews.llvm.org/D109676



More information about the llvm-commits mailing list