[PATCH] D109676: [HardwareLoops] put +1 for loop count before zero extension

ChenZheng via Phabricator via llvm-commits llvm-commits at lists.llvm.org
Thu Sep 16 23:59:58 PDT 2021


shchenz added inline comments.


================
Comment at: llvm/lib/Analysis/TargetTransformInfo.cpp:185
+          L, ICmpInst::ICMP_NE,
+          SE.getAddExpr(TripCount, SE.getOne(TripCount->getType())),
+          SE.getZero(TripCount->getType())))
----------------
reames wrote:
> shchenz wrote:
> > reames wrote:
> > > Please use a comparison of the exit count against maximum unsigned value instead.  The optimizer will probably turn it into the same, but the form you've written has potential interaction with SCEVs flag inference which is best avoided.  
> > This was my first thought too. We can get the maximum num for `ExitCount` and if it is less than the maximum unsigned value for that type, we can safely do the +1. Unfortunately, for the case changes, they are all not such cases.
> > 
> > One case I was looking at is:
> > ```
> > define dso_local signext i32 @testNestedPHI(i32 signext %cond, i32 signext %count, <512 x i1>* nocapture %ptr, <16 x i8> %vc) #1 {                     
> > 
> > for.body:                                         ; preds = %for.body.preheader, %for.body
> >   %i.011 = phi i32 [ %inc, %for.body ], [ 0, %for.body.preheader ]              
> >   %vq.110 = phi <512 x i1> [ %1, %for.body ], [ %vq.0, %for.body.preheader ]    
> >   %1 = tail call <512 x i1> @llvm.ppc.mma.xvf32gernp(<512 x i1> %vq.110, <16 x i8> %vc, <16 x i8> %vc)
> >   %inc = add nuw nsw i32 %i.011, 1                                              
> >   %exitcond.not = icmp eq i32 %inc, %count                                      
> >   br i1 %exitcond.not, label %for.cond.cleanup, label %for.body                 
> > }
> > ```
> > 
> > SCEV for the `ExitCount` is `%count - 1`, bacause there is no wrap flag, so `getRangeRef` will return full-set as the range for `ExitCount`.
> > 
> > For `ExitCount`, I think we should add `nuw` flag for the above SCEV expression? But that should be out of this patch's scope?
> To make sure I'm following you, you're saying that we manage to prove ExitCount != 0 in this example?  That's interesting, I actually won't expect us to get that.  
> 
> I'll drop the point about not checking for zero.  Just be aware there is some risk there.  The good news is that we're rapidly reducing that risk via work on pr51817.  
Yes, the case is from the below test changes. Now we can put `+1` before the zero extend.


================
Comment at: llvm/lib/Analysis/TargetTransformInfo.cpp:179
+  if (ExitCount->getType()->isPointerTy()) {
+    TripCount = SE.getAddExpr(ExitCount, SE.getOne(CountType));
+    return true;
----------------
reames wrote:
> This is wrong.  Please return false here instead, pointer typed trip counts should never happen.
Thanks. I am not aware that loop exit count can not be a pointer type. I change this to assert instead of returning false.


================
Comment at: llvm/lib/CodeGen/HardwareLoops.cpp:187
     HardwareLoop(HardwareLoopInfo &Info, ScalarEvolution &SE,
-                 const DataLayout &DL,
-                 OptimizationRemarkEmitter *ORE) :
-      SE(SE), DL(DL), ORE(ORE), L(Info.L), M(L->getHeader()->getModule()),
-      ExitCount(Info.ExitCount),
-      CountType(Info.CountType),
-      ExitBranch(Info.ExitBranch),
-      LoopDecrement(Info.LoopDecrement),
-      UsePHICounter(Info.CounterInReg),
-      UseLoopGuard(Info.PerformEntryTest) { }
+                 const DataLayout &DL, OptimizationRemarkEmitter *ORE)
+        : SE(SE), DL(DL), ORE(ORE), L(Info.L), M(L->getHeader()->getModule()),
----------------
reames wrote:
> Please remove the white space diff.  If you want to reflow the code, do so in a separate change.  
I was doing this on purpose. Some Lint warnings are emitted for this patch after I rename `ExitCount` to `TripCount`. Should I fix the Lint warnings in this patch?


================
Comment at: llvm/lib/CodeGen/HardwareLoops.cpp:386
   // the loop header, so also check if it has a single predecessor.
-  if (SE.isLoopEntryGuardedByCond(L, ICmpInst::ICMP_NE, ExitCount,
-                                  SE.getZero(ExitCount->getType()))) {
+  if (SE.isLoopEntryGuardedByCond(L, ICmpInst::ICMP_NE, TripCount,
+                                  SE.getZero(TripCount->getType()))) {
----------------
reames wrote:
> Hm, hadn't noticed this before.
> 
> An actual trip count is never zero.  This appears to have been trying to handle the overflow in BTC to TC conversion.  I haven't looked at this to see what it's doing and why, but this is likely dead code now and you should understand why this was insufficient previously.  
I don't think it is used to detect overflow in the original patch. It should be used to detect the possibility that the loop count can be zero. If it can not be a zero(for example, explicitly check the loop count and constant zero in the IR), some targets like ARM can use 'test and set' form loop count intrinsic, so that we can delete the explicitly check instruction in the IR. See one example in `test/Transforms/HardwareLoops/loop-guards.ll` `@test6()`. After the change, instruction `%cmp = icmp ne i32 %N, 0` has no user.

For the previous bug in https://reviews.llvm.org/D91724, this check does not help. On PowerPC, there is no `test and set` form loop count intrinsic, so no matter what the result of `isLoopEntryGuardedByCond` is, `UseLoopGuard` is always false. And the loop count value may still be changed from umax_32 + 1(if we extend to u64) to 0 after D91724.

At last, yes, after this patch, this check should not be done two times. I have updated the patch.


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D109676/new/

https://reviews.llvm.org/D109676



More information about the llvm-commits mailing list