[PATCH] D109676: [HardwareLoops] put +1 for loop count before zero extension

ChenZheng via Phabricator via llvm-commits llvm-commits at lists.llvm.org
Mon Sep 13 19:20:33 PDT 2021


shchenz added a comment.

@reames Thanks for your review. Updated accordingly.



================
Comment at: llvm/lib/Analysis/TargetTransformInfo.cpp:170
     ExitBlock = BB;
-    ExitCount = EC;
+    TripCount = EC;
     break;
----------------
reames wrote:
> You're using a variable called TripCount to store an ExitCount (e.g. BTC) before adding the appropriate offset.  I strongly suggest using two variables.  
Good catch


================
Comment at: llvm/lib/Analysis/TargetTransformInfo.cpp:178
+  bool NeedExtend =
+      !TripCount->getType()->isPointerTy() && TripCount->getType() != CountType;
+  // If we are going to extend the TripCount and we can prove that TripCount + 1
----------------
reames wrote:
> reames wrote:
> > There appears to be an unasserted assumption that CountType is of a wider type than the exitcount's type.  The code structure does not make it clear this is true, at a minimum, please assert.  
> I'd suggest turning the pointer type check into an early return false unless you have a motivating case for the complexity.  Pointer typed trip counts should be rare.  
The code already make sure that `TripCount->getType()` not bigger than `CountType`. See line 124 ~ 125 in the same function `isHardwareLoopCandidate`. But yeah, an assertion will be more clear here.


================
Comment at: llvm/lib/Analysis/TargetTransformInfo.cpp:185
+          L, ICmpInst::ICMP_NE,
+          SE.getAddExpr(TripCount, SE.getOne(TripCount->getType())),
+          SE.getZero(TripCount->getType())))
----------------
reames wrote:
> Please use a comparison of the exit count against maximum unsigned value instead.  The optimizer will probably turn it into the same, but the form you've written has potential interaction with SCEVs flag inference which is best avoided.  
This was my first thought too. We can get the maximum num for `ExitCount` and if it is less than the maximum unsigned value for that type, we can safely do the +1. Unfortunately, for the case changes, they are all not such cases.

One case I was looking at is:
```
define dso_local signext i32 @testNestedPHI(i32 signext %cond, i32 signext %count, <512 x i1>* nocapture %ptr, <16 x i8> %vc) #1 {                     

for.body:                                         ; preds = %for.body.preheader, %for.body
  %i.011 = phi i32 [ %inc, %for.body ], [ 0, %for.body.preheader ]              
  %vq.110 = phi <512 x i1> [ %1, %for.body ], [ %vq.0, %for.body.preheader ]    
  %1 = tail call <512 x i1> @llvm.ppc.mma.xvf32gernp(<512 x i1> %vq.110, <16 x i8> %vc, <16 x i8> %vc)
  %inc = add nuw nsw i32 %i.011, 1                                              
  %exitcond.not = icmp eq i32 %inc, %count                                      
  br i1 %exitcond.not, label %for.cond.cleanup, label %for.body                 
}
```

SCEV for the `ExitCount` is `%count - 1`, bacause there is no wrap flag, so `getRangeRef` will return full-set as the range for `ExitCount`.

For `ExitCount`, I think we should add `nuw` flag for the above SCEV expression? But that should be out of this patch's scope?


================
Comment at: llvm/lib/Analysis/TargetTransformInfo.cpp:193
+  else
+    TripCount = SE.getAddExpr(TripCount, SE.getOne(TripCount->getType()));
+
----------------
reames wrote:
> If you use getNoopOrZeroExtend, you don't need the explicit else case.
Good idea.


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D109676/new/

https://reviews.llvm.org/D109676



More information about the llvm-commits mailing list