[PATCH] D109821: [ScalarEvolution] Infer loop max trip count from array accesses

Liren.Peng via Phabricator via llvm-commits llvm-commits at lists.llvm.org
Wed Oct 20 05:00:23 PDT 2021


Peakulorain added inline comments.


================
Comment at: llvm/lib/Analysis/ScalarEvolution.cpp:7353
+      // Make sure we only handle normal arrays.
+      auto *Ty = dyn_cast<ArrayType>(AllocateInst->getAllocatedType());
+      auto *ArrSize = dyn_cast<ConstantInt>(AllocateInst->getArraySize());
----------------
xbolva00 wrote:
> reames wrote:
> > FYI, we can do better here by using getAllocationSizeInBits, but please *don't* change this in the current patch.  We're converging towards a LGTM, and minimal change is good.  It would make a good follow up though.
> getAllocationSizeInBits will probably not catch this case, will it?
> 
> ```
> int foo(int data[static 6]) { // define dso_local i32 @foo(i32* nocapture readonly align 4 dereferenceable(24) %0) 
>   int p = 0;
>   int i = 0;
>   while (data[i++] != -1) // currently GCC does not unroll this loop 6x
>      p += data[i];
>   return p;
> }
> 
> ```
> 
> or when data is allocated using malloc/calloc/..

Yeah, I think you are right, so we only do this inference on statically allocated memory.


================
Comment at: llvm/lib/Analysis/ScalarEvolution.cpp:7353
+      // Make sure we only handle normal arrays.
+      auto *Ty = dyn_cast<ArrayType>(AllocateInst->getAllocatedType());
+      auto *ArrSize = dyn_cast<ConstantInt>(AllocateInst->getArraySize());
----------------
Peakulorain wrote:
> xbolva00 wrote:
> > reames wrote:
> > > FYI, we can do better here by using getAllocationSizeInBits, but please *don't* change this in the current patch.  We're converging towards a LGTM, and minimal change is good.  It would make a good follow up though.
> > getAllocationSizeInBits will probably not catch this case, will it?
> > 
> > ```
> > int foo(int data[static 6]) { // define dso_local i32 @foo(i32* nocapture readonly align 4 dereferenceable(24) %0) 
> >   int p = 0;
> >   int i = 0;
> >   while (data[i++] != -1) // currently GCC does not unroll this loop 6x
> >      p += data[i];
> >   return p;
> > }
> > 
> > ```
> > 
> > or when data is allocated using malloc/calloc/..
> 
> Yeah, I think you are right, so we only do this inference on statically allocated memory.
> FYI, we can do better here by using getAllocationSizeInBits, but please *don't* change this in the current patch.  We're converging towards a LGTM, and minimal change is good.  It would make a good follow up though.

OK, let's do it incrementally.


================
Comment at: llvm/lib/Analysis/ScalarEvolution.cpp:7365
+      const SCEV *MemSize =
+          getConstant(Step->getType(), DL.getTypeAllocSize(Ty));
+      auto *MaxExeCount = dyn_cast<SCEVConstant>(getUDivExpr(MemSize, Step));
----------------
reames wrote:
> Ok, there's a subtle bug here.
> 
> Since gep indices are silently zext to the indexing type, you can have a narrow gep index which wraps around rather than strictly increasing.  You can fix this by ensuring that the addrec you checked for above has the same type as the gep.  
Can we use "hasNoSelfWrap()" to ensure that? I used this API above.


================
Comment at: llvm/lib/Analysis/ScalarEvolution.cpp:7370
+
+      // If the loop reaches the maximum number of executions, and UB will
+      // definitely occur in the next iteration, it is allowed to enter loop
----------------
reames wrote:
> Ok, you have two different bugs here.
> 
> First, a *partially* out of bounds access is allowed without being UB. (Or at least, that seemed to be the conclusion we'd come to earlier.) Since the above udiv produces a floor (not a ceiling), you have the possibility of one extra iteration being well defined, with an access that starts in bounds and finishes out of bounds. The easiest fix would be to use getUDivCeilSCEV in the line above.
> 
> Second, in the case where the element size is 1 (i.e. a byte array), your udiv above can produce uint_max for the respective type. As such, the increment below can wrap, and you need to use a wider type.
> Ok, you have two different bugs here.
> 
> First, a *partially* out of bounds access is allowed without being UB. (Or at least, that seemed to be the conclusion we'd come to earlier.) Since the above udiv produces a floor (not a ceiling), you have the possibility of one extra iteration being well defined, with an access that starts in bounds and finishes out of bounds. The easiest fix would be to use getUDivCeilSCEV in the line above.
> 
Thanks for the reminder; fixed.

> Second, in the case where the element size is 1 (i.e. a byte array), your udiv above can produce uint_max for the respective type. As such, the increment below can wrap, and you need to use a wider type.
We discard such extreme values.



CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D109821/new/

https://reviews.llvm.org/D109821


