[PATCH] D155049: [ScalarEvolution] Infer loop max trip count from memory accesses

Thu Jul 20 01:57:54 PDT 2023

Peakulorain added a comment.

In D155049#4517718 <https://reviews.llvm.org/D155049#4517718>, @nikic wrote:

> In D155049#4516983 <https://reviews.llvm.org/D155049#4516983>, @Peakulorain wrote:
>
>> In D155049#4515837 <https://reviews.llvm.org/D155049#4515837>, @nikic wrote:
>>
>>> In D155049#4514907 <https://reviews.llvm.org/D155049#4514907>, @Peakulorain wrote:
>>>
>>>> Thanks for your help, the above case is indeed filtered out by constraints. But please see :
>>>>
>>>>   define void @test(i32 signext %len) {...
>>>>   for.body:
>>>>     %iv = phi i8 [ %inc, %for.body ], [ 0, %for.body.preheader ]
>>>>     %idxprom = zext i8 %iv to i64
>>>>     %arrayidx = getelementptr inbounds [500 x i32], [500 x i32]* %a, i64 0, i64 %idxprom
>>>>     store i32 0, i32* %arrayidx, align 4
>>>>     %inc = add nuw nsw i8 %iv, 1
>>>>     %inc_zext = zext i8 %inc to i32
>>>>     %cmp = icmp slt i32 %inc_zext, %len
>>>>     br i1 %cmp, label %for.body, label %loopexit
>>>>     ...
>>>>   }
>>>>
>>>> This case would yield **{%a,+,4}<nuw><%for.body>**. In such a situation, I think it is necessary to calculate how many iterations it takes for the induction variable to wrap. :)
>>>
>>> Doesn't the `add nuw` exclude wrapping in this case though? This is why SCEV concludes it's okay to look through the zext.
>>
>> I know that the `nuw` flag excludes wrapping. Even so, the BE we get from **(MemSize / StepSize + 1)** is **501**. I was concerned that this inferred value might not be usable, so I compared it with the i8 wrap value: if the inferred value is within the loop iterator's wrap value, we consider it usable.
>
> In this example, isn't it okay if the max trip count is reported as 501? We actually know that the max trip count must be 256 due to the nuw on the add, even if SCEV fails to realize it. As such, reporting 501 as a conservative over-estimate should be fine. Am I missing something?

Okay, it looks like I was worrying too much. The new implementation is then divided into the following steps:

1. Collect the load/store instructions that are executed on each iteration of the loop;
2. Filter out reads/writes that may overlap;
3. Infer the maximum number of iterations from the total memory size / step value.
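
As a rough illustration of step 3 only, here is a minimal standalone sketch of the arithmetic discussed above. It is not the patch's code and does not use the LLVM API: `AccessInfo`, `maxTripCountFromAccess`, and `inferMaxTripCount` are hypothetical names, the overlap filtering from step 2 is left out, and taking the minimum over several accesses is my assumption (each access that runs on every iteration gives its own valid upper bound).

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <optional>
#include <vector>

// Hypothetical summary of one load/store that executes on every iteration.
struct AccessInfo {
  uint64_t objectBytes; // size of the accessed object, e.g. [500 x i32] = 2000
  uint64_t stepBytes;   // constant stride of the address recurrence, e.g. 4
};

// Bound implied by a single access, using the formula from this thread:
// MemSize / StepSize + 1.
uint64_t maxTripCountFromAccess(const AccessInfo &a) {
  assert(a.stepBytes > 0 && "stride must be non-zero");
  return a.objectBytes / a.stepBytes + 1;
}

// Assumption: every listed access must stay in bounds on every iteration,
// so the smallest per-access bound is still a valid upper bound.
// Returns std::nullopt when there is no access to reason about.
std::optional<uint64_t>
inferMaxTripCount(const std::vector<AccessInfo> &accesses) {
  std::optional<uint64_t> best;
  for (const AccessInfo &a : accesses) {
    uint64_t bound = maxTripCountFromAccess(a);
    best = best ? std::min(*best, bound) : bound;
  }
  return best;
}
```

For the `[500 x i32]` example above, `maxTripCountFromAccess(AccessInfo{2000, 4})` gives 501, which is a conservative over-estimate of the real limit of 256 imposed by the `nuw` on the i8 add.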

Please let me know if there are any other issues with this patch. :)

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D155049/new/

https://reviews.llvm.org/D155049