[PATCH] D95363: [SVE][LoopVectorize] Add support for scalable vectorization of loops with vector reverse

Fri Mar 5 01:36:27 PST 2021

CarolineConcatto marked an inline comment as done.
CarolineConcatto added a comment.

Thank you all for the review.
I C&P the function getRunTime from D95139 <https://reviews.llvm.org/D95139>.
If D95139 <https://reviews.llvm.org/D95139>  is merged before this patch I will remove it.
All the other dependencies are already in main, and getRunTimeVF  is the only function that this patch needs from D95139 <https://reviews.llvm.org/D95139>.

================
Comment at: llvm/lib/Transforms/Vectorize/LoopVectorize.cpp:2807
+        PartPtr->setIsInBounds(InBounds);
+      } else {
+        PartPtr = cast<GetElementPtrInst>(
----------------
david-arm wrote:
> bin.cheng-ali wrote:
> > Sorry for one nitpicking.
> > IIUC the difference for scalable/fixed cases is the two operands of GEPs, is it better to factor out common code by doing below?
> > if (VF.isScalable()) {
> >   // build GEP operands for scalable case
> > }
> > else {
> >   // build GEP operands for fixed case
> > }
> > // common code building VecPtr with above GEP operands.
> I guess alternatively we could commonise them into one block where we always calculate
> 
> ```        Value *LastLane =
>             Builder.CreateSub(Builder.getInt32(1),
>                               getRuntimeVF(Builder, Builder.getInt32Ty(), VF));```
> 
> for both fixed-width and scalable vectors, and just let instcombine and other passes fold the constants? I'm not sure if that breaks lots of tests though ...
So @david-arm 's suggestion works for fixed width and scalable vectors.
At least no test has failed.
I could also have done the way @bin.cheng-ali suggested:

```
if (scalable){
  NumElts =   Builder.CreateMul(Builder.getInt32(1),
                      getRuntimeVF(Builder, Builder.getInt32Ty(), VF));
  LastLane =   Builder.CreateSub(Builder.getInt32(1),
                      getRuntimeVF(Builder, Builder.getInt32Ty(), VF));
}else{
   NumElts = 1 * VF.getKnownMinValue()
  LastLane = 1 - VF.getKnownMinValue()
}
```
But as all work with getRunTime I thought it was code duplication and this way the code looks simpler.
I added a comment to explain how it works if it is fixed-width vector for me in the future too

================
Comment at: llvm/lib/Transforms/Vectorize/LoopVectorize.cpp:2818
       if (isMaskRequired) // Reverse of a null all-one mask is a null mask.
         BlockInMaskParts[Part] = reverseVector(BlockInMaskParts[Part]);
     } else {
----------------
david-arm wrote:
> Hi @CarolineConcatto, I just realised there isn't a test for this for scalable vectors. Would you be able to add a test for blocks with masks too? I think it should be something like:
> 
> ```void foo(int * __restrict__ a, int * __restrict__ cond, long long N) {
>   for (int i = N - 1; i >= 0; i--)
>   {
>     if (cond[i])
>       a[i] += 1;
>   }
> }```
> 
Thank you @david-arm for pointing that.
Yes, indeed it was missing test for that.
 The cost model for i1 was missing with vector reverse.

Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D95363/new/

https://reviews.llvm.org/D95363