[PATCH] D125747: [RISCV] Enable scalable vectorization by default for RVV

Fri May 27 11:21:38 PDT 2022

reames added a comment.

In D125747#3542559 <https://reviews.llvm.org/D125747#3542559>, @craig.topper wrote:

> In D125747#3541849 <https://reviews.llvm.org/D125747#3541849>, @Miss_Grape wrote:
>
>> In D125747#3518172 <https://reviews.llvm.org/D125747#3518172>, @craig.topper wrote:
>>
>>> Do you have any performance data?
>>
>> I use the TSCV test suit,  then run on the spike, 
>> F23214515: image.png <https://reviews.llvm.org/F23214515>
>>
>> when option: -scalable-vectorization=on, the performance more better. But I'm not sure if the performance data from the spike run can be used as a standard to measure performance
>
> Better than specifying -riscv-v-vector-bits-min to match the machine width?

@craig.topper I think this is somewhat the wrong question here.  While I agree that fixed length should be our eventual default for known vector lengths, we currently don't enable any vectorization.  If we can show either form of vectorization is generally profitable over the no-vectorization configuration, we should enable.  We can then evaluate the other configuration against that new baseline.

@Miss_Grape I struggle to make out what that screenshot is conveying.  Could you summarize please?  Also, a text attachment is greatly preferred over images.

================
Comment at: llvm/test/Transforms/LoopVectorize/RISCV/unroll-in-loop-vectorizer.ll:6
 ;
 define void @small_loop(i32* nocapture %inArray, i32 %size) nounwind {
 ; CHECK-LABEL: @small_loop(
----------------
craig.topper wrote:
> reames wrote:
> > craig.topper wrote:
> > > reames wrote:
> > > > reames wrote:
> > > > > As an aside, I took a look at the assembly for this example.  Th codegen for the vector loop ends up being less than great.  For instance:
> > > > > * We have a extend trapped in the loop for some reason
> > > > > * We rerun vsetvli on every iteration (despite it producing a fixed result)
> > > > > * We have a rem in the vector preheader.  That's rather expensive.  (Well, actually, we end up with a *libcall* because the attributes don't include +m, but I ignored that.)
> > > > To be fair, the first two issues also exist with fixed length vectorization of the same example.  
> > > What kind of extend?
> > zext - it looks easily removable, I'm surprised to see it after O2.  This shouldn't be a hard fix.  It appears in both vector and scalar loops.  
> Can you share what you're seeing? I see a zero extend in the preheader but not in the loops.
When running just loop-vectorize and then llc, I do not see the extend.  When replacing -loop-vectorize with -O2 on the same input, I do.  

```
./opt -mtriple=riscv64 -mattr=+v,+m test/Transforms/LoopVectorize/RISCV/unroll-in-loop-vectorizer.ll -loop-vectorize -scalable-vectorization=on -S -riscv-v-vector-bits-max=512 | ./llc -O3 
./opt -mtriple=riscv64 -mattr=+v,+m test/Transforms/LoopVectorize/RISCV/unroll-in-loop-vectorizer.ll -O2 -scalable-vectorization=on -S -riscv-v-vector-bits-max=512 | ./llc -O3 
```

The key bit of IR after -O2 is:

  %index = phi i32 [ 0, %vector.ph ], [ %index.next, %vector.body ]
  %13 = zext i32 %index to i64
  // ... 
  %index.next = add nuw i32 %index, %12
  %23 = icmp eq i32 %index.next, %n.vec
  br i1 %23, label %middle.block, label %vector.body, !llvm.loop !0

We should be able to widen %index IV in indvars without issue, I'm a bit surprised we don't.  

Interestingly, if I add Zba, something in codegen manages to fold away the extend now.  We didn't when I'd last looked.

```
./opt -mtriple=riscv64 -mattr=+v,+m,+zba test/Transforms/LoopVectorize/RISCV/unroll-in-loop-vectorizer.ll -O2 -scalable-vectorization=on -S -riscv-v-vector-bits-max=512 | ./llc -O3
```

To be clear, this is a minor codegen issue at best, and definitely nothing which should block this patch.  

Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D125747/new/

https://reviews.llvm.org/D125747