[PATCH] D126563: [RISCV] Allow PRE of vsetvli involving non-1 LMUL

Tue May 31 16:18:19 PDT 2022

craig.topper added a comment.

In D126563#3547822 <https://reviews.llvm.org/D126563#3547822>, @reames wrote:

> In D126563#3546894 <https://reviews.llvm.org/D126563#3546894>, @frasercrmck wrote:
>
>> In D126563#3543968 <https://reviews.llvm.org/D126563#3543968>, @reames wrote:
>>
>>> In D126563#3543770 <https://reviews.llvm.org/D126563#3543770>, @frasercrmck wrote:
>>>
>>>> Seems like there's no fractional LMULs tested by this patch? Does this suggest we should add some more test coverage?
>>>
>>> Well, I would, but I could not find an in-tree example of what a fractional LMUL looks like in IR. (Probably just because I don't know what the syntax looks like.) If you give me an example, I can take it from there.
>>
>> It's certainly easier with scalable vectors, but we do codegen fractional LMULs for fixed vectors if the minimum VLEN is sufficiently large that we know the vector can be contained within a fraction of a whole register. For example, this (copied) test case uses `mf2` with `-riscv-v-vector-bits-min=256`:
>>
>>   define void @sink_splat_mul_lmulmf2(i32* nocapture %a, i32 signext %x) {
>>   entry:
>>     %broadcast.splatinsert = insertelement <4 x i32> poison, i32 %x, i64 0
>>     %broadcast.splat = shufflevector <4 x i32> %broadcast.splatinsert, <4 x i32> poison, <4 x i32> zeroinitializer
>>     br label %vector.body
>>   
>>   vector.body:                                      ; preds = %vector.body, %entry
>>     %index = phi i64 [ 0, %entry ], [ %index.next, %vector.body ]
>>     %0 = getelementptr inbounds i32, i32* %a, i64 %index
>>     %1 = bitcast i32* %0 to <4 x i32>*
>>     %wide.load = load <4 x i32>, <4 x i32>* %1, align 8
>>     %2 = mul <4 x i32> %wide.load, %broadcast.splat
>>     %3 = bitcast i32* %0 to <4 x i32>*
>>     store <4 x i32> %2, <4 x i32>* %3, align 8
>>     %index.next = add nuw i64 %index, 4
>>     %4 = icmp eq i64 %index.next, 1024
>>     br i1 %4, label %for.cond.cleanup, label %vector.body
>>   
>>   for.cond.cleanup:                                 ; preds = %vector.body
>>     ret void
>>   }
>
> Added coverage in 33b1be591 <https://reviews.llvm.org/rG33b1be5916669a74b3dc11b9f30f1ddb12270a2e>.
>
> For my context, why is it profitable to use fractional LMULs over LMUL=1?  I'm aware of the extend/truncate cases, but for operations like VADD, it seems like using mf2 and m1 are equivalent (assuming VL is the same) right?

You're correct as far as the hardware behavior goes.

> The only case I can think of that might be profitable would be using a fractional lmul so that VLMax (and thus the x0 encoding) is equal to the AVL.  That seems somewhat questionable on its own.
>
> Using a mix of lmuls makes removing vsetvlis trickier.  If we simply canonicalized fractional lmuls to lmul=1 (using knowledge about the vector length if needed for the vlmax case), it seems we'd potentially remove vsetvlis.
>
> At least toggling back and forth between fractional and lmul=1 doesn't change VL for the subset of AVLs less than the fractional width.  This does at least mean we can use the AVL preserving variant.  (Though, I'm not sure we actually do this... a quick look seems to indicate we don't.)
>
> In general, I'm struggling to understand why we'd want to use fractional lmuls.  Any ideas?

The fixed vector to scalable vector mapping has been designed so that vectors of ELEN-sized (64 or 32 bit) elements with total width <= `riscv-v-vector-bits-min` will produce an LMUL=1 scalable vector. Wider vectors will use LMUL=2,4,8. Fractional LMUL is not supported for vectors with SEW==ELEN by spec. Vectors with the same number of elements and a smaller SEW will use a proportionally smaller LMUL. This mapping is independent of what types are actually used in the basic block or function.
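
As a rough illustration of that mapping, here is a small Python sketch. It is not the actual lowering code: the function name `container_lmul`, its parameters, and the 64-bit block constant are assumptions made for the example, and it only considers power-of-two sizes.

```python
from fractions import Fraction

# Assumption for this sketch: scalable vector types are counted in 64-bit blocks.
RVV_BITS_PER_BLOCK = 64


def container_lmul(num_elts, sew, min_vlen, elen=64):
    """Hypothetical helper: LMUL chosen for a fixed vector of `num_elts`
    elements of `sew` bits, given `riscv-v-vector-bits-min` = `min_vlen`."""
    # Elements per 64-bit block, clamped so that SEW == ELEN never
    # ends up with a fractional LMUL.
    scalable_elts = max(num_elts * RVV_BITS_PER_BLOCK // min_vlen,
                        RVV_BITS_PER_BLOCK // elen)
    return Fraction(scalable_elts * sew, RVV_BITS_PER_BLOCK)


# <4 x i32> with min VLEN 256: the mf2 case from the test above.
print(container_lmul(4, 32, 256))   # 1/2
# <4 x i64> with min VLEN 256: ELEN-sized elements, exactly one register.
print(container_lmul(4, 64, 256))   # 1
# <8 x i64> with min VLEN 256: wider than one register.
print(container_lmul(8, 64, 256))   # 2
```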

So in mixed element width code with all vectors having the same number of elements, all the vsetvlis should have the same SEW:LMUL ratio. That seems like the ideal property to have for vsetvli removal.

On an ELEN=64 target, with no i64/f64 elements and all vectors <= `riscv-v-vector-bits-min`, we will only have fractional LMULs.
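
To make the ratio argument concrete, here is a sketch for 4-element vectors with `riscv-v-vector-bits-min=256` on an ELEN=64 target. The helper `lmul_and_ratio` is hypothetical and uses a simplified formula that assumes no clamping to a minimum container size is needed (true for this element count and VLEN).

```python
from fractions import Fraction


def lmul_and_ratio(num_elts, sew, min_vlen):
    """Hypothetical helper: returns (LMUL, SEW/LMUL) for a fixed vector.
    Simplifying assumption: num_elts * ELEN >= min_vlen, so no clamping."""
    lmul = Fraction(num_elts * sew, min_vlen)
    return lmul, Fraction(sew) / lmul


# Same element count (4), different element widths, no i64/f64.
rows = {sew: lmul_and_ratio(4, sew, 256) for sew in (8, 16, 32)}
for sew, (lmul, ratio) in rows.items():
    print(f"SEW={sew:2d}  LMUL={lmul}  SEW/LMUL={ratio}")
```

Every LMUL in this example is fractional, and the SEW/LMUL ratio is the same for each element width, which is the property that lets the vsetvlis agree on VLMAX.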

Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D126563/new/

https://reviews.llvm.org/D126563