[PATCH] D126563: [RISCV] Allow PRE of vsetvli involving non-1 LMUL

Philip Reames via Phabricator via llvm-commits llvm-commits at lists.llvm.org
Tue May 31 10:34:35 PDT 2022


reames added a comment.

In D126563#3546894 <https://reviews.llvm.org/D126563#3546894>, @frasercrmck wrote:

> In D126563#3543968 <https://reviews.llvm.org/D126563#3543968>, @reames wrote:
>
>> In D126563#3543770 <https://reviews.llvm.org/D126563#3543770>, @frasercrmck wrote:
>>
>>> Seems like there's no fractional LMULs tested by this patch? Does this suggest we should add some more test coverage?
>>
>> Well, I would, but I could not find an example in tree of what a fractional LMUL looks like in IR.  (Probably just because I don't know what the syntax looks like.)  If you give me an example, I can take it from there.
>
> It's certainly easier with scalable vectors, but we do codegen fractional LMULs for fixed vectors if the minimum VLEN is sufficiently large that we know the vector can be contained within a fraction of a whole register. For example, this (copied) test case uses `mf2` with `-riscv-v-vector-bits-min=256`:
>
>   define void @sink_splat_mul_lmulmf2(i32* nocapture %a, i32 signext %x) {
>   entry:
>     %broadcast.splatinsert = insertelement <4 x i32> poison, i32 %x, i64 0
>     %broadcast.splat = shufflevector <4 x i32> %broadcast.splatinsert, <4 x i32> poison, <4 x i32> zeroinitializer
>     br label %vector.body
>   
>   vector.body:                                      ; preds = %vector.body, %entry
>     %index = phi i64 [ 0, %entry ], [ %index.next, %vector.body ]
>     %0 = getelementptr inbounds i32, i32* %a, i64 %index
>     %1 = bitcast i32* %0 to <4 x i32>*
>     %wide.load = load <4 x i32>, <4 x i32>* %1, align 8
>     %2 = mul <4 x i32> %wide.load, %broadcast.splat
>     %3 = bitcast i32* %0 to <4 x i32>*
>     store <4 x i32> %2, <4 x i32>* %3, align 8
>     %index.next = add nuw i64 %index, 4
>     %4 = icmp eq i64 %index.next, 1024
>     br i1 %4, label %for.cond.cleanup, label %vector.body
>   
>   for.cond.cleanup:                                 ; preds = %vector.body
>     ret void
>   }

Added coverage in 33b1be591 <https://reviews.llvm.org/rG33b1be5916669a74b3dc11b9f30f1ddb12270a2e>.

For my context, why is it profitable to use fractional LMULs over LMUL=1?  I'm aware of the extend/truncate cases, but for operations like VADD, it seems like using mf2 and using m1 are equivalent (assuming VL is the same), right?

The only case I can think of that might be profitable would be using a fractional LMUL so that VLMAX (and thus the x0 encoding) is equal to the AVL.  That seems somewhat questionable on its own.
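
To make the VLMAX arithmetic concrete, here is a rough Python sketch (purely illustrative, not something from the patch). It uses the RVV definition VLMAX = LMUL * VLEN / SEW and assumes VLEN is exactly 256 bits; the -riscv-v-vector-bits-min=256 flag only gives a lower bound, so on a wider machine the numbers would differ. The helper name `vlmax` is made up for this example.

```python
from fractions import Fraction


def vlmax(vlen_bits: int, sew_bits: int, lmul: Fraction) -> int:
    """VLMAX = LMUL * VLEN / SEW (RVV spec definition)."""
    return int(lmul * vlen_bits / sew_bits)


# Assumption: VLEN is exactly 256 bits, elements are i32 (SEW=32).
VLEN = 256
SEW = 32

vlmax_mf2 = vlmax(VLEN, SEW, Fraction(1, 2))  # fractional LMUL (mf2)
vlmax_m1 = vlmax(VLEN, SEW, Fraction(1))      # LMUL=1 (m1)

print("mf2:", vlmax_mf2, "m1:", vlmax_m1)
```

Under that assumption, mf2 gives a VLMAX of 4, which matches the AVL of the <4 x i32> test case above, while m1 gives 8.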

Using a mix of LMULs makes removing vsetvlis trickier.  If we simply canonicalized fractional LMULs to LMUL=1 (using knowledge about the vector length if needed for the VLMAX case), it seems we could potentially remove more vsetvlis.

At least toggling back and forth between fractional and lmul=1 doesn't change VL for the subset of AVLs less than the fractional width.  This does at least mean we can use the AVL preserving variant.  (Though, I'm not sure we actually do this... a quick look seems to indicate we don't.)
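
A quick sanity check of the "VL doesn't change" claim, again as a hedged Python sketch rather than anything from the tree. It models VL as min(AVL, VLMAX), which is a simplification: the spec allows other results when VLMAX < AVL < 2*VLMAX. The VLMAX values 4 (mf2) and 8 (m1) assume SEW=32 and VLEN exactly 256, and the function names are hypothetical.

```python
def vl_for(avl: int, vlmax: int) -> int:
    """Simplified model of the VL a vsetvli would produce: min(AVL, VLMAX)."""
    return min(avl, vlmax)


# Assumed VLMAX values for SEW=32 with VLEN exactly 256 bits.
VLMAX_MF2 = 4
VLMAX_M1 = 8


def toggle_preserves_vl(avl: int) -> bool:
    """True if switching between mf2 and m1 leaves VL unchanged for this AVL."""
    return vl_for(avl, VLMAX_MF2) == vl_for(avl, VLMAX_M1)


# AVLs for which VL is identical under both LMUL settings.
preserved = [avl for avl in range(1, 17) if toggle_preserves_vl(avl)]
print(preserved)
```

In this model, only AVLs up to the fractional VLMAX keep the same VL across the toggle; anything larger is clamped differently by the two settings.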

In general, I'm struggling to understand why we'd want to use fractional lmuls.  Any ideas?


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D126563/new/

https://reviews.llvm.org/D126563
