[PATCH] D129735: [WIP][RISCV] Add new pass to transform undef to pseudo for vector values.

Fri Oct 21 11:49:57 PDT 2022

craig.topper added a comment.

In D129735#3873595 <https://reviews.llvm.org/D129735#3873595>, @BeMg wrote:

> In D129735#3873448 <https://reviews.llvm.org/D129735#3873448>, @craig.topper wrote:
>
>> In D129735#3873436 <https://reviews.llvm.org/D129735#3873436>, @BeMg wrote:
>>
>>> In D129735#3871632 <https://reviews.llvm.org/D129735#3871632>, @craig.topper wrote:
>>>
>>>> Does this patch work for this test case
>>>>
>>>>   define internal void @foo() {
>>>>   loopIR.preheader.i.i:
>>>>     %v15 = tail call <vscale x 1 x i16> @llvm.experimental.stepvector.nxv1i16()
>>>>     %v17 = tail call <vscale x 8 x i16> @llvm.vector.insert.nxv8i16.nxv1i16(<vscale x 8 x i16> poison, <vscale x 1 x i16> %v15, i64 0)
>>>>     %vs12.i.i.i = add <vscale x 1 x i16> %v15, shufflevector (<vscale x 1 x i16> insertelement (<vscale x 1 x i16> poison, i16 1, i32 0), <vscale x 1 x i16> poison, <vscale x 1 x i32> zeroinitializer)
>>>>     %v18 = tail call <vscale x 8 x i16> @llvm.vector.insert.nxv8i16.nxv1i16(<vscale x 8 x i16> poison, <vscale x 1 x i16> %vs12.i.i.i, i64 0)
>>>>     %vs16.i.i.i = add <vscale x 1 x i16> %v15, shufflevector (<vscale x 1 x i16> insertelement (<vscale x 1 x i16> poison, i16 3, i32 0), <vscale x 1 x i16> poison, <vscale x 1 x i32> zeroinitializer)
>>>>     %v20 = tail call <vscale x 8 x i16> @llvm.vector.insert.nxv8i16.nxv1i16(<vscale x 8 x i16> poison, <vscale x 1 x i16> %vs16.i.i.i, i64 0)
>>>>     br label %loopIR3.i.i
>>>>   
>>>>   loopIR3.i.i:                                      ; preds = %loopIR3.i.i, %loopIR.preheader.i.i
>>>>     %v37 = load <vscale x 8 x i8>, ptr addrspace(1) null, align 8
>>>>     %v38 = tail call <vscale x 8 x i8> @llvm.riscv.vrgatherei16.vv.nxv8i8.i64(<vscale x 8 x i8> undef, <vscale x 8 x i8> %v37, <vscale x 8 x i16> %v17, i64 4)
>>>>     %v40 = tail call <vscale x 8 x i8> @llvm.riscv.vrgatherei16.vv.nxv8i8.i64(<vscale x 8 x i8> undef, <vscale x 8 x i8> %v37, <vscale x 8 x i16> %v18, i64 4)
>>>>     %v42 = and <vscale x 8 x i8> %v38, %v40
>>>>     %v46 = tail call <vscale x 8 x i8> @llvm.riscv.vrgatherei16.vv.nxv8i8.i64(<vscale x 8 x i8> undef, <vscale x 8 x i8> %v37, <vscale x 8 x i16> %v20, i64 4)
>>>>     %v60 = and <vscale x 8 x i8> %v42, %v46
>>>>     store <vscale x 8 x i8> %v60, ptr addrspace(1) null, align 4
>>>>     br label %loopIR3.i.i
>>>>   }
>>>>   
>>>>   declare <vscale x 1 x i16> @llvm.experimental.stepvector.nxv1i16()
>>>>   
>>>>   declare <vscale x 8 x i16> @llvm.vector.insert.nxv8i16.nxv1i16(<vscale x 8 x i16>, <vscale x 1 x i16>, i64 immarg) 
>>>>   
>>>>   declare <vscale x 8 x i8> @llvm.riscv.vrgatherei16.vv.nxv8i8.i64(<vscale x 8 x i8>, <vscale x 8 x i8>, <vscale x 8 x i16>, i64)
>>>
>>> This IR doesn't produce the undef + early-clobber situation, so this pass does not apply to it.
>>>
>>> With the current compiler, the RA result does not violate the early-clobber constraint.
>>>
>>> The corresponding MachineInstrs before RA are shown below:
>>>
>>>   ********** MACHINEINSTRS **********
>>>   # Machine code for function foo: NoPHIs, TracksLiveness, TiedOpsRewritten, TracksDebugUserValues
>>>   
>>>   0B      bb.0.loopIR.preheader.i.i:
>>>             successors: %bb.1(0x80000000); %bb.1(100.00%)
>>>   
>>>   16B       dead %16:gpr = PseudoVSETVLIX0 $x0, 206, implicit-def $vl, implicit-def $vtype
>>>   32B       undef %0.sub_vrm1_0:vrm2 = PseudoVID_V_MF4 -1, 4, implicit $vl, implicit $vtype
>>>   64B       undef %1.sub_vrm1_0:vrm2 = PseudoVADD_VI_MF4 %0.sub_vrm1_0:vrm2, 1, -1, 4, implicit $vl, implicit $vtype
>>>   96B       undef %2.sub_vrm1_0:vrm2 = PseudoVADD_VI_MF4 %0.sub_vrm1_0:vrm2, 3, -1, 4, implicit $vl, implicit $vtype
>>>   
>>>   128B    bb.1.loopIR3.i.i:
>>>           ; predecessors: %bb.0, %bb.1
>>>             successors: %bb.1(0x80000000); %bb.1(100.00%)
>>>   
>>>   160B      %10:vr = VL1RE8_V $x0 :: (load unknown-size from `ptr addrspace(1) null`, align 8, addrspace 1)
>>>   176B      dead $x0 = PseudoVSETIVLI 4, 192, implicit-def $vl, implicit-def $vtype
>>>   192B      early-clobber %11:vr = PseudoVRGATHEREI16_VV_M1_M2 %10:vr, %0:vrm2, 4, 3, implicit $vl, implicit $vtype
>>>   208B      early-clobber %12:vr = PseudoVRGATHEREI16_VV_M1_M2 %10:vr, %1:vrm2, 4, 3, implicit $vl, implicit $vtype
>>>   224B      dead %17:gpr = PseudoVSETVLIX0 $x0, 192, implicit-def $vl, implicit-def $vtype
>>>   240B      %13:vr = PseudoVAND_VV_M1 %11:vr, %12:vr, -1, 3, implicit $vl, implicit $vtype
>>>   256B      dead $x0 = PseudoVSETIVLI 4, 192, implicit-def $vl, implicit-def $vtype
>>>   272B      early-clobber %14:vr = PseudoVRGATHEREI16_VV_M1_M2 %10:vr, %2:vrm2, 4, 3, implicit $vl, implicit $vtype
>>>   288B      dead %18:gpr = PseudoVSETVLIX0 $x0, 192, implicit-def $vl, implicit-def $vtype
>>>   304B      %15:vr = PseudoVAND_VV_M1 %13:vr, %14:vr, -1, 3, implicit $vl, implicit $vtype
>>>   320B      VS1R_V %15:vr, $x0 :: (store unknown-size into `ptr addrspace(1) null`, align 4, addrspace 1)
>>>   336B      PseudoBR %bb.1
>>
>> Here is the generated assembly; see the note inline.
>>
>>   foo:                                    # @foo
>>   	.cfi_startproc
>>   # %bb.0:                                # %loopIR.preheader.i.i
>>   	vsetvli	a0, zero, e16, mf4, ta, ma
>>   	vid.v	v8
>>   	vadd.vi	v10, v8, 1
>>   	vadd.vi	v12, v8, 3
>>   .LBB0_1:                                # %loopIR3.i.i
>>                                           # =>This Inner Loop Header: Depth=1
>>   	vl1r.v	v14, (zero)
>>   	vsetivli	zero, 4, e8, m1, ta, ma
>>   	vrgatherei16.vv	v15, v14, v8  <- The v14 here is LMUL=2 so it's v14 and v15. This means writing v15 violated the early clobber constraint.
>>   	vrgatherei16.vv	v16, v14, v10
>>   	vsetvli	a0, zero, e8, m1, ta, ma
>>   	vand.vv	v15, v15, v16
>>   	vsetivli	zero, 4, e8, m1, ta, ma
>>   	vrgatherei16.vv	v16, v14, v12
>>   	vsetvli	a0, zero, e8, m1, ta, ma
>>   	vand.vv	v14, v15, v16
>>   	vs1r.v	v14, (zero)
>>   	j	.LBB0_1
>>   .Lfunc_end0:
>>   	.size	foo, .Lfunc_end0-foo
>>   	.cfi_endproc
>>                                           # -- End function
>>   	.section	".note.GNU-stack","",@progbits
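
For readers matching the MIR against the assembly above: the immediates on the `PseudoVSETVLIX0` / `PseudoVSETIVLI` lines (206 and 192) are `vtype` encodings. The following is a minimal sketch, assuming the RVV 1.0 `vtype` bit layout (`vlmul` in bits 2:0, `vsew` in bits 5:3, `vta` in bit 6, `vma` in bit 7). The helper name `decode_vtype` is hypothetical and is not an LLVM API.

```python
# Sketch only: decode a vtype immediate into the string printed by vsetvli.
# Assumes the RVV 1.0 layout: vlmul[2:0], vsew[5:3], vta[6], vma[7].
# decode_vtype is a hypothetical helper name, not part of LLVM.

_LMUL_NAMES = {0: "m1", 1: "m2", 2: "m4", 3: "m8",
               5: "mf8", 6: "mf4", 7: "mf2"}  # encoding 4 is reserved


def decode_vtype(imm):
    vlmul = imm & 0b111
    vsew = (imm >> 3) & 0b111
    if vlmul not in _LMUL_NAMES or vsew > 3:
        raise ValueError(f"reserved vtype encoding: {imm}")
    sew = 8 << vsew                      # 0 -> e8, 1 -> e16, 2 -> e32, 3 -> e64
    ta = "ta" if imm & 0x40 else "tu"    # tail agnostic / undisturbed
    ma = "ma" if imm & 0x80 else "mu"    # mask agnostic / undisturbed
    return f"e{sew}, {_LMUL_NAMES[vlmul]}, {ta}, {ma}"


print(decode_vtype(206))  # immediate on the 16B line of the MIR
print(decode_vtype(192))  # immediate on the 176B line of the MIR
```

Under that assumption, 206 decodes to the `e16, mf4, ta, ma` seen on the first `vsetvli`, and 192 to the `e8, m1, ta, ma` seen inside the loop.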
>
> Is the MIR wrong here? Should `%10` be marked as `vrm2`?
> The compiler treats `%11` and `%10` as `M1` and `%0` as `M2`, so I think RA doesn't violate the early-clobber constraint.
>
>   192B      early-clobber %11:vr = PseudoVRGATHEREI16_VV_M1_M2 %10:vr, %0:vrm2, 4, 3, implicit $vl, implicit $vtype
>   vrgatherei16.vv	v15, v14, v8 
>   // %11 -> v15
>   // %10 -> v14
>   // %0 -> v8
>
> Register allocation result from the compiler:
>
>   ********** REWRITE VIRTUAL REGISTERS **********
>   ********** Function: foo
>   ********** REGISTER MAP **********
>   [%0 -> $v8m2] VRM2
>   [%1 -> $v10m2] VRM2
>   [%2 -> $v12m2] VRM2
>   [%10 -> $v14] VR
>   [%11 -> $v15] VR
>   [%12 -> $v16] VR
>   [%13 -> $v15] VR
>   [%14 -> $v16] VR
>   [%15 -> $v14] VR
>   [%16 -> $x10] GPR
>   [%17 -> $x10] GPR
>   [%18 -> $x10] GPR

You're right. I mixed up the operand order. I need to go look at this test again. It used to fail. Maybe we fixed it some other way.
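
To make the operand mix-up concrete, here is a small sketch of the overlap check being discussed. In `vrgatherei16.vv vd, vs2, vs1`, the 16-bit index operand is `vs1` (here `v8`, an LMUL=2 group), while `vs2` (here `v14`) is the LMUL=1 data source. The helpers `reg_group` and `violates_early_clobber` are hypothetical names for illustration, not LLVM functions, and the sketch assumes integer LMUL only (a fractional-LMUL operand occupies a single register, so it would be passed as 1).

```python
# Sketch only: does an early-clobber destination overlap any source group?
# reg_group and violates_early_clobber are hypothetical helpers, not LLVM APIs.
# Each operand is (base_register_number, lmul) with integer lmul >= 1.

def reg_group(base, lmul):
    """Vector register numbers covered by the group starting at `base`."""
    return set(range(base, base + lmul))


def violates_early_clobber(dest, sources):
    """True if the destination group shares a register with any source group."""
    dest_regs = reg_group(*dest)
    return any(dest_regs & reg_group(*src) for src in sources)


# Register map from the thread: %11 -> v15 (VR), %10 -> v14 (VR), %0 -> v8m2.
actual = violates_early_clobber((15, 1), [(14, 1), (8, 2)])

# The misreading: treating v14 as the LMUL=2 operand, i.e. v14 and v15.
misread = violates_early_clobber((15, 1), [(14, 2), (8, 2)])

print(actual, misread)
```

With the register map quoted above, the real assignment has no overlap, whereas reading `v14` as the LMUL=2 group would make `v15` overlap it, which is the apparent violation flagged in the inline note.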

Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D129735/new/

https://reviews.llvm.org/D129735