[PATCH] D130442: [RISCV] Peephole optimization to fold merge.vvm and unmasked intrinsics.

Wed Aug 3 00:42:15 PDT 2022

fakepaper56 marked 2 inline comments as done.
fakepaper56 added inline comments.

================
Comment at: llvm/lib/Target/RISCV/RISCVISelDAGToDAG.cpp:2610
+        CurDAG->getMachineNode(MaskedOpc, DL, True->getVTList(), Ops);
+    ReplaceUses(N, Result);
+
----------------
craig.topper wrote:
> craig.topper wrote:
> > This does not handle the chain output of True correctly if it has users. We would need to replace True.getValue(1) with N->getValue(1). ReplaceUses will only replace the direct users of N.
> Simple test case
> 
> 
> ```
> define void @vpmerge_vpload_store(<vscale x 2 x i32> %passthru, <vscale x 2 x i32> * %p, <vscale x 2 x i1> %m, i32 zeroext %vl) {
> ; CHECK-LABEL: vpmerge_vpload_store:
> ; CHECK:       # %bb.0:
> ; CHECK-NEXT:    vsetvli zero, a1, e32, m1, tu, mu
> ; CHECK-NEXT:    vle32.v v8, (a0), v0.t
> ; CHECK-NEXT:    vs1r.v v8, (a0)
> ; CHECK-NEXT:    ret
>   %splat = insertelement <vscale x 2 x i1> poison, i1 -1, i32 0
>   %mask = shufflevector <vscale x 2 x i1> %splat, <vscale x 2 x i1> poison, <vscale x 2 x i32> zeroinitializer
>   %a = call <vscale x 2 x i32> @llvm.vp.load.nxv2i32.p0nxv2i32(<vscale x 2 x i32> * %p, <vscale x 2 x i1> %mask, i32 %vl)
>   %b = call <vscale x 2 x i32> @llvm.vp.merge.nxv2i32(<vscale x 2 x i1> %m, <vscale x 2 x i32> %a, <vscale x 2 x i32> %passthru, i32 %vl)
>   store <vscale x 2 x i32> %b, <vscale x 2 x i32> * %p
>   ret void
> }
> ```
> 
> Right after isel the MachineIR is
> 
> ```
> # Machine code for function vpmerge_vpload_store: IsSSA, TracksLiveness
> Function Live Ins: $v8 in %0, $x10 in %1, $v0 in %2, $x11 in %3
>
> bb.0 (%ir-block.0):
>   liveins: $v8, $x10, $v0, $x11
>   %3:gprnox0 = COPY $x11
>   %2:vr = COPY $v0
>   %1:gpr = COPY $x10
>   %0:vrnov0 = COPY $v8
>   %4:vr = PseudoVLE32_V_M1 %1:gpr, %3:gprnox0, 5 :: (load unknown-size from %ir.p, align 8)
>   $v0 = COPY %2:vr
>   %5:vrnov0 = PseudoVLE32_V_M1_MASK %0:vrnov0(tied-def 0), %1:gpr, $v0, %3:gprnox0, 5, 0
>   VS1R_V killed %5:vrnov0, %1:gpr :: (store unknown-size into %ir.p, align 8)
>   PseudoRET
>
> # End machine code for function vpmerge_vpload_store.
> ```
> 
> Notice the two VLEs. Dead code elimination will eventually delete the extra one, but it shouldn't have to. In a more complex test, loads and stores could end up in the wrong order in the MachineIR.
Done. Thank you for finding the bug.
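The following is a toy C++ model of why `ReplaceUses(N, Result)` alone leaves the unmasked load alive. It is not LLVM code. `Graph`, `Node`, `Value`, `replaceUses`, `countLiveLoads`, and `foldAndCountLoads` are hypothetical names introduced for illustration. The model assumes the fix is to also redirect users of `True`'s chain result (value 1) to the chain result of the new masked node. It mirrors the test case: the unmasked load's data feeds the merge, and its chain feeds the store.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <set>
#include <string>
#include <utility>
#include <vector>

// Toy model only: these types are hypothetical and are NOT LLVM's SDNode/SDValue.
struct Node;

// A (node, result number) pair, loosely analogous to SDValue.
struct Value {
  Node *N;
  unsigned ResNo;
  bool operator==(const Value &O) const { return N == O.N && ResNo == O.ResNo; }
};

struct Node {
  std::string Name;
  unsigned NumResults;         // e.g. a load has 2 results: data and chain
  std::vector<Value> Operands; // edges to the values this node consumes
  bool IsLoad;
};

struct Graph {
  std::vector<std::unique_ptr<Node>> Nodes;

  Node *create(std::string Name, unsigned NumResults, std::vector<Value> Ops,
               bool IsLoad) {
    Nodes.push_back(std::unique_ptr<Node>(
        new Node{std::move(Name), NumResults, std::move(Ops), IsLoad}));
    return Nodes.back().get();
  }

  // Redirect every operand that refers to From so it refers to To.
  // Only users of this exact (node, result) pair are rewritten.
  void replaceUses(Value From, Value To) {
    assert(From.ResNo < From.N->NumResults && To.ResNo < To.N->NumResults);
    for (auto &User : Nodes)
      for (Value &Op : User->Operands)
        if (Op == From)
          Op = To;
  }
};

// Count load nodes still reachable from Root through operand edges.
std::size_t countLiveLoads(const Node *Root) {
  std::vector<const Node *> Work{Root};
  std::set<const Node *> Seen;
  std::size_t Loads = 0;
  while (!Work.empty()) {
    const Node *Cur = Work.back();
    Work.pop_back();
    if (!Seen.insert(Cur).second)
      continue;
    if (Cur->IsLoad)
      ++Loads;
    for (const Value &Op : Cur->Operands)
      Work.push_back(Op.N);
  }
  return Loads;
}

// Mimic the test case: Load -> Merge -> Store, where the Store is also
// chained after the Load. Fold Merge+Load into a masked load and report
// how many loads the Store still depends on.
std::size_t foldAndCountLoads(bool FixChain) {
  Graph G;
  Node *Entry = G.create("EntryToken", 1, {}, false);
  Node *True = G.create("Load", 2, {{Entry, 0}}, true); // 0 = data, 1 = chain
  Node *N = G.create("Merge", 1, {{True, 0}}, false);
  Node *Store = G.create("Store", 1, {{True, 1}, {N, 0}}, false); // chain, data
  Node *Result = G.create("MaskedLoad", 2, {{Entry, 0}}, true);

  // Only the merge's own users are redirected here.
  G.replaceUses({N, 0}, {Result, 0});
  // Without this, Store keeps its chain edge to the unmasked Load.
  if (FixChain)
    G.replaceUses({True, 1}, {Result, 1});
  return countLiveLoads(Store);
}
```

Here `foldAndCountLoads(false)` returns 2, mirroring the two VLEs in the MachineIR dump. `foldAndCountLoads(true)` returns 1. In the unfixed case, the store is also chained after the old unmasked load rather than the masked one. In a larger graph, that is how memory operations could end up misordered.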

Repository:
  rG LLVM Github Monorepo

https://reviews.llvm.org/D130442