[PATCH] D130442: [RISCV] Peephole optimization to fold merge.vvm and unmasked intrinsics.

Tue Aug 2 12:24:58 PDT 2022

craig.topper added inline comments.

================
Comment at: llvm/lib/Target/RISCV/RISCVISelDAGToDAG.cpp:2610
+        CurDAG->getMachineNode(MaskedOpc, DL, True->getVTList(), Ops);
+    ReplaceUses(N, Result);
+
----------------
craig.topper wrote:
> This does not handle the chain output of True correctly if it has users. We would need to replace True.getValue(1) with N->getValue(1). ReplaceUses will only replace the direct users of N.
Here is a simple test case:

```
define void @vpmerge_vpload_store(<vscale x 2 x i32> %passthru, <vscale x 2 x i32> * %p, <vscale x 2 x i1> %m, i32 zeroext %vl) {
; CHECK-LABEL: vpmerge_vpload_store:                                             
; CHECK:       # %bb.0:                                                          
; CHECK-NEXT:    vsetvli zero, a1, e32, m1, tu, mu                               
; CHECK-NEXT:    vle32.v v8, (a0), v0.t                                          
; CHECK-NEXT:    vs1r.v v8, (a0)                                                 
; CHECK-NEXT:    ret                                                             
  %splat = insertelement <vscale x 2 x i1> poison, i1 -1, i32 0                  
  %mask = shufflevector <vscale x 2 x i1> %splat, <vscale x 2 x i1> poison, <vscale x 2 x i32> zeroinitializer
  %a = call <vscale x 2 x i32> @llvm.vp.load.nxv2i32.p0nxv2i32(<vscale x 2 x i32> * %p, <vscale x 2 x i1> %mask, i32 %vl)
  %b = call <vscale x 2 x i32> @llvm.vp.merge.nxv2i32(<vscale x 2 x i1> %m, <vscale x 2 x i32> %a, <vscale x 2 x i32> %passthru, i32 %vl)
  store <vscale x 2 x i32> %b, <vscale x 2 x i32> * %p                           
  ret void                                                                       
}
```

Right after isel, the MachineIR is:

```
# Machine code for function vpmerge_vpload_store: IsSSA, TracksLiveness          
Function Live Ins: $v8 in %0, $x10 in %1, $v0 in %2, $x11 in %3                  

bb.0 (%ir-block.0):                                                              
  liveins: $v8, $x10, $v0, $x11                                                  
  %3:gprnox0 = COPY $x11                                                         
  %2:vr = COPY $v0                                                               
  %1:gpr = COPY $x10                                                             
  %0:vrnov0 = COPY $v8                                                           
  %4:vr = PseudoVLE32_V_M1 %1:gpr, %3:gprnox0, 5 :: (load unknown-size from %ir.p, align 8)
  $v0 = COPY %2:vr                                                               
  %5:vrnov0 = PseudoVLE32_V_M1_MASK %0:vrnov0(tied-def 0), %1:gpr, $v0, %3:gprnox0, 5, 0
  VS1R_V killed %5:vrnov0, %1:gpr :: (store unknown-size into %ir.p, align 8)    
  PseudoRET                                                                      

# End machine code for function vpmerge_vpload_store.
```

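Notice the two VLEs: the unmasked PseudoVLE32_V_M1 (%4) is still present alongside the new PseudoVLE32_V_M1_MASK (%5). Dead code elimination will eventually delete the extra one, but it shouldn't have to. In a more complex test, we might put loads and stores in the wrong order in the MachineIR, because the chain users still depend on the old load rather than the new masked one.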
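
To make the chain problem concrete, here is a toy model in plain C++. It is not the SelectionDAG API: `Node`, `Value`, `replaceValue`, and `staleStoreOperands` are hypothetical names invented for this sketch, and the graph shape (the store taking the load's chain directly) is an assumption. It only illustrates that replacing one result of a node leaves users of its other results untouched.

```cpp
#include <string>
#include <utility>
#include <vector>

// Toy stand-in for a DAG value: (producing node id, result index).
// Result 0 models the data value, result 1 models the chain.
using Value = std::pair<int, int>;

struct Node {
  std::string Name;
  std::vector<Value> Operands;
};

// Redirect every use of exactly one value, From, to To.
// Uses of other results of the same node are left alone.
void replaceValue(std::vector<Node> &Nodes, Value From, Value To) {
  for (Node &N : Nodes)
    for (Value &Op : N.Operands)
      if (Op == From)
        Op = To;
}

// Builds a tiny load -> merge -> store graph, performs the fold, and returns
// how many operands of the store still point at the old unmasked load.
int staleStoreOperands(bool AlsoReplaceChain) {
  const int OldLoad = 0, Merge = 1, MaskedLoad = 2, Store = 3;
  std::vector<Node> Nodes(4);
  Nodes[OldLoad] = Node{"vle32 (unmasked)", {}};
  Nodes[Merge] = Node{"vmerge.vvm", {Value{OldLoad, 0}}};
  Nodes[MaskedLoad] = Node{"vle32 (masked)", {}};
  // Assumption: the store uses the merged data and the old load's chain.
  Nodes[Store] = Node{"vs1r", {Value{Merge, 0}, Value{OldLoad, 1}}};

  // What the patch does today: only the merge's value is replaced.
  replaceValue(Nodes, Value{Merge, 0}, Value{MaskedLoad, 0});

  // The missing step: move the old load's chain users to the new node too.
  if (AlsoReplaceChain)
    replaceValue(Nodes, Value{OldLoad, 1}, Value{MaskedLoad, 1});

  int Stale = 0;
  for (const Value &Op : Nodes[Store].Operands)
    if (Op.first == OldLoad)
      ++Stale;
  return Stale;
}
```

In this model, `staleStoreOperands(false)` returns 1 (the store still depends on the old load through its chain, so the old load stays alive), while `staleStoreOperands(true)` returns 0.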

Repository:
  rG LLVM Github Monorepo

https://reviews.llvm.org/D130442