[PATCH] D125392: [riscv] Canonicalize vsetvli (vsetvli avl, vtype1) vtype2 transitions

Wed May 11 08:19:50 PDT 2022

reames created this revision.
reames added reviewers: frasercrmck, craig.topper.
Herald added subscribers: sunshaoce, VincentWu, luke957, StephenFan, vkmr, evandro, luismarques, apazos, sameer.abuasal, s.egerton, Jim, benna, psnobl, jocewei, PkmX, the_o, brucehoult, MartinMosbeck, rogfer01, edward-jones, zzheng, jrtc27, shiva0217, kito-cheng, niosHD, sabuasal, bollu, simoncook, johnrusso, rbar, asb, hiraditya, arichardson, mcrosier.
Herald added a project: All.
reames requested review of this revision.
Herald added subscribers: pcwang-thead, eopXD, MaskRay.
Herald added a project: LLVM.

This patch is an alternative to a piece of D125270 <https://reviews.llvm.org/D125270>.  If we have one vsetvli which is using as AVL the output of another, and the prior AVL can be proven to produce the same VL value as that defining one, we can use the AVL from the prior instruction.  This has the effect of removing a state transition on AVL, and will let us use the cheaper 'vsetvli x0, x0, vtype1' form or possible even skip emitting it entirely.

This builds on the same infrastructure as D125337 <https://reviews.llvm.org/D125337>, and does the analogous extension to working on abstract states instead of only prior explicit vsetvli instructions.  This is where the (relatively minor) code improvements come from.

More importantly, this fixes the last case where the state computed in phase 1 and 2 of the algorithm differs from the state computed during phase 3. Note that such differences can cause miscompiles by creating disagreements about contents of the VL and VTYPE registers at block boundaries.

Doing this transform inside the dataflow can cause the compatibility of a later store to change with regards to the current state. test15 in the diff illustrates this case well. What we have is a vsetvli which is mutated by one following vector op, but whose GPR result is used by another. The compatibility logic walks back to the def in this case, and checks to see if it matches the immediate prior state. In phase 1 and 2, it doesn't, and in phase 3 (after mutation) it does because we remove a transition which caused it to differ.


Repository:
  rG LLVM Github Monorepo

https://reviews.llvm.org/D125392

Files:
  llvm/lib/Target/RISCV/RISCVInsertVSETVLI.cpp
  llvm/test/CodeGen/RISCV/rvv/vsetvli-insert.ll


Index: llvm/test/CodeGen/RISCV/rvv/vsetvli-insert.ll
===================================================================

--- llvm/test/CodeGen/RISCV/rvv/vsetvli-insert.ll
+++ llvm/test/CodeGen/RISCV/rvv/vsetvli-insert.ll
@@ -281,10 +281,9 @@
 define <vscale x 1 x double> @test15(i64 %avl, <vscale x 1 x double> %a, <vscale x 1 x double> %b) nounwind {
 ; CHECK-LABEL: test15:
 ; CHECK:       # %bb.0: # %entry
-; CHECK-NEXT:    vsetvli a0, a0, e64, m1, ta, mu
+; CHECK-NEXT:    vsetvli zero, a0, e64, m1, ta, mu
 ; CHECK-NEXT:    vfadd.vv v8, v8, v9
 ; CHECK-NEXT:    vfadd.vv v8, v8, v9
-; CHECK-NEXT:    vsetvli zero, a0, e64, m1, ta, mu
 ; CHECK-NEXT:    ret
 entry:
   %vsetvli = tail call i64 @llvm.riscv.vsetvli(i64 %avl, i64 2, i64 7)
@@ -354,12 +353,12 @@
 define <vscale x 1 x double> @test18(<vscale x 1 x double> %a, double %b) nounwind {
 ; CHECK-LABEL: test18:
 ; CHECK:       # %bb.0: # %entry
-; CHECK-NEXT:    vsetivli a0, 6, e64, m1, tu, mu
+; CHECK-NEXT:    vsetivli zero, 6, e64, m1, tu, mu
 ; CHECK-NEXT:    vmv1r.v v9, v8
 ; CHECK-NEXT:    vfmv.s.f v9, fa0
-; CHECK-NEXT:    vsetvli zero, a0, e64, m1, ta, mu
+; CHECK-NEXT:    vsetvli zero, zero, e64, m1, ta, mu
 ; CHECK-NEXT:    vfadd.vv v8, v8, v8
-; CHECK-NEXT:    vsetivli zero, 1, e64, m1, tu, mu
+; CHECK-NEXT:    vsetvli zero, zero, e64, m1, tu, mu
 ; CHECK-NEXT:    vfmv.s.f v8, fa0
 ; CHECK-NEXT:    vsetvli a0, zero, e64, m1, ta, mu
 ; CHECK-NEXT:    vfadd.vv v8, v9, v8
Index: llvm/lib/Target/RISCV/RISCVInsertVSETVLI.cpp
===================================================================
--- llvm/lib/Target/RISCV/RISCVInsertVSETVLI.cpp
+++ llvm/lib/Target/RISCV/RISCVInsertVSETVLI.cpp
@@ -1160,13 +1160,15 @@
           // with current VL/VTYPE.
           bool NeedInsertVSETVLI = true;
           if (PrevVSETVLIMI) {
-            bool HasSameAVL =
-                CurInfo.hasSameAVL(NewInfo) ||
-                (NewInfo.hasAVLReg() && NewInfo.getAVLReg().isVirtual() &&
-                 NewInfo.getAVLReg() == PrevVSETVLIMI->getOperand(0).getReg());
             // If these two VSETVLI have the same AVL and the same VLMAX,
             // we could merge these two VSETVLI.
-            if (HasSameAVL && CurInfo.hasSameVLMAX(NewInfo)) {
+            // TODO: If we remove this, we get a `vsetvli x0, x0, vtype'
+            // here.  We could simply let this be emitted, then remove
+            // the unused vsetvlis in a post-pass.
+            if (CurInfo.hasSameAVL(NewInfo) && CurInfo.hasSameVLMAX(NewInfo)) {
+              // WARNING: For correctness, it is essential the contents of VL
+              // and VTYPE stay the same after this instruction.  This
+              // greatly limits the mutation we can legally do here.
               PrevVSETVLIMI->getOperand(2).setImm(NewInfo.encodeVTYPE());
               NeedInsertVSETVLI = false;
             }
@@ -1248,6 +1250,32 @@
     }
 
     if (RISCVII::hasSEWOp(TSFlags)) {
+      if (RISCVII::hasVLOp(TSFlags)) {
+        const auto Require = computeInfoForInstr(MI, TSFlags, MRI);
+        // If the AVL is the result of a previous vsetvli which has the
+        // same AVL and VLMAX as our current state, we can reuse the AVL
+        // from the current state for the new one.  This allows us to
+        // generate 'vsetvli x0, x0, vtype" or possible skip the transition
+        // entirely.
+        if (!CurInfo.isUnknown() && Require.hasAVLReg() &&
+            Require.getAVLReg().isVirtual()) {
+          if (MachineInstr *DefMI = MRI->getVRegDef(Require.getAVLReg())) {
+            if (isVectorConfigInstr(*DefMI)) {
+              VSETVLIInfo DefInfo = getInfoForVSETVLI(*DefMI);
+              if (DefInfo.hasSameAVL(CurInfo) &&
+                  DefInfo.hasSameVLMAX(CurInfo)) {
+                MachineOperand &VLOp = MI.getOperand(getVLOpNum(MI));
+                if (CurInfo.hasAVLImm())
+                  VLOp.ChangeToImmediate(CurInfo.getAVLImm());
+                else
+                  VLOp.ChangeToRegister(CurInfo.getAVLReg(), /*IsDef*/ false);
+                CurInfo = computeInfoForInstr(MI, TSFlags, MRI);
+                continue;
+              }
+            }
+          }
+        }
+      }
       CurInfo = computeInfoForInstr(MI, TSFlags, MRI);
       continue;
     }


-------------- next part --------------
A non-text attachment was scrubbed...
Name: D125392.428669.patch
Type: text/x-patch
Size: 4277 bytes
Desc: not available
URL: <http://lists.llvm.org/pipermail/llvm-commits/attachments/20220511/dfc8552b/attachment.bin>