[llvm] 72925d9 - [riscv] Canonicalize vsetvli (vsetvli avl, vtype1) vtype2 transitionsas reviewed

Wed May 11 10:45:40 PDT 2022

Author: Philip Reames
Date: 2022-05-11T10:45:29-07:00
New Revision: 72925d98bf928431e430a49a0504bda8d7d0d184

URL: https://github.com/llvm/llvm-project/commit/72925d98bf928431e430a49a0504bda8d7d0d184
DIFF: https://github.com/llvm/llvm-project/commit/72925d98bf928431e430a49a0504bda8d7d0d184.diff

LOG: [riscv] Canonicalize vsetvli (vsetvli avl, vtype1) vtype2 transitionsas reviewed

This patch is an alternative to a piece of D125270. If we have one vsetvli which is using as AVL the output of another, and the prior AVL can be proven to produce the same VL value as that defining one, we can use the AVL from the prior instruction. This has the effect of removing a state transition on AVL, and will let us use the cheaper 'vsetvli x0, x0, vtype1' form or possible even skip emitting it entirely.

This builds on the same infrastructure as D125337, and does the analogous extension to working on abstract states instead of only prior explicit vsetvli instructions. This is where the (relatively minor) code improvements come from.

More importantly, this fixes the last case where the state computed in phase 1 and 2 of the algorithm differs from the state computed during phase 3. Note that such differences can cause miscompiles by creating disagreements about contents of the VL and VTYPE registers at block boundaries.

Doing this transform inside the dataflow can cause the compatibility of a later store to change with regards to the current state. test15 in the diff illustrates this case well. What we have is a vsetvli which is mutated by one following vector op, but whose GPR result is used by another. The compatibility logic walks back to the def in this case, and checks to see if it matches the immediate prior state. In phase 1 and 2, it doesn't, and in phase 3 (after mutation) it does because we remove a transition which caused it to differ.

Differential Revision: https://reviews.llvm.org/D125392

Added: 
    

Modified: 
    llvm/lib/Target/RISCV/RISCVInsertVSETVLI.cpp
    llvm/test/CodeGen/RISCV/rvv/vsetvli-insert.ll

Removed: 
    


################################################################################
diff  --git a/llvm/lib/Target/RISCV/RISCVInsertVSETVLI.cpp b/llvm/lib/Target/RISCV/RISCVInsertVSETVLI.cpp
index 954ef7e697112..b0b911572506f 100644

--- a/llvm/lib/Target/RISCV/RISCVInsertVSETVLI.cpp
+++ b/llvm/lib/Target/RISCV/RISCVInsertVSETVLI.cpp
@@ -1160,13 +1160,15 @@ void RISCVInsertVSETVLI::emitVSETVLIs(MachineBasicBlock &MBB) {
           // with current VL/VTYPE.
           bool NeedInsertVSETVLI = true;
           if (PrevVSETVLIMI) {
-            bool HasSameAVL =
-                CurInfo.hasSameAVL(NewInfo) ||
-                (NewInfo.hasAVLReg() && NewInfo.getAVLReg().isVirtual() &&
-                 NewInfo.getAVLReg() == PrevVSETVLIMI->getOperand(0).getReg());
             // If these two VSETVLI have the same AVL and the same VLMAX,
             // we could merge these two VSETVLI.
-            if (HasSameAVL && CurInfo.hasSameVLMAX(NewInfo)) {
+            // TODO: If we remove this, we get a `vsetvli x0, x0, vtype'
+            // here.  We could simply let this be emitted, then remove
+            // the unused vsetvlis in a post-pass.
+            if (CurInfo.hasSameAVL(NewInfo) && CurInfo.hasSameVLMAX(NewInfo)) {
+              // WARNING: For correctness, it is essential the contents of VL
+              // and VTYPE stay the same after MI.  This greatly limits the
+              // mutation we can legally do here.
               PrevVSETVLIMI->getOperand(2).setImm(NewInfo.encodeVTYPE());
               NeedInsertVSETVLI = false;
             }
@@ -1248,6 +1250,32 @@ void RISCVInsertVSETVLI::doLocalPrepass(MachineBasicBlock &MBB) {
     }
 
     if (RISCVII::hasSEWOp(TSFlags)) {
+      if (RISCVII::hasVLOp(TSFlags)) {
+        const auto Require = computeInfoForInstr(MI, TSFlags, MRI);
+        // If the AVL is the result of a previous vsetvli which has the
+        // same AVL and VLMAX as our current state, we can reuse the AVL
+        // from the current state for the new one.  This allows us to
+        // generate 'vsetvli x0, x0, vtype" or possible skip the transition
+        // entirely.
+        if (!CurInfo.isUnknown() && Require.hasAVLReg() &&
+            Require.getAVLReg().isVirtual()) {
+          if (MachineInstr *DefMI = MRI->getVRegDef(Require.getAVLReg())) {
+            if (isVectorConfigInstr(*DefMI)) {
+              VSETVLIInfo DefInfo = getInfoForVSETVLI(*DefMI);
+              if (DefInfo.hasSameAVL(CurInfo) &&
+                  DefInfo.hasSameVLMAX(CurInfo)) {
+                MachineOperand &VLOp = MI.getOperand(getVLOpNum(MI));
+                if (CurInfo.hasAVLImm())
+                  VLOp.ChangeToImmediate(CurInfo.getAVLImm());
+                else
+                  VLOp.ChangeToRegister(CurInfo.getAVLReg(), /*IsDef*/ false);
+                CurInfo = computeInfoForInstr(MI, TSFlags, MRI);
+                continue;
+              }
+            }
+          }
+        }
+      }
       CurInfo = computeInfoForInstr(MI, TSFlags, MRI);
       continue;
     }

diff  --git a/llvm/test/CodeGen/RISCV/rvv/vsetvli-insert.ll b/llvm/test/CodeGen/RISCV/rvv/vsetvli-insert.ll
index aa826ecfd18e1..e22a7a16199d0 100644
--- a/llvm/test/CodeGen/RISCV/rvv/vsetvli-insert.ll
+++ b/llvm/test/CodeGen/RISCV/rvv/vsetvli-insert.ll
@@ -281,10 +281,9 @@ entry:
 define <vscale x 1 x double> @test15(i64 %avl, <vscale x 1 x double> %a, <vscale x 1 x double> %b) nounwind {
 ; CHECK-LABEL: test15:
 ; CHECK:       # %bb.0: # %entry
-; CHECK-NEXT:    vsetvli a0, a0, e64, m1, ta, mu
+; CHECK-NEXT:    vsetvli zero, a0, e64, m1, ta, mu
 ; CHECK-NEXT:    vfadd.vv v8, v8, v9
 ; CHECK-NEXT:    vfadd.vv v8, v8, v9
-; CHECK-NEXT:    vsetvli zero, a0, e64, m1, ta, mu
 ; CHECK-NEXT:    ret
 entry:
   %vsetvli = tail call i64 @llvm.riscv.vsetvli(i64 %avl, i64 2, i64 7)
@@ -354,12 +353,12 @@ entry:
 define <vscale x 1 x double> @test18(<vscale x 1 x double> %a, double %b) nounwind {
 ; CHECK-LABEL: test18:
 ; CHECK:       # %bb.0: # %entry
-; CHECK-NEXT:    vsetivli a0, 6, e64, m1, tu, mu
+; CHECK-NEXT:    vsetivli zero, 6, e64, m1, tu, mu
 ; CHECK-NEXT:    vmv1r.v v9, v8
 ; CHECK-NEXT:    vfmv.s.f v9, fa0
-; CHECK-NEXT:    vsetvli zero, a0, e64, m1, ta, mu
+; CHECK-NEXT:    vsetvli zero, zero, e64, m1, ta, mu
 ; CHECK-NEXT:    vfadd.vv v8, v8, v8
-; CHECK-NEXT:    vsetivli zero, 1, e64, m1, tu, mu
+; CHECK-NEXT:    vsetvli zero, zero, e64, m1, tu, mu
 ; CHECK-NEXT:    vfmv.s.f v8, fa0
 ; CHECK-NEXT:    vsetvli a0, zero, e64, m1, ta, mu
 ; CHECK-NEXT:    vfadd.vv v8, v9, v8