[llvm] 3fa6510 - [CodeGen][AArch64][SVE] Fold [rdffr, ptest] => rdffrs; bugfix for optimizePTestInstr
Peter Waller via llvm-commits
llvm-commits at lists.llvm.org
Wed May 12 07:06:47 PDT 2021
Author: Peter Waller
Date: 2021-05-12T15:06:22+01:00
New Revision: 3fa6510f6ea0101c70592487074957bb1cde576c
URL: https://github.com/llvm/llvm-project/commit/3fa6510f6ea0101c70592487074957bb1cde576c
DIFF: https://github.com/llvm/llvm-project/commit/3fa6510f6ea0101c70592487074957bb1cde576c.diff
LOG: [CodeGen][AArch64][SVE] Fold [rdffr, ptest] => rdffrs; bugfix for optimizePTestInstr
When a ptest is used to set flags from the output of rdffr, the ptest
can be eliminated, using a flags-setting rdffrs instead.
Additionally, check that nothing consumes flags between rdffr and ptest;
this case appears to have been missed previously.
* There is no unpredicated RDFFRS instruction.
* If substituting RDFFR_PP, require that the mask argument of the
PTEST matches that of the RDFFR_PP.
* Move some precondition code up inside optimizePTestInstr, so that it
covers the new code paths for RDFFR which return earlier.
* Only consider RDFFR, PTEST in same basic block.
* Check for other flag setting instructions between the two, abort if
found.
* Drop an old TODO comment about removing dead PTEST instructions.
RDFFR_P to follow in later patch.
Differential Revision: https://reviews.llvm.org/D101357
Added:
llvm/test/CodeGen/AArch64/sve-ptest-removal-rdffr.mir
Modified:
llvm/lib/Target/AArch64/AArch64InstrInfo.cpp
Removed:
################################################################################
diff --git a/llvm/lib/Target/AArch64/AArch64InstrInfo.cpp b/llvm/lib/Target/AArch64/AArch64InstrInfo.cpp
index 02fc4033e189..6c904712d2ea 100644
--- a/llvm/lib/Target/AArch64/AArch64InstrInfo.cpp
+++ b/llvm/lib/Target/AArch64/AArch64InstrInfo.cpp
@@ -1366,6 +1366,18 @@ bool AArch64InstrInfo::optimizePTestInstr(
OpChanged = true;
break;
}
+ case AArch64::RDFFR_PPz: {
+ // rdffr p1.b, PredMask=p0/z <--- Definition of Pred
+ // ptest Mask=p0, Pred=p1.b <--- If equal masks, remove this and use
+ // `rdffrs p1.b, p0/z` above.
+ auto *PredMask = MRI->getUniqueVRegDef(Pred->getOperand(1).getReg());
+ if (Mask != PredMask)
+ return false;
+
+ NewOp = AArch64::RDFFRS_PPz;
+ OpChanged = true;
+ break;
+ }
default:
// Bail out if we don't recognize the input
return false;
@@ -1374,23 +1386,11 @@ bool AArch64InstrInfo::optimizePTestInstr(
const TargetRegisterInfo *TRI = &getRegisterInfo();
- // If the predicate is in a
diff erent block (possibly because its been
- // hoisted out), then assume the flags are set in between statements.
- if (Pred->getParent() != PTest->getParent())
+ // If another instruction between Pred and PTest accesses flags, don't remove
+ // the ptest or update the earlier instruction to modify them.
+ if (areCFlagsAccessedBetweenInstrs(Pred, PTest, TRI))
return false;
- // If another instruction between the propagation and test sets the
- // flags, don't remove the ptest.
- MachineBasicBlock::iterator I = Pred, E = PTest;
- ++I; // Skip past the predicate op itself.
- for (; I != E; ++I) {
- const MachineInstr &Inst = *I;
-
- // TODO: If the ptest flags are unused, we could still remove it.
- if (Inst.modifiesRegister(AArch64::NZCV, TRI))
- return false;
- }
-
// If we pass all the checks, it's safe to remove the PTEST and use the flags
// as they are prior to PTEST. Sometimes this requires the tested PTEST
// operand to be replaced with an equivalent instruction that also sets the
diff --git a/llvm/test/CodeGen/AArch64/sve-ptest-removal-rdffr.mir b/llvm/test/CodeGen/AArch64/sve-ptest-removal-rdffr.mir
new file mode 100644
index 000000000000..082781d8b056
--- /dev/null
+++ b/llvm/test/CodeGen/AArch64/sve-ptest-removal-rdffr.mir
@@ -0,0 +1,90 @@
+# RUN: llc -mtriple=aarch64--linux-gnu -mattr=+sve -run-pass=peephole-opt -verify-machineinstrs %s -o - | FileCheck %s
+# Test that RDFFR followed by PTEST is replaced with RDFFRS.
+---
+# CHECK-LABEL: name:{{\s*}} substitute_rdffr_pp_with_rdffrs_pp
+name: substitute_rdffr_pp_with_rdffrs_pp
+tracksRegLiveness: true
+body: |
+ bb.0:
+ liveins: $ffr, $p0
+ %0:ppr_3b = COPY $p0
+
+ ; CHECK: RDFFRS_PPz
+ ; CHECK-NOT: PTEST
+ %1:ppr_3b = RDFFR_PPz %0:ppr_3b
+ PTEST_PP killed %0:ppr_3b, killed %1:ppr_3b, implicit-def $nzcv
+
+ ; Consume nzcv
+ %2:gpr32 = COPY $wzr
+ %3:gpr32 = CSINCWr killed %2, $wzr, 0, implicit $nzcv
+ $w0 = COPY %3
+ RET_ReallyLR implicit $w0
+...
+---
+# CHECK-LABEL: name:{{\s*}} fail_to_substitute_rdffr_pp_with_rdffrs_pp_
diff ering_mask
+name: fail_to_substitute_rdffr_pp_with_rdffrs_pp_
diff ering_mask
+tracksRegLiveness: true
+body: |
+ bb.0:
+ liveins: $ffr, $p0, $p1
+ %0:ppr_3b = COPY $p0
+ %1:ppr_3b = COPY $p1
+
+ ; CHECK: RDFFR_PPz
+ ; CHECK: PTEST
+ %2:ppr_3b = RDFFR_PPz %0:ppr_3b
+ PTEST_PP killed %1:ppr_3b, killed %2:ppr_3b, implicit-def $nzcv
+
+ ; Consume nzcv
+ %3:gpr32 = COPY $wzr
+ %4:gpr32 = CSINCWr killed %3, $wzr, 0, implicit $nzcv
+ $w0 = COPY %4
+ RET_ReallyLR implicit $w0
+...
+---
+# CHECK-LABEL: name:{{\s*}} fail_to_substitute_rdffr_pp_with_rdffrs_pp_nzcv_clobbered
+name: fail_to_substitute_rdffr_pp_with_rdffrs_pp_nzcv_clobbered
+tracksRegLiveness: true
+body: |
+ bb.0:
+ liveins: $ffr, $p0, $x0
+ %0:ppr_3b = COPY $p0
+
+ ; CHECK: RDFFR_PPz
+ ; CHECK-NEXT: ADDSXrr
+ ; CHECK-NEXT: PTEST_PP
+ %1:ppr_3b = RDFFR_PPz %0:ppr_3b
+ ; Clobber nzcv
+ $x0 = ADDSXrr $x0, $x0, implicit-def $nzcv
+ PTEST_PP killed %0:ppr_3b, killed %1:ppr_3b, implicit-def $nzcv
+
+ ; Consume nzcv
+ %2:gpr32 = COPY $wzr
+ %3:gpr32 = CSINCWr killed %2, $wzr, 0, implicit $nzcv
+ $w0 = COPY %3
+ RET_ReallyLR implicit $w0
+...
+---
+# CHECK-LABEL: name:{{\s*}} fail_to_substitute_rdffr_pp_with_rdffrs_pp_nzcv_flags_used_between
+name: fail_to_substitute_rdffr_pp_with_rdffrs_pp_nzcv_flags_used_between
+tracksRegLiveness: true
+body: |
+ bb.0:
+ liveins: $ffr, $p0, $x0
+ %0:ppr_3b = COPY $p0
+
+ $wzr = SUBSWri $w0, 0, 0, implicit-def $nzcv
+
+ ; CHECK: RDFFR_PPz
+ ; CHECK-NEXT: CSINCWr
+ ; CHECK-NEXT: PTEST_PP
+ %1:ppr_3b = RDFFR_PPz %0:ppr_3b
+ ; Consume nzcv
+ %2:gpr32 = CSINCWr $wzr, $wzr, 0, implicit $nzcv
+ PTEST_PP killed %0:ppr_3b, killed %1:ppr_3b, implicit-def $nzcv
+
+ ; Consume nzcv
+ %3:gpr32 = COPY $wzr
+ %4:gpr32 = CSINCWr killed %3, $wzr, 0, implicit $nzcv
+ $w0 = ORRWrs %4, %2, 1
+ RET_ReallyLR implicit $w0
More information about the llvm-commits
mailing list