[PATCH] D18838: [AArch64][CodeGen] Fix of incorrect peephole optimization in AArch64InstrInfo::optimizeCompareInstr
Tim Northover via llvm-commits
llvm-commits at lists.llvm.org
Wed Apr 6 13:01:51 PDT 2016
t.p.northover added a subscriber: t.p.northover.
================
Comment at: lib/Target/AArch64/AArch64InstrInfo.cpp:975-977
@@ +974,5 @@
+/// Check if Instr can produce a needed condition code.
+static bool canInstrProduceCondCode(MachineInstr &Instr,
+ AArch64CC::CondCode NeededCondCode) {
+ switch (NeededCondCode) {
+ default:
----------------
I think the documentation should call out the fact that it really does check whether *all* uses are equivalent.
================
Comment at: lib/Target/AArch64/AArch64InstrInfo.cpp:981
@@ +980,3 @@
+
+ case AArch64CC::NE: {
+ switch (Instr.getOpcode()) {
----------------
Why only NE? At the very least NE/EQ are pretty equivalent.
================
Comment at: lib/Target/AArch64/AArch64InstrInfo.cpp:985-988
@@ +984,6 @@
+ break;
+ case AArch64::ADDWri:
+ case AArch64::ADDXri:
+ case AArch64::ADDSWri:
+ case AArch64::ADDSXri:
+ return Instr.getOperand(1).isFI()
----------------
I don't follow this logic. A pure ADD should never produce any CondCode, and I'm not sure why an FI operand is involved at all.
================
Comment at: lib/Target/AArch64/AArch64InstrInfo.cpp:1007-1033
@@ +1006,29 @@
+ assert(TRI);
+ auto EndLoc = firstModificationOfCondFlagsAfterInstr(CmpInstr, TRI).getInstrIterator();
+ // There are instructions which can read and write flags at the same time.
+ if (EndLoc != CmpInstr->getParent()->instr_end()
+ && EndLoc->readsRegister(AArch64::NZCV, TRI)) {
+ EndLoc = std::next(EndLoc);
+ }
+ auto CFlagsFirstReadLoc
+ = std::find_if(std::next(CmpInstr->getIterator()),
+ EndLoc,
+ [TRI](MachineInstr &MI) {
+ return MI.readsRegister(AArch64::NZCV, TRI);
+ });
+ if (CFlagsFirstReadLoc == EndLoc)
+ return AArch64CC::Invalid;
+
+ AArch64CC::CondCode UsedCondCode = findCondCodeUsedByInstr(*CFlagsFirstReadLoc);
+ if (UsedCondCode == AArch64CC::Invalid)
+ return AArch64CC::Invalid;
+
+ // Check that all other used condition codes are the same.
+ // If they are not it might not be safe to remove CmpInstr.
+ if (!std::all_of(std::next(CFlagsFirstReadLoc), EndLoc,
+ [UsedCondCode, TRI](MachineInstr &MI) {
+ return !MI.readsRegister(AArch64::NZCV, TRI)
+ || (UsedCondCode == findCondCodeUsedByInstr(MI));
+ }))
+ return AArch64CC::Invalid;
+
----------------
I think this is probably a bit too tricksy, and both less efficient and less clear than the naive loop.
I also think that looking at this from the position of CondCodes is probably the wrong level. The real distinction is which bits of NZCV are used (in particular, ADDS sets C & V differently from SUBS, as I recall, so mixing tests that only use Z & N is fine).
With both of those observations, I think this function reduces to something like:
for(MI= CmpInstr; MI != instr_end; ++MI) {
if (MI->readsRegister(NZCV))
NZCVUsed |= findNZCVUsedByInstr(MI);
if (MI->modifiesRegister(NZCV)) {
// Possibly return 0b1111 if MI also reads.
return NZCVUsed;
}
}
================
Comment at: lib/Target/AArch64/AArch64InstrInfo.cpp:1034
@@ +1033,3 @@
+ return AArch64CC::Invalid;
+
+ return UsedCondCode;
----------------
I think we should also check (or assert, if also checked before) that NZCV doesn't escape the basic block.
================
Comment at: lib/Target/AArch64/AArch64InstrInfo.cpp:1043
@@ +1042,3 @@
+static bool isSUBS(unsigned Opcode) {
+ return AArch64::SUBSWri <= Opcode && Opcode <= AArch64::SUBSXrx64;
+}
----------------
We definitely shouldn't be relying on specific ordering like this, even if it is alphabetical.
================
Comment at: lib/Target/AArch64/AArch64InstrInfo.cpp:1068-1071
@@ +1067,6 @@
+ assert(MiSOpcode != AArch64::INSTRUCTION_LIST_END);
+ if (sameSForm(CmpInstr->getOpcode(), MiSOpcode, isADDS))
+ return true;
+ if (sameSForm(CmpInstr->getOpcode(), MiSOpcode, isSUBS))
+ return true;
+ if (canInstrProduceCondCode(*MI, UsedCondCode))
----------------
These are unconditional (i.e. work on all possible AArch64CC) aren't they?
Anyway, I think (with the repurposing of findUsedCondCodeAfterCmp suggested above) that the NZCV part of this function is really:
if (sameSForm(..., isADDS) || sameSForm(..., isSUBS))
return true;
auto NZCVUsed = findUsedNZCVAfterCMP(...);
return !NZCVUsed.C && !NZCVUsed.V;
================
Comment at: lib/Target/AArch64/AArch64InstrInfo.cpp:1094
@@ -942,3 +1093,3 @@
const TargetRegisterInfo *TRI = &getRegisterInfo();
if (areCFlagsAccessedBetweenInstrs(MI, CmpInstr, TRI))
return false;
----------------
This completely short-circuits the other walks you're adding doesn't it? (E.g. findUsedCondCodeAfterCmp will never find more than one use).
================
Comment at: test/CodeGen/AArch64/arm64-regress-opt-cmp.ll:1
@@ +1,2 @@
+; RUN: llc -mtriple=aarch64-linux-gnu -O3 -o - %s | FileCheck %s
+
----------------
I think these tests are rather weak given how much code is changing and the complexity of the result. I'd probably suggest a MIR test for this one; they can be a bit temperamental to get started, but allow you to exercise many more edge cases quickly.
http://reviews.llvm.org/D18838
More information about the llvm-commits
mailing list