[PATCH] D18838: [AArch64][CodeGen] Fix of incorrect peephole optimization in AArch64InstrInfo::optimizeCompareInstr

Tim Northover via llvm-commits llvm-commits at lists.llvm.org
Wed Apr 6 13:01:51 PDT 2016


t.p.northover added a subscriber: t.p.northover.

================
Comment at: lib/Target/AArch64/AArch64InstrInfo.cpp:975-977
@@ +974,5 @@
+/// Check if Instr can produce a needed condition code.
+static bool canInstrProduceCondCode(MachineInstr &Instr,
+    AArch64CC::CondCode NeededCondCode) {
+  switch (NeededCondCode) {
+    default:
----------------
I think the documentation should call out the fact that it really does check whether *all* uses are equivalent.

================
Comment at: lib/Target/AArch64/AArch64InstrInfo.cpp:981
@@ +980,3 @@
+
+    case AArch64CC::NE: {
+      switch (Instr.getOpcode()) {
----------------
Why only NE? At the very least NE/EQ are pretty equivalent.
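
For what it's worth, EQ and NE both depend only on the Z bit, so a tiny helper along these lines (hypothetical name, purely to illustrate) could treat them symmetrically:

    // EQ and NE both depend only on the Z flag, so an instruction whose Z
    // result is usable for an NE test is equally usable for an EQ test.
    static bool isEqualityCondCode(AArch64CC::CondCode CC) {
      return CC == AArch64CC::EQ || CC == AArch64CC::NE;
    }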

================
Comment at: lib/Target/AArch64/AArch64InstrInfo.cpp:985-988
@@ +984,6 @@
+          break;
+        case AArch64::ADDWri:
+        case AArch64::ADDXri:
+        case AArch64::ADDSWri:
+        case AArch64::ADDSXri:
+          return Instr.getOperand(1).isFI()
----------------
I don't follow this logic. A pure ADD should never produce any CondCode, and I'm not sure why an FI operand is involved at all.
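
If the point of that check is really just "does this instruction define the flags at all?", something more direct (a sketch, with no opcode list involved) would be:

    // Sketch: only the flag-setting (S-suffixed) forms define NZCV, so a
    // plain ADDWri/ADDXri is rejected here without consulting any operand.
    static bool definesCondFlags(const MachineInstr &MI,
                                 const TargetRegisterInfo *TRI) {
      return MI.definesRegister(AArch64::NZCV, TRI);
    }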

================
Comment at: lib/Target/AArch64/AArch64InstrInfo.cpp:1007-1033
@@ +1006,29 @@
+  assert(TRI);
+  auto EndLoc = firstModificationOfCondFlagsAfterInstr(CmpInstr, TRI).getInstrIterator();
+  // There are instructions which can read and write flags at the same time.
+  if (EndLoc != CmpInstr->getParent()->instr_end()
+      && EndLoc->readsRegister(AArch64::NZCV, TRI)) {
+    EndLoc = std::next(EndLoc);
+  }
+  auto CFlagsFirstReadLoc
+    = std::find_if(std::next(CmpInstr->getIterator()),
+                   EndLoc,
+                   [TRI](MachineInstr &MI) {
+                     return MI.readsRegister(AArch64::NZCV, TRI);
+                   });
+  if (CFlagsFirstReadLoc == EndLoc)
+    return AArch64CC::Invalid;
+
+  AArch64CC::CondCode UsedCondCode = findCondCodeUsedByInstr(*CFlagsFirstReadLoc);
+  if (UsedCondCode == AArch64CC::Invalid)
+    return AArch64CC::Invalid;
+
+  // Check that all other used condition codes are the same.
+  // If they are not it might not be safe to remove CmpInstr.
+  if (!std::all_of(std::next(CFlagsFirstReadLoc), EndLoc,
+                   [UsedCondCode, TRI](MachineInstr &MI) {
+                     return !MI.readsRegister(AArch64::NZCV, TRI)
+                            || (UsedCondCode == findCondCodeUsedByInstr(MI));
+                   }))
+    return AArch64CC::Invalid;
+
----------------
I think this is probably a bit too tricksy, and both less efficient and less clear than the naive loop.

I also think that looking at this from the position of CondCodes is probably the wrong level. The real distinction is which bits of NZCV are used (in particular, ADDS sets C & V differently from SUBS, as I recall, so mixing tests that only use Z & N is fine).

With both of those observations, I think this function reduces to something like:

   for (MI = std::next(CmpInstr); MI != instr_end; ++MI) {
     if (MI->readsRegister(NZCV))
       NZCVUsed |= findNZCVUsedByInstr(*MI);
     if (MI->modifiesRegister(NZCV)) {
       // Possibly return 0b1111 if MI also reads.
       return NZCVUsed;
     }
   }
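
For the findNZCVUsedByInstr part, something like this would do (rough sketch; apart from findCondCodeUsedByInstr the names are made up, and the mapping should be double-checked against the condition-code definitions):

    // Rough sketch of the helper above: which NZCV bits a flag reader cares
    // about follows directly from the condition code it tests.
    struct UsedNZCV {
      bool N = false, Z = false, C = false, V = false;
      UsedNZCV &operator|=(const UsedNZCV &O) {
        N |= O.N; Z |= O.Z; C |= O.C; V |= O.V;
        return *this;
      }
    };

    static UsedNZCV findNZCVUsedByInstr(MachineInstr &MI) {
      UsedNZCV Used;
      switch (findCondCodeUsedByInstr(MI)) {
      default:
        // Unknown reader: conservatively assume it looks at everything.
        Used.N = Used.Z = Used.C = Used.V = true;
        break;
      case AArch64CC::EQ: case AArch64CC::NE:
        Used.Z = true;
        break;
      case AArch64CC::HS: case AArch64CC::LO:
        Used.C = true;
        break;
      case AArch64CC::HI: case AArch64CC::LS:
        Used.C = Used.Z = true;
        break;
      case AArch64CC::MI: case AArch64CC::PL:
        Used.N = true;
        break;
      case AArch64CC::VS: case AArch64CC::VC:
        Used.V = true;
        break;
      case AArch64CC::GE: case AArch64CC::LT:
        Used.N = Used.V = true;
        break;
      case AArch64CC::GT: case AArch64CC::LE:
        Used.N = Used.V = Used.Z = true;
        break;
      }
      return Used;
    }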

================
Comment at: lib/Target/AArch64/AArch64InstrInfo.cpp:1034
@@ +1033,3 @@
+    return AArch64CC::Invalid;
+
+  return UsedCondCode;
----------------
I think we should also check (or assert, if also checked before) that NZCV doesn't escape the basic block.
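
For reference, that check is just a walk over the successors' live-in lists, something like (sketch):

    // Sketch: NZCV escapes the block exactly when some successor lists it as
    // a live-in.
    static bool areCFlagsAliveInSuccessors(const MachineBasicBlock *MBB) {
      for (const MachineBasicBlock *Succ : MBB->successors())
        if (Succ->isLiveIn(AArch64::NZCV))
          return true;
      return false;
    }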

================
Comment at: lib/Target/AArch64/AArch64InstrInfo.cpp:1043
@@ +1042,3 @@
+static bool isSUBS(unsigned Opcode) {
+  return AArch64::SUBSWri <= Opcode && Opcode <= AArch64::SUBSXrx64;
+}
----------------
We definitely shouldn't be relying on specific ordering like this, even if it is alphabetical.
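
An explicit list would avoid depending on enum layout, e.g. (sketch; the exact set of S-form SUB opcodes should be taken from AArch64InstrInfo.td rather than from memory):

    // Sketch: enumerate the flag-setting SUB opcodes instead of relying on
    // their relative position in the generated opcode enum.
    static bool isSUBSOpcode(unsigned Opcode) {
      switch (Opcode) {
      default:
        return false;
      case AArch64::SUBSWri: case AArch64::SUBSWrr:
      case AArch64::SUBSWrs: case AArch64::SUBSWrx:
      case AArch64::SUBSXri: case AArch64::SUBSXrr:
      case AArch64::SUBSXrs: case AArch64::SUBSXrx:
      case AArch64::SUBSXrx64:
        return true;
      }
    }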

================
Comment at: lib/Target/AArch64/AArch64InstrInfo.cpp:1068-1071
@@ +1067,6 @@
+      assert(MiSOpcode != AArch64::INSTRUCTION_LIST_END);
+      if (sameSForm(CmpInstr->getOpcode(), MiSOpcode, isADDS))
+        return true;
+      if (sameSForm(CmpInstr->getOpcode(), MiSOpcode, isSUBS))
+        return true;
+      if (canInstrProduceCondCode(*MI, UsedCondCode))
----------------
These are unconditional (i.e. work on all possible AArch64CC), aren't they?

Anyway, I think (with the repurposing of findUsedCondCodeAfterCmp suggested above) that the NZCV part of this function is really:

    if (sameSForm(..., isADDS) || sameSForm(..., isSUBS))
      return true;
    auto NZCVUsed = findUsedNZCVAfterCMP(...);
    return !NZCVUsed.C && !NZCVUsed.V;

================
Comment at: lib/Target/AArch64/AArch64InstrInfo.cpp:1094
@@ -942,3 +1093,3 @@
   const TargetRegisterInfo *TRI = &getRegisterInfo();
   if (areCFlagsAccessedBetweenInstrs(MI, CmpInstr, TRI))
     return false;
----------------
This completely short-circuits the other walks you're adding, doesn't it? (E.g. findUsedCondCodeAfterCmp will never find more than one use).

================
Comment at: test/CodeGen/AArch64/arm64-regress-opt-cmp.ll:1
@@ +1,2 @@
+; RUN: llc -mtriple=aarch64-linux-gnu -O3 -o - %s | FileCheck %s
+
----------------
I think these tests are rather weak given how much code is changing and the complexity of the result. I'd probably suggest a MIR test for this one; they can be a bit temperamental to get started, but allow you to exercise many more edge cases quickly.


http://reviews.llvm.org/D18838