[PATCH] D18838: [AArch64][CodeGen] Fix of incorrect peephole optimization in AArch64InstrInfo::optimizeCompareInstr

Evgeny Astigeevich via llvm-commits llvm-commits at lists.llvm.org
Thu Apr 7 15:48:36 PDT 2016


eastig added a comment.

Hi Tim,

Thank you for the comments. They are very useful.
See my answers below.


================
Comment at: lib/Target/AArch64/AArch64InstrInfo.cpp:975-977
@@ +974,5 @@
+/// Check if Instr can produce a needed condition code.
+static bool canInstrProduceCondCode(MachineInstr &Instr,
+    AArch64CC::CondCode NeededCondCode) {
+  switch (NeededCondCode) {
+    default:
----------------
t.p.northover wrote:
> I think the documentation should call out the fact that it really does check whether *all* uses are equivalent.
It seems the function name is too general and does not correspond to what the function actually does. The name was based on these tests:

  - CodeGen/AArch64/arm64-arm64-dead-def-elimination-flag.ll
  - CodeGen/AArch64/arm64-dead-def-frame-index.ll

The tests compare the result of an 'alloca' with null and expect the compare operation to be removed. The generated instruction sequence is ADDX+SUBSX+CSINCW.
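
Reduced, the pattern is roughly:

  define i32 @test() {
    %tmp = alloca i8
    %cmp = icmp eq i8* %tmp, null
    %res = zext i1 %cmp to i32
    ret i32 %res
  }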

I overcomplicated things.

================
Comment at: lib/Target/AArch64/AArch64InstrInfo.cpp:981
@@ +980,3 @@
+
+    case AArch64CC::NE: {
+      switch (Instr.getOpcode()) {
----------------
t.p.northover wrote:
> Why only NE? At the very least NE/EQ are pretty equivalent.
You are right. Both NE and EQ read only the Z flag, so at least those two should be handled together.

================
Comment at: lib/Target/AArch64/AArch64InstrInfo.cpp:985-988
@@ +984,6 @@
+          break;
+        case AArch64::ADDWri:
+        case AArch64::ADDXri:
+        case AArch64::ADDSWri:
+        case AArch64::ADDSXri:
+          return Instr.getOperand(1).isFI()
----------------
t.p.northover wrote:
> I don't follow this logic. A pure ADD should never produce any CondCode, and I'm not sure why an FI operand is involved at all.
The logic is wrong.

================
Comment at: lib/Target/AArch64/AArch64InstrInfo.cpp:1007-1033
@@ +1006,29 @@
+  assert(TRI);
+  auto EndLoc = firstModificationOfCondFlagsAfterInstr(CmpInstr, TRI).getInstrIterator();
+  // There are instructions which can read and write flags at the same time.
+  if (EndLoc != CmpInstr->getParent()->instr_end()
+      && EndLoc->readsRegister(AArch64::NZCV, TRI)) {
+    EndLoc = std::next(EndLoc);
+  }
+  auto CFlagsFirstReadLoc
+    = std::find_if(std::next(CmpInstr->getIterator()),
+                   EndLoc,
+                   [TRI](MachineInstr &MI) {
+                     return MI.readsRegister(AArch64::NZCV, TRI);
+                   });
+  if (CFlagsFirstReadLoc == EndLoc)
+    return AArch64CC::Invalid;
+
+  AArch64CC::CondCode UsedCondCode = findCondCodeUsedByInstr(*CFlagsFirstReadLoc);
+  if (UsedCondCode == AArch64CC::Invalid)
+    return AArch64CC::Invalid;
+
+  // Check that all other used condition codes are the same.
+  // If they are not it might not be safe to remove CmpInstr.
+  if (!std::all_of(std::next(CFlagsFirstReadLoc), EndLoc,
+                   [UsedCondCode, TRI](MachineInstr &MI) {
+                     return !MI.readsRegister(AArch64::NZCV, TRI)
+                            || (UsedCondCode == findCondCodeUsedByInstr(MI));
+                   }))
+    return AArch64CC::Invalid;
+
----------------
t.p.northover wrote:
> I think this is probably a bit too tricksy, and both less efficient and less clear than the naive loop.
> 
> I also think that looking at this from the position of CondCodes is probably the wrong level. The real distinction is which bits of NZCV are used (in particular, ADDS sets C & V differently from SUBS, as I recall, so mixing tests that only use Z & N is fine).
> 
> With both of those observations, I think this function reduces to something like:
> 
>    for(MI= CmpInstr; MI != instr_end; ++MI) {
>      if (MI->readsRegister(NZCV))
>        NZCVUsed |= findNZCVUsedByInstr(MI);
>      if (MI->modifiesRegister(NZCV)) {
>        // Possibly return 0b1111 if MI also reads.
>        return NZCVUsed;
>      }
>    }
Yes, you are right.
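
Rewriting it along those lines (untested sketch; the UsedNZCV struct and getUsedNZCV, which maps a condition code to the flags it reads, are illustrative names, not in the current patch):

  struct UsedNZCV {
    bool N = false, Z = false, C = false, V = false;
    UsedNZCV &operator|=(const UsedNZCV &Other) {
      N |= Other.N;
      Z |= Other.Z;
      C |= Other.C;
      V |= Other.V;
      return *this;
    }
  };

  // Collect which of N, Z, C, V are read between CmpInstr and the next
  // write of the flags (or the end of the block).
  static UsedNZCV findUsedNZCVAfterCmp(MachineInstr *CmpInstr,
                                       const TargetRegisterInfo *TRI) {
    UsedNZCV NZCVUsed;
    for (auto I = std::next(CmpInstr->getIterator()),
              E = CmpInstr->getParent()->instr_end();
         I != E; ++I) {
      if (I->readsRegister(AArch64::NZCV, TRI))
        NZCVUsed |= getUsedNZCV(findCondCodeUsedByInstr(*I));
      // An instruction that both reads and writes the flags has its
      // read counted above before we stop at the write.
      if (I->modifiesRegister(AArch64::NZCV, TRI))
        break;
    }
    return NZCVUsed;
  }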

================
Comment at: lib/Target/AArch64/AArch64InstrInfo.cpp:1034
@@ +1033,3 @@
+    return AArch64CC::Invalid;
+
+  return UsedCondCode;
----------------
t.p.northover wrote:
> I think we should also check (or assert, if also checked before) that NZCV doesn't escape the basic block.
Do you mean checking whether the flags are live in the successors of the basic block? If yes, this is checked in substituteCmpInstr.
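
The existing check is roughly (the real code sits inline in substituteCmpInstr):

  // NZCV must not escape the block: it must not be live-in
  // to any successor of CmpInstr's basic block.
  for (MachineBasicBlock *Succ : CmpInstr->getParent()->successors())
    if (Succ->isLiveIn(AArch64::NZCV))
      return false;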

================
Comment at: lib/Target/AArch64/AArch64InstrInfo.cpp:1043
@@ +1042,3 @@
+static bool isSUBS(unsigned Opcode) {
+  return AArch64::SUBSWri <= Opcode && Opcode <= AArch64::SUBSXrx64;
+}
----------------
t.p.northover wrote:
> We definitely shouldn't be relying on specific ordering like this, even if it is alphabetical.
That's a pity. Do such helper functions already exist somewhere?
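
If they don't, an explicit switch would avoid depending on the enum order (sketch; the opcode list is illustrative and may not be exhaustive):

  static bool isSUBS(unsigned Opcode) {
    switch (Opcode) {
    default:
      return false;
    case AArch64::SUBSWri:
    case AArch64::SUBSWrr:
    case AArch64::SUBSWrs:
    case AArch64::SUBSWrx:
    case AArch64::SUBSXri:
    case AArch64::SUBSXrr:
    case AArch64::SUBSXrs:
    case AArch64::SUBSXrx:
    case AArch64::SUBSXrx64:
      return true;
    }
  }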

================
Comment at: lib/Target/AArch64/AArch64InstrInfo.cpp:1068-1071
@@ +1067,6 @@
+      assert(MiSOpcode != AArch64::INSTRUCTION_LIST_END);
+      if (sameSForm(CmpInstr->getOpcode(), MiSOpcode, isADDS))
+        return true;
+      if (sameSForm(CmpInstr->getOpcode(), MiSOpcode, isSUBS))
+        return true;
+      if (canInstrProduceCondCode(*MI, UsedCondCode))
----------------
t.p.northover wrote:
> These are unconditional (i.e. work on all possible AArch64CC) aren't they?
> 
> Anyway, I think (with the repurposing of findUsedCondCodeAfterCmp suggested above) that the NZCV part of this function is really:
> 
>     if (sameSForm(..., isADDS) || sameSForm(..., isSUBS))
>       return true;
>     auto NZCVUsed = findUsedNZCVAfterCMP(...);
>     return !NZCVUsed.C && !NZCVUsed.V;
You are right.
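
With the findUsedNZCVAfterCmp sketch above, the NZCV part of this function would indeed reduce to something like:

  if (sameSForm(CmpInstr->getOpcode(), MiSOpcode, isADDS) ||
      sameSForm(CmpInstr->getOpcode(), MiSOpcode, isSUBS))
    return true;
  UsedNZCV NZCVUsed = findUsedNZCVAfterCmp(CmpInstr, TRI);
  return !NZCVUsed.C && !NZCVUsed.V;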

================
Comment at: lib/Target/AArch64/AArch64InstrInfo.cpp:1094
@@ -942,3 +1093,3 @@
   const TargetRegisterInfo *TRI = &getRegisterInfo();
   if (areCFlagsAccessedBetweenInstrs(MI, CmpInstr, TRI))
     return false;
----------------
t.p.northover wrote:
> This completely short-circuits the other walks you're adding doesn't it? (E.g. findUsedCondCodeAfterCmp will never find more than one use).
It checks for accesses (reads and writes) of the flags after MI and before CmpInstr. This check was in the original code. I think it was made stricter than necessary to keep the code simple, because such a situation is rare, if it occurs at all.
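
For reference, the existing check is essentially (simplified sketch; From must come before To in the same basic block):

  static bool areCFlagsAccessedBetweenInstrs(MachineInstr *From,
                                             MachineInstr *To,
                                             const TargetRegisterInfo *TRI) {
    // Any read or write of NZCV strictly between the two instructions
    // makes it unsafe to fold the compare into the defining instruction.
    for (auto I = std::next(From->getIterator()), E = To->getIterator();
         I != E; ++I)
      if (I->readsRegister(AArch64::NZCV, TRI) ||
          I->modifiesRegister(AArch64::NZCV, TRI))
        return true;
    return false;
  }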

================
Comment at: test/CodeGen/AArch64/arm64-regress-opt-cmp.ll:1
@@ +1,2 @@
+; RUN: llc -mtriple=aarch64-linux-gnu -O3 -o - %s | FileCheck %s
+
----------------
t.p.northover wrote:
> I think these tests are rather weak given how much code is changing and the complexity of the result. I'd probably suggest a MIR test for this one; they can be a bit temperamental to get started, but allow you to exercise many more edge cases quickly.
I fully agree with you. I tried to write a simpler test but failed to express it in IR. How can I write one in MIR?
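
If I understand the MIR docs correctly, the workflow would be to dump the machine function with something like 'llc -stop-after=<pass-name> test.ll -o test.mir', hand-edit the result to cover the edge cases, and then check it with a RUN line such as (untested):

  # RUN: llc -mtriple=aarch64-linux-gnu -run-pass=peephole-opt -o - %s | FileCheck %s

with the CHECK lines matching the post-peephole MIR. Is that the right approach?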


http://reviews.llvm.org/D18838