[PATCH] D16353: [mips] MIPS32R6 compact branch support

Tue Jan 26 02:59:20 PST 2016

sdardis added inline comments.

================
Comment at: lib/Target/Mips/MipsDelaySlotFiller.cpp:595-659
@@ -566,2 +594,67 @@
 
+// The forbidden slot is the instruction immediately following a compact
+// branch. A forbidden slot hazard occurs when a compact branch instruction
+// is executed and the adjacent instruction in memory is a control transfer
+// instruction such as a branch or jump, ERET, ERETNC, DERET, WAIT and
+// PAUSE.
+//
+// In such cases, the processor is required to signal a Reserved Instruction
+// exception. Forbidden slot hazards are defined for MIPSR6, no microMIPS.
+//
+// There are three sources of forbidden slot hazards:
+//
+// A) Transforming a delay slot branch into compact branch.
+// B) A previous pass has created a compact branch directly.
+// C) Filling a delay slot using a backwards search when the instruction
+//    moved was in a forbidden slot. This case will create hazards in already
+//    processed code.
+//
+bool Filler::clearFSHazards(MachineFunction &F) {
+  bool Changed = false;
+  const MipsSubtarget &STI = F.getSubtarget<MipsSubtarget>();
+  const MipsSEInstrInfo *TII =
+      static_cast<const MipsSEInstrInfo *>(STI.getInstrInfo());
+
+  for (MachineFunction::iterator FI = F.begin(), FE = F.end(); FI != FE; ++FI) {
+    for (Iter I = (*FI).begin(); I != (*FI).end(); ++I) {
+      if (hasForbiddenSlot(I)) {
+        if (std::next(I) != (*FI).end() &&
+            !safeInForbiddenSlot(&*std::next(I))) {
+          BuildMI((*FI), std::next(I), I->getDebugLoc(), TII->get(Mips::NOP));
+          Changed = true;
+        } else {
+          for (auto *Succ : (*FI).successors()) {
+            if ((*FI).isLayoutSuccessor(Succ) &&
+                (Succ->getFirstNonDebugInstr()) != Succ->end() &&
+                !safeInForbiddenSlot((Succ->getFirstNonDebugInstr()))) {
+              BuildMI(&(*FI), I->getDebugLoc(), TII->get(Mips::NOP));
+              Changed = true;
+            }
+          }
+        }
+      }
+    }
+  }
+  return Changed;
+}
+
+bool Filler::runOnMachineFunction(MachineFunction &F) {
+  bool Changed = false;
+  for (MachineFunction::iterator FI = F.begin(), FE = F.end(); FI != FE; ++FI)
+    Changed |= runOnMachineBasicBlock(*FI);
+
+  // Make a second pass over the instructions to clear hazards.
+  const MipsSubtarget &STI = F.getSubtarget<MipsSubtarget>();
+  if (STI.hasMips32r6() && !STI.inMicroMipsMode())
+    Changed |= clearFSHazards(F);
+
+  // This pass invalidates liveness information when it reorders
+  // instructions to fill delay slot. Without this, -verify-machineinstrs
+  // will fail.
+  if (Changed)
+    F.getRegInfo().invalidateLiveness();
+
+  return Changed;
+}
+
 /// runOnMachineBasicBlock - Fill in delay slots for the given basic block.
----------------
vkalintiris wrote:
> There's no need to iterate over every instruction twice. We can check whether we have to add a NOP inside `replaceWithCompactBranch()` and bundle it together with the compact branch.
> We can check whether we have to add a NOP inside replaceWithCompactBranch() and bundle it together with the compact branch.

We can't. You've missed case C in the comment.

Consider the following from function l() in the test case before delay slot filling (using assembly here to keep things clear):

        move   $16, $2
        jal      j
        bne    $16, $2, $BB0_2
  # BB#1:                                 # %if.then
        addiu   $4, $zero, -2
        jal     f

For the first jal, the delay slot filler will insert the 'move' into its delay slot. The delay slot filler has no instruction to put in the delay slot of bne, so it transforms it into bnec. No nop will be inserted as a) there is no CTI following bnec in that basic block, and b) the first non-debug instruction in the physically following basic block is not a CTI. 

The delay filler then considers BB#1 and schedules the addiu into the delay slot of the jal. The delay filler has now created a forbidden slot hazard. Handling this in the delay filler itself requires identifying such special cases, then inserting a nop into the basic block of the compact branch (which could be the previous basic block to the one we're working on).

This is in addition to cases A and B described in the comment. My reasoning for processing all the instructions again is that it trivializes all those cases (and possibly others) into a 'is the physical successor instruction of a compact branch a CTI?' examination rather than inserting checks where ever we could create a FS hazard. Inserting handling FS hazard logic in multiple places I think could lead to correctness issues as we'd need to ensure we cover all possible cases, rather than a simple (mini-)pass decoupled from the delay slot filling logic.

Thoughts?

================
Comment at: lib/Target/Mips/MipsInstrInfo.cpp:287
@@ +286,3 @@
+  if (I->isBranch() && I->getNumExplicitOperands() == 3 &&
+      (unsigned)I->getOperand(1).getReg() == Mips::ZERO) {
+    ZeroOperandBranch = true;
----------------
vkalintiris wrote:
> You can use MipsABIInfo::GetZeroReg() to test with the right zero register. Also, the cast is unnecessary.
> You can use MipsABIInfo::GetZeroReg() to test with the right zero register.

Strangely enough, this doesn't work. 

MipsTargeLowering::emitAtomicBinary will generate a Mips::BEQ with Mips::ZERO as an operand on mips64 if the requested size is 4. For mips64 it appears we need to check for MipsABIInfo::GetZeroReg() and Mips::ZERO for the moment.

This appears to be an outstanding issue with emitAtomicBinary.

================
Comment at: lib/Target/Mips/MipsSEInstrInfo.cpp:438
@@ +437,3 @@
+MipsSEInstrInfo::getEquivalentCompactForm(MachineBasicBlock::iterator I) const {
+
+  const MipsSubtarget &STI =
----------------
vkalintiris wrote:
> Redundant newline.
clang-format inserts the newline, leave or keep it?


http://reviews.llvm.org/D16353