[llvm-dev] [RFC] mir-canon: A new tool for canonicalizing MIR for cleaner diffing.

Tue Aug 29 11:17:54 PDT 2017

Puyan Lotfi via llvm-dev <llvm-dev at lists.llvm.org> writes:
> mir-canon: A new tool for canonicalizing MIR for cleaner diffing.
>
> Problem Statement and Context:
>
> Diff-tools are regularly used for comparing IR and assembly files. This often
> involves reasoning through differences that are semantically equivalent and it
> can be very time consuming for a person to do said reasoning.
>
> Specifically in the context of GlobalISel development there are problems of
> correctness verification. There is a need to compare two programs, compiled
> from identical IR by two different instruction selectors in a way where the
> true differences stand out.
>
> Proposal:
>
> We propose a new tool that we have tentatively named 'mir-canon' that performs
> canonical transformations on MIR. The goal is for MIR pre-processed with
> mir-canon to show fewer differences than if it were not pre-processed.

This sounds great. One place I'm hoping to use this for is in ISel
fuzzing - we can currently catch crashes and machine verifier errors,
but comparing SDAG to GlobalISel is a space that I'd really like to
explore.

> At the time of this writing we have a prototype canonicalization tool. We’ve
> come up with some techniques that show promise and would like to open
> discussion with the community to get feedback and suggestions on refining said
> techniques. Currently we think of this as an open ended project.
>
> Techniques:
>
> Our prototype does the following for each basic block in a Reverse Post
> Ordering:
>
>   * Canonical instruction ordering is done by moving a given instruction as
>     close to the nearest use of its def as possible.
>
>   * Next, canonical VReg renaming is done by building a collection of
>     candidate instructions that can be thought of as sinks in the def-use
>     graph: they are typically instructions that write to physical registers or
>     store to memory. These candidates are used as the root of a breadth first
>     walk over the vreg operand def-use graph that determines a canonical vreg
>     ordering.
>
>   * Using said canonical vreg ordering we rename monotonically, but before we
>     do this we skip several vreg values in order to increase the chance that
>     we land on the same vreg number for two different input MIR files.

This is one place where handling multiple files at once would help, correct?

>     We also do this to reduce the chances that a difference in
>     previously def-use walks will affect the vreg renaming for
>     subsequent walks. This skipping step could be thought of as a kind
>     of vreg number reckoning: we skip modulo n vregs so that we are
>     likely to land on the same vreg for two different files.
>
> This approach is completely agnostic of ISA specific semantics so it should
> work for any target.
>
> Current status:
>
> At the moment we have a standalone llvm tool that uses a single pass to do
> the above described transformations. We have test inputs that show promise
> but we still need a wider set of tests as well as hard metrics.

This sounds like a great start, but I feel like this design won't quite
scale to all the ways we want to use this.

Consider, if this was a pass in the LLVM library rather than only
accessible in a tool, you could potentially set it up to run at the end
of an `llc` run. Similarly, if I want to use this in a fuzzer it would
be nice to be able to do the canonicalization in memory.

> Our approach processes a single file at a time. The primary benefit to this
> approach is lower complexity in initial implementation and exploration of
> building such a tool. We are open to other approaches such as an llvm-diff
> like (two file at a time) approach, but we have not explored that avenue
> fully yet.
>
> We’re eager to hear community feedback and will be ready to share patches
> for review soon.
 ...
> Patch for review.

Please send the patch to llvm-commits (and refer back to this thread).
Patch review is too much traffic for llvm-dev.

> diff --git a/tools/mir-canon/CMakeLists.txt b/tools/mir-canon/CMakeLists.txt
> new file mode 100644
> index 00000000000..eb6722e4bd4
> --- /dev/null
> +++ b/tools/mir-canon/CMakeLists.txt
> @@ -0,0 +1,28 @@
> +set(LLVM_LINK_COMPONENTS
> +  ${LLVM_TARGETS_TO_BUILD}
> +  Analysis
> +  AsmPrinter
> +  CodeGen
> +  Core
> +  IRReader
> +  MC
> +  MIRParser
> +  ScalarOpts
> +  SelectionDAG
> +  Support
> +  Target
> +  TransformUtils
> +  Vectorize
> +  )
> +
> +# Support plugins.
> +set(LLVM_NO_DEAD_STRIP 1)
> +
> +add_llvm_tool(mir-canon
> +  mir-canon.cpp
> +  MIRCanonicalizerPass.cpp
> +
> +  DEPENDS
> +  intrinsics_gen
> +  )
> +export_executable_symbols(mir-canon)
> diff --git a/tools/mir-canon/Common.h b/tools/mir-canon/Common.h
> new file mode 100644
> index 00000000000..3e6be7ef643
> --- /dev/null
> +++ b/tools/mir-canon/Common.h
> @@ -0,0 +1,21 @@
> +#include <cassert>
> +
> +namespace llvm {
> +/// The purpose of this pass is to establish a canonical operand naming so
> +/// that code compiled with slightly different IR passes can be diffed more
> +/// effectively than otherwise. This is done by renaming vregs in a given
> +/// LiveRange in a canonical way.
> +extern char &MIRCanonicalizerID;
> +
> +void initializeMIRCanonicalizerPass(PassRegistry &);
> +
> +} // namespace llvm
> +
> +#define CANON_CHECK(CHK, MSG) \
> +  do { \
> +    if ((CHK)) break; \
> +    errs() << (std::string("") + MSG) << '\n'; \
> +    assert(false && "CANON_CHECK FAILURE."); \
> +    exit(EXIT_FAILURE); \
> +  } while (false)
> +
> diff --git a/tools/mir-canon/LLVMBuild.txt b/tools/mir-canon/LLVMBuild.txt
> new file mode 100644
> index 00000000000..5d3e6bb5a91
> --- /dev/null
> +++ b/tools/mir-canon/LLVMBuild.txt
> @@ -0,0 +1,22 @@
> +;===- ./tools/mir-canon/LLVMBuild.txt --------------------------*- Conf -*--===;
> +;
> +;                     The LLVM Compiler Infrastructure
> +;
> +; This file is distributed under the University of Illinois Open Source
> +; License. See LICENSE.TXT for details.
> +;
> +;===------------------------------------------------------------------------===;
> +;
> +; This is an LLVMBuild description file for the components in this subdirectory.
> +;
> +; For more information on the LLVMBuild system, please see:
> +;
> +;   http://llvm.org/docs/LLVMBuild.html
> +;
> +;===------------------------------------------------------------------------===;
> +
> +[component_0]
> +type = Tool
> +name = mir-canon
> +parent = Tools
> +required_libraries = AsmParser BitReader IRReader MIRParser TransformUtils Scalar Vectorize all-targets
> diff --git a/tools/mir-canon/MIRCanonicalizerPass.cpp b/tools/mir-canon/MIRCanonicalizerPass.cpp
> new file mode 100644
> index 00000000000..a4a0a7a0e5f
> --- /dev/null
> +++ b/tools/mir-canon/MIRCanonicalizerPass.cpp
> @@ -0,0 +1,588 @@
> +//===-------------- MIRCanonicalizer.cpp - MIR Canonicalizer --------------===//
> +//
> +//                     The LLVM Compiler Infrastructure
> +//
> +// This file is distributed under the University of Illinois Open Source
> +// License. See LICENSE.TXT for details.
> +//
> +//===----------------------------------------------------------------------===//
> +//
> +// Reorders instructions canonically.
> +// Renames virtual register operands canonically.
> +// Strips certain MIR artifacts (optionally).
> +//
> +//===----------------------------------------------------------------------===//
> +
> +#include "llvm/CodeGen/Passes.h"
> +#include "llvm/CodeGen/MachineInstrBuilder.h"
> +#include "llvm/CodeGen/MachineFunctionPass.h"
> +#include "llvm/CodeGen/MachineRegisterInfo.h"
> +
> +#include "llvm/Support/raw_ostream.h"
> +#include "llvm/ADT/PostOrderIterator.h"
> +#include "llvm/Target/TargetInstrInfo.h"
> +
> +#include <tuple>
> +#include <queue>
> +
> +using namespace llvm;
> +
> +#include "Common.h"
> +
> +#define DEBUG_TYPE "mir-canonicalizer"
> +
> +static cl::opt<unsigned>
> +CanonicalizeFunctionNumber("canon-nth-function", cl::Hidden, cl::init(~0u),
> +                           cl::value_desc("N"),
> +                           cl::desc("Function number to canonicalize."));
> +
> +static cl::opt<unsigned>
> +CanonicalizeBasicBlockNumber("canon-nth-basicblock", cl::Hidden, cl::init(~0u),
> +                             cl::value_desc("N"),
> +                             cl::desc("BasicBlock number to canonicalize."));
> +
> +namespace {
> +
> +class MIRCanonicalizer : public MachineFunctionPass {
> +public:
> +  static char ID;
> +  MIRCanonicalizer() : MachineFunctionPass(ID) {}
> +
> +  StringRef getPassName() const override {
> +    return "Rename register operands in a canonical ordering.";
> +  }
> +
> +  void getAnalysisUsage(AnalysisUsage &AU) const override {
> +    AU.setPreservesCFG();
> +    MachineFunctionPass::getAnalysisUsage(AU);
> +  }
> +
> +  bool runOnMachineFunction(MachineFunction &MF) override;
> +};
> +
> +} // end anonymous namespace
> +
> +enum RSType { RSE_Reg = 0, RSE_FrameIndex, RSE_NewCandidate };
> +typedef std::tuple<RSType, unsigned> typedRegStackEntry;
> +
> +char MIRCanonicalizer::ID;
> +
> +char &llvm::MIRCanonicalizerID = MIRCanonicalizer::ID;
> +
> +INITIALIZE_PASS_BEGIN(MIRCanonicalizer, "mir-canonicalizer",
> +                      "Rename Register Operands Canonically", false, false);
> +
> +INITIALIZE_PASS_END(MIRCanonicalizer, "mir-canonicalizer",
> +                    "Rename Register Operands Canonically", false, false);
> +
> +static std::vector<MachineBasicBlock *> GetRPOList(MachineFunction &MF) {
> +  MachineBasicBlock *Entry = &*MF.begin();
> +  std::vector<MachineBasicBlock*> RPOList;
> +  ReversePostOrderTraversal<MachineBasicBlock*> RPOT(Entry);
> +  for (ReversePostOrderTraversal<MachineBasicBlock*>::rpo_iterator
> +       MBBI = RPOT.begin(), MBBE = RPOT.end(); MBBI != MBBE; ++MBBI) {
> +    MachineBasicBlock *MBB = *MBBI;
> +    RPOList.push_back(MBB);
> +  }
> +
> +  return RPOList;
> +}
> +
> +// Set a dummy vreg. We use this vregs register class to generate throw-away
> +// vregs that are used to skip vreg numbers so that vreg numbers line up.
> +static unsigned GetDummyVReg(const MachineFunction &MF) {
> +  for (auto &MBB : MF) {
> +    for (auto II = MBB.begin(), IE = MBB.end(); II != IE; ++II) {
> +      const MachineInstr *MI = &*II;
> +      for (unsigned i = 0; i < MI->getNumOperands(); i++) {
> +        const MachineOperand &MO = MI->getOperand(i);
> +        if (!MO.isReg() || !TargetRegisterInfo::isVirtualRegister(MO.getReg()))
> +          continue;
> +        return MO.getReg();
> +      }
> +    }
> +  }
> +
> +  return ~0U;
> +}
> +
> +static bool rescheduleCanoncally(MachineBasicBlock *MBB) {
> +
> +  bool Changed = false;
> +
> +  // Calculates the distance of MI from the begining of its parent BB.
> +  auto getInstrIdx = [](const MachineInstr &MI) {
> +    unsigned i = 0;
> +    for (auto &I : *MI.getParent()) {
> +      if (&I == &MI)
> +        return i;
> +      i++;
> +    }
> +    return ~0U;
> +  };
> +
> +  // Pre-Populate vector of instructions to reschedule so that we don't
> +  // clobber the iterator.
> +  std::vector<MachineInstr *> instructions;
> +  for (auto II = MBB->begin(); II != MBB->end(); ++II) {
> +    instructions.push_back(&*II);
> +  }
> +
> +  for (auto II : instructions) {
> +    if (II->getNumOperands() == 0)
> +      continue;
> +
> +    MachineOperand &MO = II->getOperand(0);
> +    if (!MO.isReg() || !TargetRegisterInfo::isVirtualRegister(MO.getReg()))
> +      continue;
> +
> +    DEBUG(dbgs() << "Operand " << 0 << " of "; II->dump(); MO.dump(););
> +
> +    MachineInstr *def = II;
> +    unsigned distance = ~0U;
> +    MachineInstr *useToBringDefCloserTo = nullptr;
> +    MachineRegisterInfo *MRI = &MBB->getParent()->getRegInfo();
> +    for (auto UI = MRI->use_begin(MO.getReg()), UE = MRI->use_end();
> +         UI != UE; ++UI) {
> +      MachineInstr *useInst = UI->getParent();
> +
> +      const unsigned defLoc = getInstrIdx(*def);
> +      const unsigned useLoc = getInstrIdx(*useInst);
> +      const unsigned delta = (useLoc - defLoc);
> +
> +      if (useInst->getParent() != def->getParent())
> +        continue;
> +      if (defLoc >= useLoc)
> +        continue;
> +
> +      if (delta < distance) {
> +        distance = delta;
> +        useToBringDefCloserTo = useInst;
> +      }
> +    }
> +
> +    MachineBasicBlock::iterator defI = MBB->instr_end();
> +    MachineBasicBlock::iterator useI = MBB->instr_end();
> +
> +    for (auto BBI = MBB->instr_begin(); BBI != MBB->instr_end(); ++BBI) {
> +
> +      if (defI != MBB->instr_end() && useI != MBB->instr_end())
> +        break;
> +
> +      if ((&*BBI != def) && (&*BBI != useToBringDefCloserTo))
> +        continue;
> +
> +      if (&*BBI == def) {
> +        defI = BBI;
> +        continue;
> +      }
> +
> +      if (&*BBI == useToBringDefCloserTo) {
> +        useI = BBI;
> +        continue;
> +      }
> +    }
> +
> +    if (defI == MBB->instr_end() || useI == MBB->instr_end())
> +      continue;
> +
> +    DEBUG(dbgs() << "Splicing "; defI->dump(); dbgs() << " right before: ";
> +          useI->dump(););
> +
> +    Changed = true;
> +    MBB->splice(useI, MBB, defI);
> +  }
> +
> +  return Changed;
> +}
> +
> +static std::vector<MachineInstr *> populateCandidates(MachineBasicBlock *MBB) {
> +  std::vector<MachineInstr *> candidates;
> +  MachineRegisterInfo &MRI = MBB->getParent()->getRegInfo();
> +
> +  for (auto II = MBB->begin(), IE = MBB->end(); II != IE; ++II) {
> +    MachineInstr *MI = &*II;
> +
> +    bool doesMISideEffect = false;
> +
> +    if (MI->getOperand(0).isReg()) {
> +      const unsigned dst = MI->getOperand(0).getReg();
> +      doesMISideEffect |= !TargetRegisterInfo::isVirtualRegister(dst);
> +
> +      for (auto UI = MRI.use_begin(dst); UI != MRI.use_end(); ++UI) {
> +        if (doesMISideEffect) break;
> +        doesMISideEffect |= (UI->getParent()->getParent() != MI->getParent());
> +      }
> +    }
> +
> +    if (!MI->mayStore() && !MI->isBranch() && !doesMISideEffect)
> +      continue;
> +
> +    DEBUG(dbgs() << "Found Candidate:  "; MI->dump(););
> +    candidates.push_back(MI);
> +  }
> +
> +  return candidates;
> +}
> +
> +void doCandidateWalk(std::vector<typedRegStackEntry> &VRegs,
> +                     std::queue <typedRegStackEntry> &reg_queue,
> +                     std::vector<MachineInstr *> &visitedMIs,
> +                     const MachineBasicBlock *MBB) {
> +
> +  const MachineFunction &MF = *MBB->getParent();
> +  const MachineRegisterInfo &MRI = MF.getRegInfo();
> +
> +  while (reg_queue.size()) {
> +    auto regType = std::get<0>(reg_queue.front());
> +    auto reg = std::get<1>(reg_queue.front());
> +    reg_queue.pop();
> +
> +    if (regType == RSE_FrameIndex) {
> +      DEBUG(dbgs() << "Popping frame index.\n";);
> +      VRegs.push_back(std::make_tuple(RSE_FrameIndex, ~0U));
> +      continue;
> +    }
> +
> +    assert(regType == RSE_Reg && "Expected vreg or physreg.");
> +
> +    if (TargetRegisterInfo::isVirtualRegister(reg)) {
> +      DEBUG(dbgs() << "Popping vreg ";
> +          MRI.def_begin(reg)->dump();
> +          dbgs() << "\n";);
> +
> +      bool hasVisitedVReg = false;
> +      for (auto vreg : VRegs) {
> +        if (std::get<1>(vreg) == reg) {
> +          hasVisitedVReg = true;
> +          break;
> +        }
> +      }
> +
> +      if (!hasVisitedVReg)
> +        VRegs.push_back(std::make_tuple(RSE_Reg, reg));
> +    } else {
> +      DEBUG(dbgs() << "Popping physreg.\n";);
> +      VRegs.push_back(std::make_tuple(RSE_Reg, reg));
> +      continue;
> +    }
> +
> +    for (auto RI = MRI.def_begin(reg), RE = MRI.def_end(); RI != RE; ++RI) {
> +      MachineInstr *def = RI->getParent();
> +
> +      if (def->getParent() != MBB)
> +        continue;
> +
> +      bool hasVisited = false;
> +      for (auto VMI : visitedMIs) {
> +        if (def == VMI) {
> +          hasVisited = true;
> +          break;
> +        }
> +      }
> +
> +      if (hasVisited)
> +        break;
> +
> +      DEBUG(dbgs() << "\n========================\n";
> +          dbgs() << "Visited MI: "; def->dump();
> +          dbgs() << "BB Name: " << def->getParent()->getName() << "\n";
> +          dbgs() << "\n========================\n";);
> +      visitedMIs.push_back(def);
> +      for (unsigned I = 1, E = def->getNumOperands(); I != E; ++I) {
> +
> +        MachineOperand &MO = def->getOperand(I);
> +        if (MO.isFI()) {
> +          DEBUG(dbgs() << "Pushing frame index.\n";);
> +          reg_queue.push(std::make_tuple(RSE_FrameIndex, ~0U));
> +        }
> +
> +        if (!MO.isReg())
> +          continue;
> +        reg_queue.push(std::make_tuple(RSE_Reg, MO.getReg()));
> +      }
> +    }
> +  }
> +}
> +
> +static void SkipVRegs(unsigned &VRegGapIndex, MachineRegisterInfo &MRI,
> +                      const TargetRegisterClass *RC) {
> +  const unsigned VR_GAP = (++VRegGapIndex * 1000);
> +
> +  DEBUG(dbgs() << "Adjusting per-BB VR_GAP for BB" << VRegGapIndex << " to "
> +               << VR_GAP << "\n";);
> +
> +  unsigned I = MRI.createVirtualRegister(RC);
> +  const unsigned E = (((I + VR_GAP) / VR_GAP) + 1) * VR_GAP;
> +  while (I != E) {
> +    I = MRI.createVirtualRegister(RC);
> +  }
> +
> +  CANON_CHECK(I != ~0U && I != 0, "Invalid VReg");
> +}
> +
> +static void GetVRegRenameMap(std::map<unsigned, unsigned> &vregRenameMap,
> +                             const std::vector<typedRegStackEntry> &VRegs,
> +                             const std::vector<unsigned> &renamedInOtherBB,
> +                             MachineRegisterInfo &MRI,
> +                             const TargetRegisterClass *RC) {
> +
> +  unsigned lastRenameReg = MRI.createVirtualRegister(RC);
> +
> +  bool firstCandidate = true;
> +  for (auto vreg : VRegs) {
> +
> +    auto regType = std::get<0>(vreg);
> +    auto reg = std::get<1>(vreg);
> +
> +    if (regType == RSE_FrameIndex) {
> +      lastRenameReg = MRI.createVirtualRegister(RC);
> +      DEBUG(dbgs() << "Skipping rename for FI " << lastRenameReg << "\n";);
> +      continue;
> +    } else if (regType == RSE_NewCandidate) {
> +      if (!firstCandidate) {
> +        while (lastRenameReg % 10) {
> +          lastRenameReg = MRI.createVirtualRegister(RC);
> +
> +          DEBUG(dbgs() << "Skipping rename for new candidate " << lastRenameReg
> +                       << "\n";);
> +        }
> +      }
> +      firstCandidate = false;
> +      continue;
> +    } else if (!TargetRegisterInfo::isVirtualRegister(reg)) {
> +      lastRenameReg = MRI.createVirtualRegister(RC);
> +      DEBUG(dbgs() << "Skipping rename for Phys Reg " << lastRenameReg << "\n";);
> +      continue;
> +    }
> +
> +    if (std::find(renamedInOtherBB.begin(), renamedInOtherBB.end(), reg) !=
> +        renamedInOtherBB.end()) {
> +      DEBUG(dbgs() << "Vreg " << reg << " already renamed in other BB.\n";);
> +      continue;
> +    }
> +
> +    auto rename = MRI.createVirtualRegister(MRI.getRegClass(reg));
> +    lastRenameReg = rename;
> +
> +    if (vregRenameMap.find(reg) == vregRenameMap.end()) {
> +      DEBUG(dbgs() << "Mapping vreg ";);
> +      if (MRI.reg_begin(reg) != MRI.reg_end()) {
> +        DEBUG(auto foo = &*MRI.reg_begin(reg); foo->dump(););
> +      } else {
> +        DEBUG(dbgs() << reg;);
> +      }
> +      DEBUG(dbgs() << " to ";);
> +      if (MRI.reg_begin(rename) != MRI.reg_end()) {
> +        DEBUG(auto foo = &*MRI.reg_begin(rename); foo->dump(););
> +      } else {
> +        DEBUG(dbgs() << rename;);
> +      }
> +      DEBUG(dbgs() << "\n";);
> +
> +      vregRenameMap.insert(std::pair<unsigned, unsigned>(reg, rename));
> +    }
> +  }
> +}
> +
> +static bool doVRegRenaming(std::vector<unsigned> &renamedInOtherBB,
> +                           const std::map<unsigned, unsigned> &vregRenameMap,
> +                           MachineRegisterInfo &MRI) {
> +  bool Changed = false;
> +  for (auto I = vregRenameMap.begin(), E = vregRenameMap.end(); I != E; ++I) {
> +
> +    auto vreg = I->first;
> +    auto rename = I->second;
> +
> +    renamedInOtherBB.push_back(rename);
> +
> +    std::vector<MachineOperand *> renameMOs;
> +    for (MachineOperand &MO : MRI.reg_operands(vreg)) {
> +      renameMOs.push_back(&MO);
> +    }
> +
> +    for (auto MO : renameMOs) {
> +      Changed = true;
> +      MO->setReg(rename);
> +
> +      if (!MO->isDef())
> +        MO->setIsKill(false);
> +    }
> +  }
> +
> +  return Changed;
> +}
> +
> +static bool doDefKillClear(MachineBasicBlock *MBB) {
> +  bool Changed = false;
> +
> +  for (auto II = MBB->begin(), IE = MBB->end(); II != IE; ++II) {
> +    MachineInstr *MI = &*II;
> +
> +    for (unsigned i = 0; i < MI->getNumOperands(); i++) {
> +      MachineOperand &MO = MI->getOperand(i);
> +      if (!MO.isReg())
> +        continue;
> +      if (!MO.isDef() && MO.isKill()) {
> +        Changed = true;
> +        MO.setIsKill(false);
> +      }
> +
> +      if (MO.isDef() && MO.isDead()) {
> +        Changed = true;
> +        MO.setIsDead(false);
> +      }
> +    }
> +  }
> +
> +  return Changed;
> +}
> +
> +static bool runOnBasicBlock(MachineBasicBlock *MBB,
> +                            std::vector<StringRef> &bbNames,
> +                            std::vector<unsigned> &renamedInOtherBB,
> +                            unsigned &basicBlockNum, unsigned &VRegGapIndex) {
> +
> +  if (CanonicalizeBasicBlockNumber != ~0U) {
> +    if (CanonicalizeBasicBlockNumber != basicBlockNum++)
> +      return false;
> +    DEBUG(dbgs() << "\n Canonicalizing BasicBlock " << MBB->getName() << "\n";);
> +  }
> +
> +  if (std::find(bbNames.begin(), bbNames.end(), MBB->getName()) !=
> +      bbNames.end()) {
> +    DEBUG(dbgs() << "Found potentially duplicate BasicBlocks: "
> +                 << MBB->getName() << "\n";);
> +    return false;
> +  }
> +
> +  DEBUG(
> +    dbgs() << "\n\n  NEW BASIC BLOCK: " << MBB->getName() << "  \n\n";
> +    dbgs() << "\n\n================================================\n\n";
> +  );
> +
> +  bool Changed = false;
> +  MachineFunction &MF = *MBB->getParent();
> +  MachineRegisterInfo &MRI = MF.getRegInfo();
> +
> +  const TargetRegisterClass *dummyRC = [] (const MachineFunction &MF) {
> +    const unsigned dummyVReg = GetDummyVReg(MF);
> +    CANON_CHECK(dummyVReg != ~0U, "Could not find a dummy VReg.");
> +    const MachineRegisterInfo &MRI = MF.getRegInfo();
> +    return MRI.getRegClass(dummyVReg);
> +  } (MF);
> +
> +  bbNames.push_back(MBB->getName());
> +  DEBUG(dbgs() << "\n\n NEW BASIC BLOCK: " << MBB->getName() << "\n\n";);
> +
> +  DEBUG(dbgs() << "MBB Before Scheduling:\n"; MBB->dump(););
> +  Changed |= rescheduleCanoncally(MBB);
> +  DEBUG(dbgs() << "MBB After Scheduling:\n"; MBB->dump(););
> +
> +  std::vector<MachineInstr *> candidates = populateCandidates(MBB);
> +  std::vector<MachineInstr *> visitedMIs;
> +  std::copy(candidates.begin(), candidates.end(),
> +            std::back_inserter(visitedMIs));
> +
> +  std::vector<typedRegStackEntry> VRegs;
> +  for (auto candidate : candidates) {
> +    VRegs.push_back(std::make_tuple(RSE_NewCandidate, ~0U));
> +
> +    std::queue<typedRegStackEntry> reg_queue;
> +
> +    // Here we walk the vreg operands of a non-root node along our walk.
> +    // The root nodes are the original candidates (stores normally).
> +    // These are normally not the root nodes (except for the case of copies to
> +    // physical registers).
> +    for (unsigned i = 1; i < candidate->getNumOperands(); i++) {
> +      if (candidate->mayStore() || candidate->isBranch())
> +        break;
> +
> +      MachineOperand &MO = candidate->getOperand(i);
> +      if (!(MO.isReg() && TargetRegisterInfo::isVirtualRegister(MO.getReg())))
> +        continue;
> +
> +      DEBUG(dbgs() << "Enqueue register"; MO.dump(); dbgs() << "\n";);
> +      reg_queue.push(std::make_tuple(RSE_Reg, MO.getReg()));
> +    }
> +
> +    // Here we walk the root candidates. We start from the 0th operand because
> +    // the root is normally a store to a vreg.
> +    for (unsigned i = 0; i < candidate->getNumOperands(); i++) {
> +
> +      if (!candidate->mayStore() && !candidate->isBranch())
> +        break;
> +
> +      MachineOperand &MO = candidate->getOperand(i);
> +
> +      // TODO: Do we want to only add vregs here?
> +      if (!MO.isReg() && !MO.isFI())
> +        continue;
> +
> +      DEBUG(dbgs() << "Enqueue Reg/FI"; MO.dump(); dbgs() << "\n";);
> +
> +      reg_queue.push(MO.isReg() ? std::make_tuple(RSE_Reg, MO.getReg()) :
> +                                  std::make_tuple(RSE_FrameIndex,  ~0U));
> +    }
> +
> +    // Now that we have pushed the operands of the candidate here, we do the
> +    // full breadth first walk.
> +    doCandidateWalk(VRegs, reg_queue, visitedMIs, MBB);
> +  }
> +
> +  // If we have populated no vregs to rename then bail.
> +  // The rest of this function does the vreg remaping.
> +  if (VRegs.size() == 0)
> +    return Changed;
> +
> +  // Skip some vregs, so we can recon where we'll land next.
> +  SkipVRegs(VRegGapIndex, MRI, dummyRC);
> +
> +  std::map<unsigned, unsigned> vregRenameMap;
> +  GetVRegRenameMap(vregRenameMap, VRegs, renamedInOtherBB, MRI, dummyRC);
> +
> +  Changed |= doVRegRenaming(renamedInOtherBB, vregRenameMap, MRI);
> +  Changed |= doDefKillClear(MBB);
> +
> +  DEBUG(dbgs() << "Updated MachineBasicBlock:\n"; MBB->dump(); dbgs() << "\n";);
> +  DEBUG(dbgs() << "\n\n================================================\n\n");
> +  return Changed;
> +}
> +
> +bool MIRCanonicalizer::runOnMachineFunction(MachineFunction &MF) {
> +
> +  static unsigned functionNum = 0;
> +  if (CanonicalizeFunctionNumber != ~0U) {
> +    if (CanonicalizeFunctionNumber != functionNum++)
> +      return false;
> +    DEBUG(dbgs() << "\n Canonicalizing Function " << MF.getName() << "\n";);
> +  }
> +
> +  // we need a valid vreg to create a vreg type for skipping all those
> +  // stray vreg numbers so reach alignment/canonical vreg values.
> +  std::vector<MachineBasicBlock*> RPOList = GetRPOList(MF);
> +
> +  DEBUG(
> +    dbgs() << "\n\n  NEW MACHINE FUNCTION: " << MF.getName() << "  \n\n";
> +    dbgs() << "\n\n================================================\n\n";
> +    dbgs() << "Total Basic Blocks: " << RPOList.size() << "\n";
> +    for (auto MBB : RPOList) {
> +      dbgs() << MBB->getName() << "\n";
> +    }
> +    dbgs() << "\n\n================================================\n\n";
> +  );
> +
> +  std::vector<StringRef> bbNames;
> +  std::vector<unsigned> renamedInOtherBB;
> +
> +  unsigned gapIdx = 0;
> +  unsigned bbNum = 0;
> +
> +  bool Changed = false;
> +
> +  for (auto MBB : RPOList)
> +    Changed |= runOnBasicBlock(MBB, bbNames, renamedInOtherBB, bbNum, gapIdx);
> +
> +  return Changed;
> +}
> +
> diff --git a/tools/mir-canon/mir-canon.cpp b/tools/mir-canon/mir-canon.cpp
> new file mode 100644
> index 00000000000..bfe53c01906
> --- /dev/null
> +++ b/tools/mir-canon/mir-canon.cpp
> @@ -0,0 +1,175 @@
> +//===------ mir-canon.cpp - Implement the LLVM Native Code Generator ------===//
> +//
> +//                     The LLVM Compiler Infrastructure
> +//
> +// This file is distributed under the University of Illinois Open Source
> +// License. See LICENSE.TXT for details.
> +//
> +//===----------------------------------------------------------------------===//
> +//
> +// This is the mir-canon driver. It invokes the MIR canonicalization pass.
> +//
> +//===----------------------------------------------------------------------===//
> +
> +#include "llvm/CodeGen/CommandFlags.h"
> +#include "llvm/CodeGen/TargetPassConfig.h"
> +#include "llvm/CodeGen/MachineModuleInfo.h"
> +#include "llvm/CodeGen/MachineFunctionPass.h"
> +#include "llvm/CodeGen/MIRParser/MIRParser.h"
> +
> +#include "llvm/IR/LLVMContext.h"
> +#include "llvm/IR/DiagnosticInfo.h"
> +#include "llvm/IR/IRPrintingPasses.h"
> +#include "llvm/IR/DiagnosticPrinter.h"
> +#include "llvm/IR/LegacyPassManager.h"
> +
> +#include "llvm/Support/Signals.h"
> +#include "llvm/Support/FileSystem.h"
> +#include "llvm/Support/TargetSelect.h"
> +#include "llvm/Support/TargetRegistry.h"
> +#include "llvm/Support/ToolOutputFile.h"
> +#include "llvm/Support/FormattedStream.h"
> +#include "llvm/Support/PrettyStackTrace.h"
> +
> +#include "llvm/Target/TargetMachine.h"
> +#include "llvm/Target/TargetSubtargetInfo.h"
> +
> +#include "Common.h"
> +
> +using namespace llvm;
> +
> +static cl::opt<std::string>
> +    InputFilename(cl::Positional, cl::desc("<input.mir>"), cl::init("-"));
> +
> +static cl::opt<std::string> OutputFilename("o", cl::desc("Output filename"),
> +                                           cl::value_desc("canon.mir"));
> +
> +static cl::opt<std::string> TargetTriple("mtriple",
> +                                         cl::desc("Override target triple."));
> +
> +int main(int argc, char **argv) {
> +  sys::PrintStackTraceOnErrorSignal(argv[0]);
> +  PrettyStackTraceProgram X(argc, argv);
> +
> +  EnableDebugBuffering = true;
> +
> +  LLVMContext Context;
> +  llvm_shutdown_obj Y;
> +
> +  // Initialize targets first, so that --version shows registered targets.
> +  InitializeAllTargets();
> +  InitializeAllTargetMCs();
> +  InitializeAllAsmPrinters();
> +  InitializeAllAsmParsers();
> +
> +  PassRegistry *Registry = PassRegistry::getPassRegistry();
> +  initializeCore(*Registry);
> +  initializeCodeGen(*Registry);
> +
> +  initializeMIRCanonicalizerPass(*Registry);
> +
> +  // Register the target printer for --version.
> +  cl::AddExtraVersionPrinter(TargetRegistry::printRegisteredTargetsForVersion);
> +
> +  cl::ParseCommandLineOptions(argc, argv, "llvm system compiler\n");
> +
> +  CANON_CHECK(StringRef(InputFilename).endswith_lower(".mir"),
> +              "Invalid Filename: " + InputFilename + ". " +
> +              "Expected a .mir file extension.");
> +
> +  // Set a diagnostic handler that doesn't exit on the first error
> +  bool HasError = false;
> +  Context.setDiagnosticHandler(
> +    [](const DiagnosticInfo &DI, void *Context) {
> +      if (DI.getSeverity() == DS_Error) {
> +        bool *HasError = static_cast<bool *>(Context);
> +        *HasError = true;
> +      }
> +
> +      auto *Remark = dyn_cast<DiagnosticInfoOptimizationBase>(&DI);
> +      if (Remark && !Remark->isEnabled())
> +        return;
> +
> +      DiagnosticPrinterRawOStream DP(errs());
> +      errs() << LLVMContext::getDiagnosticMessagePrefix(DI.getSeverity());
> +      errs() << ": ";
> +      DI.print(DP);
> +      errs() << "\n";
> +    },
> +    &HasError);
> +
> +  SMDiagnostic Err;
> +  auto MIR = createMIRParserFromFile(InputFilename, Err, Context);
> +  auto M = MIR ? MIR->parseIRModule() : nullptr;
> +
> +  CANON_CHECK(M, argv[0] + "Invalid Module or MIR.");
> +
> +  if (!TargetTriple.empty())
> +    M->setTargetTriple(Triple::normalize(TargetTriple));
> +
> +  Triple TheTriple = Triple(M->getTargetTriple());
> +  if (TheTriple.getTriple().empty())
> +    TheTriple.setTriple(sys::getDefaultTargetTriple());
> +
> +  // Get the target specific parser.
> +  std::string Error;
> +  const Target *TheTarget =
> +      TargetRegistry::lookupTarget(MArch, TheTriple, Error);
> +
> +  CANON_CHECK(TheTarget, argv[0] + ": " + Error);
> +
> +  std::string CPUStr = getCPUStr();
> +  std::string FeaturesStr = getFeaturesStr();
> +
> +  std::unique_ptr<TargetMachine> Target(TheTarget->createTargetMachine(
> +      TheTriple.getTriple(), CPUStr, FeaturesStr,
> +      InitTargetOptionsFromCodeGenFlags(), getRelocModel(), getCodeModel(),
> +      CodeGenOpt::Default));
> +
> +  CANON_CHECK(Target, "Could not allocate target machine!");
> +
> +  std::error_code EC;
> +  auto Out = llvm::make_unique<tool_output_file>(OutputFilename, EC,
> +                                                 sys::fs::F_Text);
> +  CANON_CHECK(!EC, EC.message());
> +  raw_pwrite_stream *OS = &Out->os();
> +
> +  legacy::PassManager PM;
> +  TargetLibraryInfoImpl TLII(Triple(M->getTargetTriple()));
> +  PM.add(new TargetLibraryInfoWrapperPass(TLII));
> +  M->setDataLayout(Target->createDataLayout());
> +  setFunctionAttributes(CPUStr, FeaturesStr, *M);
> +
> +  LLVMTargetMachine &TM = static_cast<LLVMTargetMachine &>(*Target);
> +  TargetPassConfig &TPC = *TM.createPassConfig(PM);
> +  PM.add(&TPC);
> +  MachineModuleInfo *MMI = new MachineModuleInfo(&TM);
> +  PM.add(MMI);
> +  TPC.printAndVerify("");
> +
> +  StringRef PassName = "mir-canonicalizer";
> +  const PassRegistry *PR = PassRegistry::getPassRegistry();
> +  const PassInfo *PI = PR->getPassInfo(PassName);
> +
> +  CANON_CHECK(PI, argv[0] + ": " + PI->getPassName() + " not registered.");
> +  CANON_CHECK(PI->getNormalCtor(),
> +              argv[0] + ": can't create pass: " + PI->getPassName());
> +
> +  Pass *P = PI->getNormalCtor()();
> +  PM.add(P);
> +  TPC.printAndVerify(std::string("After ") + std::string(P->getPassName()));
> +
> +  PM.add(createPrintMIRPass(*OS));
> +
> +  // Before executing passes, print the final values of the LLVM options.
> +  cl::PrintOptionValues();
> +
> +  PM.run(*M);
> +
> +  CANON_CHECK(!(*static_cast<bool *>(Context.getDiagnosticContext())),
> +              argv[0] + " Error: exiting...");
> +
> +  Out->keep();
> +  return 0;
> +}
> +
>
> _______________________________________________
> LLVM Developers mailing list
> llvm-dev at lists.llvm.org
> http://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev