[llvm-dev] [RFC] mir-canon: A new tool for canonicalizing MIR for cleaner diffing.

Wed Aug 30 00:29:10 PDT 2017

It sounds quite useful. I wonder though why did you decide in favor of a stand-alone tool vs an LLVM pass that can be invoked under an option?  It seems to me using the llvm infrastructure should provide the means for the transformations the tool is performing. Maybe there is the sense that it is faster to implement (assuming this is what you mean by lower complexity) an off-line tool but there are benefits to leveraging llvm also.

I’m also curious about ways to link back difference sightings to actions in the compiler. Does this come down to follow ups “by hand”? Or do you plan to sync this with opt-viewer remarks? Or something else like “I (hopefully) know what to do when I see a ‘canonical’ diff"? Perhaps you can share some thoughts about it.

Cheers
Gerolf

> On Aug 29, 2017, at 11:17 AM, Justin Bogner via llvm-dev <llvm-dev at lists.llvm.org> wrote:
> 
> Puyan Lotfi via llvm-dev <llvm-dev at lists.llvm.org> writes:
>> mir-canon: A new tool for canonicalizing MIR for cleaner diffing.
>> 
>> Problem Statement and Context:
>> 
>> Diff-tools are regularly used for comparing IR and assembly files. This often
>> involves reasoning through differences that are semantically equivalent and it
>> can be very time consuming for a person to do said reasoning.
>> 
>> Specifically in the context of GlobalISel development there are problems of
>> correctness verification. There is a need to compare two programs, compiled
>> from identical IR by two different instruction selectors in a way where the
>> true differences stand out.
>> 
>> Proposal:
>> 
>> We propose a new tool that we have tentatively named 'mir-canon' that performs
>> canonical transformations on MIR. The goal is for MIR pre-processed with
>> mir-canon to show fewer differences than if it were not pre-processed.
> 
> This sounds great. One place I'm hoping to use this for is in ISel
> fuzzing - we can currently catch crashes and machine verifier errors,
> but comparing SDAG to GlobalISel is a space that I'd really like to
> explore.
> 
>> At the time of this writing we have a prototype canonicalization tool. We’ve
>> come up with some techniques that show promise and would like to open
>> discussion with the community to get feedback and suggestions on refining said
>> techniques. Currently we think of this as an open ended project.
>> 
>> Techniques:
>> 
>> Our prototype does the following for each basic block in a Reverse Post
>> Ordering:
>> 
>>  * Canonical instruction ordering is done by moving a given instruction as
>>    close to the nearest use of its def as possible.
>> 
>>  * Next, canonical VReg renaming is done by building a collection of
>>    candidate instructions that can be thought of as sinks in the def-use
>>    graph: they are typically instructions that write to physical registers or
>>    store to memory. These candidates are used as the root of a breadth first
>>    walk over the vreg operand def-use graph that determines a canonical vreg
>>    ordering.
>> 
>>  * Using said canonical vreg ordering we rename monotonically, but before we
>>    do this we skip several vreg values in order to increase the chance that
>>    we land on the same vreg number for two different input MIR files.
> 
> This is one place where handling multiple files at once would help, correct?
> 
>>    We also do this to reduce the chances that a difference in
>>    previously def-use walks will affect the vreg renaming for
>>    subsequent walks. This skipping step could be thought of as a kind
>>    of vreg number reckoning: we skip modulo n vregs so that we are
>>    likely to land on the same vreg for two different files.
>> 
>> This approach is completely agnostic of ISA specific semantics so it should
>> work for any target.
>> 
>> Current status:
>> 
>> At the moment we have a standalone llvm tool that uses a single pass to do
>> the above described transformations. We have test inputs that show promise
>> but we still need a wider set of tests as well as hard metrics.
> 
> This sounds like a great start, but I feel like this design won't quite
> scale to all the ways we want to use this.
> 
> Consider, if this was a pass in the LLVM library rather than only
> accessible in a tool, you could potentially set it up to run at the end
> of an `llc` run. Similarly, if I want to use this in a fuzzer it would
> be nice to be able to do the canonicalization in memory.
> 
>> Our approach processes a single file at a time. The primary benefit to this
>> approach is lower complexity in initial implementation and exploration of
>> building such a tool. We are open to other approaches such as an llvm-diff
>> like (two file at a time) approach, but we have not explored that avenue
>> fully yet.
>> 
>> We’re eager to hear community feedback and will be ready to share patches
>> for review soon.
> ...
>> Patch for review.
> 
> Please send the patch to llvm-commits (and refer back to this thread).
> Patch review is too much traffic for llvm-dev.
> 
>> diff --git a/tools/mir-canon/CMakeLists.txt b/tools/mir-canon/CMakeLists.txt
>> new file mode 100644
>> index 00000000000..eb6722e4bd4
>> --- /dev/null
>> +++ b/tools/mir-canon/CMakeLists.txt
>> @@ -0,0 +1,28 @@
>> +set(LLVM_LINK_COMPONENTS
>> +  ${LLVM_TARGETS_TO_BUILD}
>> +  Analysis
>> +  AsmPrinter
>> +  CodeGen
>> +  Core
>> +  IRReader
>> +  MC
>> +  MIRParser
>> +  ScalarOpts
>> +  SelectionDAG
>> +  Support
>> +  Target
>> +  TransformUtils
>> +  Vectorize
>> +  )
>> +
>> +# Support plugins.
>> +set(LLVM_NO_DEAD_STRIP 1)
>> +
>> +add_llvm_tool(mir-canon
>> +  mir-canon.cpp
>> +  MIRCanonicalizerPass.cpp
>> +
>> +  DEPENDS
>> +  intrinsics_gen
>> +  )
>> +export_executable_symbols(mir-canon)
>> diff --git a/tools/mir-canon/Common.h b/tools/mir-canon/Common.h
>> new file mode 100644
>> index 00000000000..3e6be7ef643
>> --- /dev/null
>> +++ b/tools/mir-canon/Common.h
>> @@ -0,0 +1,21 @@
>> +#include <cassert>
>> +
>> +namespace llvm {
>> +/// The purpose of this pass is to establish a canonical operand naming so
>> +/// that code compiled with slightly different IR passes can be diffed more
>> +/// effectively than otherwise. This is done by renaming vregs in a given
>> +/// LiveRange in a canonical way.
>> +extern char &MIRCanonicalizerID;
>> +
>> +void initializeMIRCanonicalizerPass(PassRegistry &);
>> +
>> +} // namespace llvm
>> +
>> +#define CANON_CHECK(CHK, MSG) \
>> +  do { \
>> +    if ((CHK)) break; \
>> +    errs() << (std::string("") + MSG) << '\n'; \
>> +    assert(false && "CANON_CHECK FAILURE."); \
>> +    exit(EXIT_FAILURE); \
>> +  } while (false)
>> +
>> diff --git a/tools/mir-canon/LLVMBuild.txt b/tools/mir-canon/LLVMBuild.txt
>> new file mode 100644
>> index 00000000000..5d3e6bb5a91
>> --- /dev/null
>> +++ b/tools/mir-canon/LLVMBuild.txt
>> @@ -0,0 +1,22 @@
>> +;===- ./tools/mir-canon/LLVMBuild.txt --------------------------*- Conf -*--===;
>> +;
>> +;                     The LLVM Compiler Infrastructure
>> +;
>> +; This file is distributed under the University of Illinois Open Source
>> +; License. See LICENSE.TXT for details.
>> +;
>> +;===------------------------------------------------------------------------===;
>> +;
>> +; This is an LLVMBuild description file for the components in this subdirectory.
>> +;
>> +; For more information on the LLVMBuild system, please see:
>> +;
>> +;   http://llvm.org/docs/LLVMBuild.html
>> +;
>> +;===------------------------------------------------------------------------===;
>> +
>> +[component_0]
>> +type = Tool
>> +name = mir-canon
>> +parent = Tools
>> +required_libraries = AsmParser BitReader IRReader MIRParser TransformUtils Scalar Vectorize all-targets
>> diff --git a/tools/mir-canon/MIRCanonicalizerPass.cpp b/tools/mir-canon/MIRCanonicalizerPass.cpp
>> new file mode 100644
>> index 00000000000..a4a0a7a0e5f
>> --- /dev/null
>> +++ b/tools/mir-canon/MIRCanonicalizerPass.cpp
>> @@ -0,0 +1,588 @@
>> +//===-------------- MIRCanonicalizer.cpp - MIR Canonicalizer --------------===//
>> +//
>> +//                     The LLVM Compiler Infrastructure
>> +//
>> +// This file is distributed under the University of Illinois Open Source
>> +// License. See LICENSE.TXT for details.
>> +//
>> +//===----------------------------------------------------------------------===//
>> +//
>> +// Reorders instructions canonically.
>> +// Renames virtual register operands canonically.
>> +// Strips certain MIR artifacts (optionally).
>> +//
>> +//===----------------------------------------------------------------------===//
>> +
>> +#include "llvm/CodeGen/Passes.h"
>> +#include "llvm/CodeGen/MachineInstrBuilder.h"
>> +#include "llvm/CodeGen/MachineFunctionPass.h"
>> +#include "llvm/CodeGen/MachineRegisterInfo.h"
>> +
>> +#include "llvm/Support/raw_ostream.h"
>> +#include "llvm/ADT/PostOrderIterator.h"
>> +#include "llvm/Target/TargetInstrInfo.h"
>> +
>> +#include <tuple>
>> +#include <queue>
>> +
>> +using namespace llvm;
>> +
>> +#include "Common.h"
>> +
>> +#define DEBUG_TYPE "mir-canonicalizer"
>> +
>> +static cl::opt<unsigned>
>> +CanonicalizeFunctionNumber("canon-nth-function", cl::Hidden, cl::init(~0u),
>> +                           cl::value_desc("N"),
>> +                           cl::desc("Function number to canonicalize."));
>> +
>> +static cl::opt<unsigned>
>> +CanonicalizeBasicBlockNumber("canon-nth-basicblock", cl::Hidden, cl::init(~0u),
>> +                             cl::value_desc("N"),
>> +                             cl::desc("BasicBlock number to canonicalize."));
>> +
>> +namespace {
>> +
>> +class MIRCanonicalizer : public MachineFunctionPass {
>> +public:
>> +  static char ID;
>> +  MIRCanonicalizer() : MachineFunctionPass(ID) {}
>> +
>> +  StringRef getPassName() const override {
>> +    return "Rename register operands in a canonical ordering.";
>> +  }
>> +
>> +  void getAnalysisUsage(AnalysisUsage &AU) const override {
>> +    AU.setPreservesCFG();
>> +    MachineFunctionPass::getAnalysisUsage(AU);
>> +  }
>> +
>> +  bool runOnMachineFunction(MachineFunction &MF) override;
>> +};
>> +
>> +} // end anonymous namespace
>> +
>> +enum RSType { RSE_Reg = 0, RSE_FrameIndex, RSE_NewCandidate };
>> +typedef std::tuple<RSType, unsigned> typedRegStackEntry;
>> +
>> +char MIRCanonicalizer::ID;
>> +
>> +char &llvm::MIRCanonicalizerID = MIRCanonicalizer::ID;
>> +
>> +INITIALIZE_PASS_BEGIN(MIRCanonicalizer, "mir-canonicalizer",
>> +                      "Rename Register Operands Canonically", false, false);
>> +
>> +INITIALIZE_PASS_END(MIRCanonicalizer, "mir-canonicalizer",
>> +                    "Rename Register Operands Canonically", false, false);
>> +
>> +static std::vector<MachineBasicBlock *> GetRPOList(MachineFunction &MF) {
>> +  MachineBasicBlock *Entry = &*MF.begin();
>> +  std::vector<MachineBasicBlock*> RPOList;
>> +  ReversePostOrderTraversal<MachineBasicBlock*> RPOT(Entry);
>> +  for (ReversePostOrderTraversal<MachineBasicBlock*>::rpo_iterator
>> +       MBBI = RPOT.begin(), MBBE = RPOT.end(); MBBI != MBBE; ++MBBI) {
>> +    MachineBasicBlock *MBB = *MBBI;
>> +    RPOList.push_back(MBB);
>> +  }
>> +
>> +  return RPOList;
>> +}
>> +
>> +// Set a dummy vreg. We use this vregs register class to generate throw-away
>> +// vregs that are used to skip vreg numbers so that vreg numbers line up.
>> +static unsigned GetDummyVReg(const MachineFunction &MF) {
>> +  for (auto &MBB : MF) {
>> +    for (auto II = MBB.begin(), IE = MBB.end(); II != IE; ++II) {
>> +      const MachineInstr *MI = &*II;
>> +      for (unsigned i = 0; i < MI->getNumOperands(); i++) {
>> +        const MachineOperand &MO = MI->getOperand(i);
>> +        if (!MO.isReg() || !TargetRegisterInfo::isVirtualRegister(MO.getReg()))
>> +          continue;
>> +        return MO.getReg();
>> +      }
>> +    }
>> +  }
>> +
>> +  return ~0U;
>> +}
>> +
>> +static bool rescheduleCanoncally(MachineBasicBlock *MBB) {
>> +
>> +  bool Changed = false;
>> +
>> +  // Calculates the distance of MI from the begining of its parent BB.
>> +  auto getInstrIdx = [](const MachineInstr &MI) {
>> +    unsigned i = 0;
>> +    for (auto &I : *MI.getParent()) {
>> +      if (&I == &MI)
>> +        return i;
>> +      i++;
>> +    }
>> +    return ~0U;
>> +  };
>> +
>> +  // Pre-Populate vector of instructions to reschedule so that we don't
>> +  // clobber the iterator.
>> +  std::vector<MachineInstr *> instructions;
>> +  for (auto II = MBB->begin(); II != MBB->end(); ++II) {
>> +    instructions.push_back(&*II);
>> +  }
>> +
>> +  for (auto II : instructions) {
>> +    if (II->getNumOperands() == 0)
>> +      continue;
>> +
>> +    MachineOperand &MO = II->getOperand(0);
>> +    if (!MO.isReg() || !TargetRegisterInfo::isVirtualRegister(MO.getReg()))
>> +      continue;
>> +
>> +    DEBUG(dbgs() << "Operand " << 0 << " of "; II->dump(); MO.dump(););
>> +
>> +    MachineInstr *def = II;
>> +    unsigned distance = ~0U;
>> +    MachineInstr *useToBringDefCloserTo = nullptr;
>> +    MachineRegisterInfo *MRI = &MBB->getParent()->getRegInfo();
>> +    for (auto UI = MRI->use_begin(MO.getReg()), UE = MRI->use_end();
>> +         UI != UE; ++UI) {
>> +      MachineInstr *useInst = UI->getParent();
>> +
>> +      const unsigned defLoc = getInstrIdx(*def);
>> +      const unsigned useLoc = getInstrIdx(*useInst);
>> +      const unsigned delta = (useLoc - defLoc);
>> +
>> +      if (useInst->getParent() != def->getParent())
>> +        continue;
>> +      if (defLoc >= useLoc)
>> +        continue;
>> +
>> +      if (delta < distance) {
>> +        distance = delta;
>> +        useToBringDefCloserTo = useInst;
>> +      }
>> +    }
>> +
>> +    MachineBasicBlock::iterator defI = MBB->instr_end();
>> +    MachineBasicBlock::iterator useI = MBB->instr_end();
>> +
>> +    for (auto BBI = MBB->instr_begin(); BBI != MBB->instr_end(); ++BBI) {
>> +
>> +      if (defI != MBB->instr_end() && useI != MBB->instr_end())
>> +        break;
>> +
>> +      if ((&*BBI != def) && (&*BBI != useToBringDefCloserTo))
>> +        continue;
>> +
>> +      if (&*BBI == def) {
>> +        defI = BBI;
>> +        continue;
>> +      }
>> +
>> +      if (&*BBI == useToBringDefCloserTo) {
>> +        useI = BBI;
>> +        continue;
>> +      }
>> +    }
>> +
>> +    if (defI == MBB->instr_end() || useI == MBB->instr_end())
>> +      continue;
>> +
>> +    DEBUG(dbgs() << "Splicing "; defI->dump(); dbgs() << " right before: ";
>> +          useI->dump(););
>> +
>> +    Changed = true;
>> +    MBB->splice(useI, MBB, defI);
>> +  }
>> +
>> +  return Changed;
>> +}
>> +
>> +static std::vector<MachineInstr *> populateCandidates(MachineBasicBlock *MBB) {
>> +  std::vector<MachineInstr *> candidates;
>> +  MachineRegisterInfo &MRI = MBB->getParent()->getRegInfo();
>> +
>> +  for (auto II = MBB->begin(), IE = MBB->end(); II != IE; ++II) {
>> +    MachineInstr *MI = &*II;
>> +
>> +    bool doesMISideEffect = false;
>> +
>> +    if (MI->getOperand(0).isReg()) {
>> +      const unsigned dst = MI->getOperand(0).getReg();
>> +      doesMISideEffect |= !TargetRegisterInfo::isVirtualRegister(dst);
>> +
>> +      for (auto UI = MRI.use_begin(dst); UI != MRI.use_end(); ++UI) {
>> +        if (doesMISideEffect) break;
>> +        doesMISideEffect |= (UI->getParent()->getParent() != MI->getParent());
>> +      }
>> +    }
>> +
>> +    if (!MI->mayStore() && !MI->isBranch() && !doesMISideEffect)
>> +      continue;
>> +
>> +    DEBUG(dbgs() << "Found Candidate:  "; MI->dump(););
>> +    candidates.push_back(MI);
>> +  }
>> +
>> +  return candidates;
>> +}
>> +
>> +void doCandidateWalk(std::vector<typedRegStackEntry> &VRegs,
>> +                     std::queue <typedRegStackEntry> &reg_queue,
>> +                     std::vector<MachineInstr *> &visitedMIs,
>> +                     const MachineBasicBlock *MBB) {
>> +
>> +  const MachineFunction &MF = *MBB->getParent();
>> +  const MachineRegisterInfo &MRI = MF.getRegInfo();
>> +
>> +  while (reg_queue.size()) {
>> +    auto regType = std::get<0>(reg_queue.front());
>> +    auto reg = std::get<1>(reg_queue.front());
>> +    reg_queue.pop();
>> +
>> +    if (regType == RSE_FrameIndex) {
>> +      DEBUG(dbgs() << "Popping frame index.\n";);
>> +      VRegs.push_back(std::make_tuple(RSE_FrameIndex, ~0U));
>> +      continue;
>> +    }
>> +
>> +    assert(regType == RSE_Reg && "Expected vreg or physreg.");
>> +
>> +    if (TargetRegisterInfo::isVirtualRegister(reg)) {
>> +      DEBUG(dbgs() << "Popping vreg ";
>> +          MRI.def_begin(reg)->dump();
>> +          dbgs() << "\n";);
>> +
>> +      bool hasVisitedVReg = false;
>> +      for (auto vreg : VRegs) {
>> +        if (std::get<1>(vreg) == reg) {
>> +          hasVisitedVReg = true;
>> +          break;
>> +        }
>> +      }
>> +
>> +      if (!hasVisitedVReg)
>> +        VRegs.push_back(std::make_tuple(RSE_Reg, reg));
>> +    } else {
>> +      DEBUG(dbgs() << "Popping physreg.\n";);
>> +      VRegs.push_back(std::make_tuple(RSE_Reg, reg));
>> +      continue;
>> +    }
>> +
>> +    for (auto RI = MRI.def_begin(reg), RE = MRI.def_end(); RI != RE; ++RI) {
>> +      MachineInstr *def = RI->getParent();
>> +
>> +      if (def->getParent() != MBB)
>> +        continue;
>> +
>> +      bool hasVisited = false;
>> +      for (auto VMI : visitedMIs) {
>> +        if (def == VMI) {
>> +          hasVisited = true;
>> +          break;
>> +        }
>> +      }
>> +
>> +      if (hasVisited)
>> +        break;
>> +
>> +      DEBUG(dbgs() << "\n========================\n";
>> +          dbgs() << "Visited MI: "; def->dump();
>> +          dbgs() << "BB Name: " << def->getParent()->getName() << "\n";
>> +          dbgs() << "\n========================\n";);
>> +      visitedMIs.push_back(def);
>> +      for (unsigned I = 1, E = def->getNumOperands(); I != E; ++I) {
>> +
>> +        MachineOperand &MO = def->getOperand(I);
>> +        if (MO.isFI()) {
>> +          DEBUG(dbgs() << "Pushing frame index.\n";);
>> +          reg_queue.push(std::make_tuple(RSE_FrameIndex, ~0U));
>> +        }
>> +
>> +        if (!MO.isReg())
>> +          continue;
>> +        reg_queue.push(std::make_tuple(RSE_Reg, MO.getReg()));
>> +      }
>> +    }
>> +  }
>> +}
>> +
>> +static void SkipVRegs(unsigned &VRegGapIndex, MachineRegisterInfo &MRI,
>> +                      const TargetRegisterClass *RC) {
>> +  const unsigned VR_GAP = (++VRegGapIndex * 1000);
>> +
>> +  DEBUG(dbgs() << "Adjusting per-BB VR_GAP for BB" << VRegGapIndex << " to "
>> +               << VR_GAP << "\n";);
>> +
>> +  unsigned I = MRI.createVirtualRegister(RC);
>> +  const unsigned E = (((I + VR_GAP) / VR_GAP) + 1) * VR_GAP;
>> +  while (I != E) {
>> +    I = MRI.createVirtualRegister(RC);
>> +  }
>> +
>> +  CANON_CHECK(I != ~0U && I != 0, "Invalid VReg");
>> +}
>> +
>> +static void GetVRegRenameMap(std::map<unsigned, unsigned> &vregRenameMap,
>> +                             const std::vector<typedRegStackEntry> &VRegs,
>> +                             const std::vector<unsigned> &renamedInOtherBB,
>> +                             MachineRegisterInfo &MRI,
>> +                             const TargetRegisterClass *RC) {
>> +
>> +  unsigned lastRenameReg = MRI.createVirtualRegister(RC);
>> +
>> +  bool firstCandidate = true;
>> +  for (auto vreg : VRegs) {
>> +
>> +    auto regType = std::get<0>(vreg);
>> +    auto reg = std::get<1>(vreg);
>> +
>> +    if (regType == RSE_FrameIndex) {
>> +      lastRenameReg = MRI.createVirtualRegister(RC);
>> +      DEBUG(dbgs() << "Skipping rename for FI " << lastRenameReg << "\n";);
>> +      continue;
>> +    } else if (regType == RSE_NewCandidate) {
>> +      if (!firstCandidate) {
>> +        while (lastRenameReg % 10) {
>> +          lastRenameReg = MRI.createVirtualRegister(RC);
>> +
>> +          DEBUG(dbgs() << "Skipping rename for new candidate " << lastRenameReg
>> +                       << "\n";);
>> +        }
>> +      }
>> +      firstCandidate = false;
>> +      continue;
>> +    } else if (!TargetRegisterInfo::isVirtualRegister(reg)) {
>> +      lastRenameReg = MRI.createVirtualRegister(RC);
>> +      DEBUG(dbgs() << "Skipping rename for Phys Reg " << lastRenameReg << "\n";);
>> +      continue;
>> +    }
>> +
>> +    if (std::find(renamedInOtherBB.begin(), renamedInOtherBB.end(), reg) !=
>> +        renamedInOtherBB.end()) {
>> +      DEBUG(dbgs() << "Vreg " << reg << " already renamed in other BB.\n";);
>> +      continue;
>> +    }
>> +
>> +    auto rename = MRI.createVirtualRegister(MRI.getRegClass(reg));
>> +    lastRenameReg = rename;
>> +
>> +    if (vregRenameMap.find(reg) == vregRenameMap.end()) {
>> +      DEBUG(dbgs() << "Mapping vreg ";);
>> +      if (MRI.reg_begin(reg) != MRI.reg_end()) {
>> +        DEBUG(auto foo = &*MRI.reg_begin(reg); foo->dump(););
>> +      } else {
>> +        DEBUG(dbgs() << reg;);
>> +      }
>> +      DEBUG(dbgs() << " to ";);
>> +      if (MRI.reg_begin(rename) != MRI.reg_end()) {
>> +        DEBUG(auto foo = &*MRI.reg_begin(rename); foo->dump(););
>> +      } else {
>> +        DEBUG(dbgs() << rename;);
>> +      }
>> +      DEBUG(dbgs() << "\n";);
>> +
>> +      vregRenameMap.insert(std::pair<unsigned, unsigned>(reg, rename));
>> +    }
>> +  }
>> +}
>> +
>> +static bool doVRegRenaming(std::vector<unsigned> &renamedInOtherBB,
>> +                           const std::map<unsigned, unsigned> &vregRenameMap,
>> +                           MachineRegisterInfo &MRI) {
>> +  bool Changed = false;
>> +  for (auto I = vregRenameMap.begin(), E = vregRenameMap.end(); I != E; ++I) {
>> +
>> +    auto vreg = I->first;
>> +    auto rename = I->second;
>> +
>> +    renamedInOtherBB.push_back(rename);
>> +
>> +    std::vector<MachineOperand *> renameMOs;
>> +    for (MachineOperand &MO : MRI.reg_operands(vreg)) {
>> +      renameMOs.push_back(&MO);
>> +    }
>> +
>> +    for (auto MO : renameMOs) {
>> +      Changed = true;
>> +      MO->setReg(rename);
>> +
>> +      if (!MO->isDef())
>> +        MO->setIsKill(false);
>> +    }
>> +  }
>> +
>> +  return Changed;
>> +}
>> +
>> +static bool doDefKillClear(MachineBasicBlock *MBB) {
>> +  bool Changed = false;
>> +
>> +  for (auto II = MBB->begin(), IE = MBB->end(); II != IE; ++II) {
>> +    MachineInstr *MI = &*II;
>> +
>> +    for (unsigned i = 0; i < MI->getNumOperands(); i++) {
>> +      MachineOperand &MO = MI->getOperand(i);
>> +      if (!MO.isReg())
>> +        continue;
>> +      if (!MO.isDef() && MO.isKill()) {
>> +        Changed = true;
>> +        MO.setIsKill(false);
>> +      }
>> +
>> +      if (MO.isDef() && MO.isDead()) {
>> +        Changed = true;
>> +        MO.setIsDead(false);
>> +      }
>> +    }
>> +  }
>> +
>> +  return Changed;
>> +}
>> +
>> +static bool runOnBasicBlock(MachineBasicBlock *MBB,
>> +                            std::vector<StringRef> &bbNames,
>> +                            std::vector<unsigned> &renamedInOtherBB,
>> +                            unsigned &basicBlockNum, unsigned &VRegGapIndex) {
>> +
>> +  if (CanonicalizeBasicBlockNumber != ~0U) {
>> +    if (CanonicalizeBasicBlockNumber != basicBlockNum++)
>> +      return false;
>> +    DEBUG(dbgs() << "\n Canonicalizing BasicBlock " << MBB->getName() << "\n";);
>> +  }
>> +
>> +  if (std::find(bbNames.begin(), bbNames.end(), MBB->getName()) !=
>> +      bbNames.end()) {
>> +    DEBUG(dbgs() << "Found potentially duplicate BasicBlocks: "
>> +                 << MBB->getName() << "\n";);
>> +    return false;
>> +  }
>> +
>> +  DEBUG(
>> +    dbgs() << "\n\n  NEW BASIC BLOCK: " << MBB->getName() << "  \n\n";
>> +    dbgs() << "\n\n================================================\n\n";
>> +  );
>> +
>> +  bool Changed = false;
>> +  MachineFunction &MF = *MBB->getParent();
>> +  MachineRegisterInfo &MRI = MF.getRegInfo();
>> +
>> +  const TargetRegisterClass *dummyRC = [] (const MachineFunction &MF) {
>> +    const unsigned dummyVReg = GetDummyVReg(MF);
>> +    CANON_CHECK(dummyVReg != ~0U, "Could not find a dummy VReg.");
>> +    const MachineRegisterInfo &MRI = MF.getRegInfo();
>> +    return MRI.getRegClass(dummyVReg);
>> +  } (MF);
>> +
>> +  bbNames.push_back(MBB->getName());
>> +  DEBUG(dbgs() << "\n\n NEW BASIC BLOCK: " << MBB->getName() << "\n\n";);
>> +
>> +  DEBUG(dbgs() << "MBB Before Scheduling:\n"; MBB->dump(););
>> +  Changed |= rescheduleCanoncally(MBB);
>> +  DEBUG(dbgs() << "MBB After Scheduling:\n"; MBB->dump(););
>> +
>> +  std::vector<MachineInstr *> candidates = populateCandidates(MBB);
>> +  std::vector<MachineInstr *> visitedMIs;
>> +  std::copy(candidates.begin(), candidates.end(),
>> +            std::back_inserter(visitedMIs));
>> +
>> +  std::vector<typedRegStackEntry> VRegs;
>> +  for (auto candidate : candidates) {
>> +    VRegs.push_back(std::make_tuple(RSE_NewCandidate, ~0U));
>> +
>> +    std::queue<typedRegStackEntry> reg_queue;
>> +
>> +    // Here we walk the vreg operands of a non-root node along our walk.
>> +    // The root nodes are the original candidates (stores normally).
>> +    // These are normally not the root nodes (except for the case of copies to
>> +    // physical registers).
>> +    for (unsigned i = 1; i < candidate->getNumOperands(); i++) {
>> +      if (candidate->mayStore() || candidate->isBranch())
>> +        break;
>> +
>> +      MachineOperand &MO = candidate->getOperand(i);
>> +      if (!(MO.isReg() && TargetRegisterInfo::isVirtualRegister(MO.getReg())))
>> +        continue;
>> +
>> +      DEBUG(dbgs() << "Enqueue register"; MO.dump(); dbgs() << "\n";);
>> +      reg_queue.push(std::make_tuple(RSE_Reg, MO.getReg()));
>> +    }
>> +
>> +    // Here we walk the root candidates. We start from the 0th operand because
>> +    // the root is normally a store to a vreg.
>> +    for (unsigned i = 0; i < candidate->getNumOperands(); i++) {
>> +
>> +      if (!candidate->mayStore() && !candidate->isBranch())
>> +        break;
>> +
>> +      MachineOperand &MO = candidate->getOperand(i);
>> +
>> +      // TODO: Do we want to only add vregs here?
>> +      if (!MO.isReg() && !MO.isFI())
>> +        continue;
>> +
>> +      DEBUG(dbgs() << "Enqueue Reg/FI"; MO.dump(); dbgs() << "\n";);
>> +
>> +      reg_queue.push(MO.isReg() ? std::make_tuple(RSE_Reg, MO.getReg()) :
>> +                                  std::make_tuple(RSE_FrameIndex,  ~0U));
>> +    }
>> +
>> +    // Now that we have pushed the operands of the candidate here, we do the
>> +    // full breadth first walk.
>> +    doCandidateWalk(VRegs, reg_queue, visitedMIs, MBB);
>> +  }
>> +
>> +  // If we have populated no vregs to rename then bail.
>> +  // The rest of this function does the vreg remaping.
>> +  if (VRegs.size() == 0)
>> +    return Changed;
>> +
>> +  // Skip some vregs, so we can recon where we'll land next.
>> +  SkipVRegs(VRegGapIndex, MRI, dummyRC);
>> +
>> +  std::map<unsigned, unsigned> vregRenameMap;
>> +  GetVRegRenameMap(vregRenameMap, VRegs, renamedInOtherBB, MRI, dummyRC);
>> +
>> +  Changed |= doVRegRenaming(renamedInOtherBB, vregRenameMap, MRI);
>> +  Changed |= doDefKillClear(MBB);
>> +
>> +  DEBUG(dbgs() << "Updated MachineBasicBlock:\n"; MBB->dump(); dbgs() << "\n";);
>> +  DEBUG(dbgs() << "\n\n================================================\n\n");
>> +  return Changed;
>> +}
>> +
>> +bool MIRCanonicalizer::runOnMachineFunction(MachineFunction &MF) {
>> +
>> +  static unsigned functionNum = 0;
>> +  if (CanonicalizeFunctionNumber != ~0U) {
>> +    if (CanonicalizeFunctionNumber != functionNum++)
>> +      return false;
>> +    DEBUG(dbgs() << "\n Canonicalizing Function " << MF.getName() << "\n";);
>> +  }
>> +
>> +  // we need a valid vreg to create a vreg type for skipping all those
>> +  // stray vreg numbers so reach alignment/canonical vreg values.
>> +  std::vector<MachineBasicBlock*> RPOList = GetRPOList(MF);
>> +
>> +  DEBUG(
>> +    dbgs() << "\n\n  NEW MACHINE FUNCTION: " << MF.getName() << "  \n\n";
>> +    dbgs() << "\n\n================================================\n\n";
>> +    dbgs() << "Total Basic Blocks: " << RPOList.size() << "\n";
>> +    for (auto MBB : RPOList) {
>> +      dbgs() << MBB->getName() << "\n";
>> +    }
>> +    dbgs() << "\n\n================================================\n\n";
>> +  );
>> +
>> +  std::vector<StringRef> bbNames;
>> +  std::vector<unsigned> renamedInOtherBB;
>> +
>> +  unsigned gapIdx = 0;
>> +  unsigned bbNum = 0;
>> +
>> +  bool Changed = false;
>> +
>> +  for (auto MBB : RPOList)
>> +    Changed |= runOnBasicBlock(MBB, bbNames, renamedInOtherBB, bbNum, gapIdx);
>> +
>> +  return Changed;
>> +}
>> +
>> diff --git a/tools/mir-canon/mir-canon.cpp b/tools/mir-canon/mir-canon.cpp
>> new file mode 100644
>> index 00000000000..bfe53c01906
>> --- /dev/null
>> +++ b/tools/mir-canon/mir-canon.cpp
>> @@ -0,0 +1,175 @@
>> +//===------ mir-canon.cpp - Implement the LLVM Native Code Generator ------===//
>> +//
>> +//                     The LLVM Compiler Infrastructure
>> +//
>> +// This file is distributed under the University of Illinois Open Source
>> +// License. See LICENSE.TXT for details.
>> +//
>> +//===----------------------------------------------------------------------===//
>> +//
>> +// This is the mir-canon driver. It invokes the MIR canonicalization pass.
>> +//
>> +//===----------------------------------------------------------------------===//
>> +
>> +#include "llvm/CodeGen/CommandFlags.h"
>> +#include "llvm/CodeGen/TargetPassConfig.h"
>> +#include "llvm/CodeGen/MachineModuleInfo.h"
>> +#include "llvm/CodeGen/MachineFunctionPass.h"
>> +#include "llvm/CodeGen/MIRParser/MIRParser.h"
>> +
>> +#include "llvm/IR/LLVMContext.h"
>> +#include "llvm/IR/DiagnosticInfo.h"
>> +#include "llvm/IR/IRPrintingPasses.h"
>> +#include "llvm/IR/DiagnosticPrinter.h"
>> +#include "llvm/IR/LegacyPassManager.h"
>> +
>> +#include "llvm/Support/Signals.h"
>> +#include "llvm/Support/FileSystem.h"
>> +#include "llvm/Support/TargetSelect.h"
>> +#include "llvm/Support/TargetRegistry.h"
>> +#include "llvm/Support/ToolOutputFile.h"
>> +#include "llvm/Support/FormattedStream.h"
>> +#include "llvm/Support/PrettyStackTrace.h"
>> +
>> +#include "llvm/Target/TargetMachine.h"
>> +#include "llvm/Target/TargetSubtargetInfo.h"
>> +
>> +#include "Common.h"
>> +
>> +using namespace llvm;
>> +
>> +static cl::opt<std::string>
>> +    InputFilename(cl::Positional, cl::desc("<input.mir>"), cl::init("-"));
>> +
>> +static cl::opt<std::string> OutputFilename("o", cl::desc("Output filename"),
>> +                                           cl::value_desc("canon.mir"));
>> +
>> +static cl::opt<std::string> TargetTriple("mtriple",
>> +                                         cl::desc("Override target triple."));
>> +
>> +int main(int argc, char **argv) {
>> +  sys::PrintStackTraceOnErrorSignal(argv[0]);
>> +  PrettyStackTraceProgram X(argc, argv);
>> +
>> +  EnableDebugBuffering = true;
>> +
>> +  LLVMContext Context;
>> +  llvm_shutdown_obj Y;
>> +
>> +  // Initialize targets first, so that --version shows registered targets.
>> +  InitializeAllTargets();
>> +  InitializeAllTargetMCs();
>> +  InitializeAllAsmPrinters();
>> +  InitializeAllAsmParsers();
>> +
>> +  PassRegistry *Registry = PassRegistry::getPassRegistry();
>> +  initializeCore(*Registry);
>> +  initializeCodeGen(*Registry);
>> +
>> +  initializeMIRCanonicalizerPass(*Registry);
>> +
>> +  // Register the target printer for --version.
>> +  cl::AddExtraVersionPrinter(TargetRegistry::printRegisteredTargetsForVersion);
>> +
>> +  cl::ParseCommandLineOptions(argc, argv, "llvm system compiler\n");
>> +
>> +  CANON_CHECK(StringRef(InputFilename).endswith_lower(".mir"),
>> +              "Invalid Filename: " + InputFilename + ". " +
>> +              "Expected a .mir file extension.");
>> +
>> +  // Set a diagnostic handler that doesn't exit on the first error
>> +  bool HasError = false;
>> +  Context.setDiagnosticHandler(
>> +    [](const DiagnosticInfo &DI, void *Context) {
>> +      if (DI.getSeverity() == DS_Error) {
>> +        bool *HasError = static_cast<bool *>(Context);
>> +        *HasError = true;
>> +      }
>> +
>> +      auto *Remark = dyn_cast<DiagnosticInfoOptimizationBase>(&DI);
>> +      if (Remark && !Remark->isEnabled())
>> +        return;
>> +
>> +      DiagnosticPrinterRawOStream DP(errs());
>> +      errs() << LLVMContext::getDiagnosticMessagePrefix(DI.getSeverity());
>> +      errs() << ": ";
>> +      DI.print(DP);
>> +      errs() << "\n";
>> +    },
>> +    &HasError);
>> +
>> +  SMDiagnostic Err;
>> +  auto MIR = createMIRParserFromFile(InputFilename, Err, Context);
>> +  auto M = MIR ? MIR->parseIRModule() : nullptr;
>> +
>> +  CANON_CHECK(M, argv[0] + "Invalid Module or MIR.");
>> +
>> +  if (!TargetTriple.empty())
>> +    M->setTargetTriple(Triple::normalize(TargetTriple));
>> +
>> +  Triple TheTriple = Triple(M->getTargetTriple());
>> +  if (TheTriple.getTriple().empty())
>> +    TheTriple.setTriple(sys::getDefaultTargetTriple());
>> +
>> +  // Get the target specific parser.
>> +  std::string Error;
>> +  const Target *TheTarget =
>> +      TargetRegistry::lookupTarget(MArch, TheTriple, Error);
>> +
>> +  CANON_CHECK(TheTarget, argv[0] + ": " + Error);
>> +
>> +  std::string CPUStr = getCPUStr();
>> +  std::string FeaturesStr = getFeaturesStr();
>> +
>> +  std::unique_ptr<TargetMachine> Target(TheTarget->createTargetMachine(
>> +      TheTriple.getTriple(), CPUStr, FeaturesStr,
>> +      InitTargetOptionsFromCodeGenFlags(), getRelocModel(), getCodeModel(),
>> +      CodeGenOpt::Default));
>> +
>> +  CANON_CHECK(Target, "Could not allocate target machine!");
>> +
>> +  std::error_code EC;
>> +  auto Out = llvm::make_unique<tool_output_file>(OutputFilename, EC,
>> +                                                 sys::fs::F_Text);
>> +  CANON_CHECK(!EC, EC.message());
>> +  raw_pwrite_stream *OS = &Out->os();
>> +
>> +  legacy::PassManager PM;
>> +  TargetLibraryInfoImpl TLII(Triple(M->getTargetTriple()));
>> +  PM.add(new TargetLibraryInfoWrapperPass(TLII));
>> +  M->setDataLayout(Target->createDataLayout());
>> +  setFunctionAttributes(CPUStr, FeaturesStr, *M);
>> +
>> +  LLVMTargetMachine &TM = static_cast<LLVMTargetMachine &>(*Target);
>> +  TargetPassConfig &TPC = *TM.createPassConfig(PM);
>> +  PM.add(&TPC);
>> +  MachineModuleInfo *MMI = new MachineModuleInfo(&TM);
>> +  PM.add(MMI);
>> +  TPC.printAndVerify("");
>> +
>> +  StringRef PassName = "mir-canonicalizer";
>> +  const PassRegistry *PR = PassRegistry::getPassRegistry();
>> +  const PassInfo *PI = PR->getPassInfo(PassName);
>> +
>> +  CANON_CHECK(PI, argv[0] + ": " + PI->getPassName() + " not registered.");
>> +  CANON_CHECK(PI->getNormalCtor(),
>> +              argv[0] + ": can't create pass: " + PI->getPassName());
>> +
>> +  Pass *P = PI->getNormalCtor()();
>> +  PM.add(P);
>> +  TPC.printAndVerify(std::string("After ") + std::string(P->getPassName()));
>> +
>> +  PM.add(createPrintMIRPass(*OS));
>> +
>> +  // Before executing passes, print the final values of the LLVM options.
>> +  cl::PrintOptionValues();
>> +
>> +  PM.run(*M);
>> +
>> +  CANON_CHECK(!(*static_cast<bool *>(Context.getDiagnosticContext())),
>> +              argv[0] + " Error: exiting...");
>> +
>> +  Out->keep();
>> +  return 0;
>> +}
>> +
>> 
>> _______________________________________________
>> LLVM Developers mailing list
>> llvm-dev at lists.llvm.org
>> http://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev
> 
> _______________________________________________
> LLVM Developers mailing list
> llvm-dev at lists.llvm.org
> http://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev