[llvm-dev] [RFC] mir-canon: A new tool for canonicalizing MIR for cleaner diffing.
Puyan Lotfi via llvm-dev
llvm-dev at lists.llvm.org
Tue Aug 15 12:06:02 PDT 2017
My name is Puyan and I've been exploring ways to improve the state of
instruction level diffing using llvm and MIR. Below is a proposal for a new
llvm tool meant to address issues encountered when diffing at the machine
level. I'm eager to hear the community's feedback.
mir-canon: A new tool for canonicalizing MIR for cleaner diffing.
Problem Statement and Context:
Diff-tools are regularly used for comparing IR and assembly files. This
often involves reasoning through differences that are semantically
equivalent and it can be very time consuming for a person to do said
Specifically in the context of GlobalISel development there are problems of
correctness verification. There is a need to compare two programs, compiled
from identical IR by two different instruction selectors in a way where the
true differences stand out.
We propose a new tool that we have tentatively named 'mir-canon' that
performs canonical transformations on MIR. The goal is for MIR
pre-processed with mir-canon to show fewer differences than if it were not
At the time of this writing we have a prototype canonicalization tool.
We’ve come up with some techniques that show promise and would like to open
discussion with the community to get feedback and suggestions on refining
said techniques. Currently we think of this as an open ended project.
Our prototype does the following for each basic block in a Reverse Post
Canonical instruction ordering is done by moving a given instruction as
close to the nearest use of its def as possible.
Next, canonical VReg renaming is done by building a collection of
candidate instructions that can be thought of as sinks in the def-use
graph: they are typically instructions that write to physical registers or
store to memory. These candidates are used as the root of a breadth first
walk over the vreg operand def-use graph that determines a canonical vreg
Using said canonical vreg ordering we rename monotonically, but before
we do this we skip several vreg values in order to increase the chance that
we land on the same vreg number for two different input MIR files. We also
do this to reduce the chances that a difference in previously def-use walks
will affect the vreg renaming for subsequent walks. This skipping step
could be thought of as a kind of vreg number reckoning: we skip modulo n
vregs so that we are likely to land on the same vreg for two different
This approach is completely agnostic of ISA specific semantics so it should
work for any target.
At the moment we have a standalone llvm tool that uses a single pass to do
the above described transformations. We have test inputs that show promise
but we still need a wider set of tests as well as hard metrics.
Our approach processes a single file at a time. The primary benefit to this
approach is lower complexity in initial implementation and exploration of
building such a tool. We are open to other approaches such as an llvm-diff
like (two file at a time) approach, but we have not explored that avenue
We’re eager to hear community feedback and will be ready to share patches
for review soon.
-------------- next part --------------
An HTML attachment was scrubbed...
More information about the llvm-dev