[llvm-dev] [RFC] mir-canon: A new tool for canonicalizing MIR for cleaner diffing.

Tue Aug 22 14:09:05 PDT 2017

Patch for review.

On Mon, Aug 21, 2017 at 11:45 PM Puyan Lotfi <puyan.lotfi.llvm at gmail.com>
wrote:

> Ping.
>
> Still working on preparing code for review. Will have a patch for review
> ready in the coming days.
>
> PL
>
> On Tue, Aug 15, 2017 at 12:06 PM Puyan Lotfi <puyan.lotfi.llvm at gmail.com>
> wrote:
>
>> Hi,
>>
>>
>> My name is Puyan and I've been exploring ways to improve the state of
>> instruction level diffing using llvm and MIR. Below is a proposal for a new
>> llvm tool meant to address issues encountered when diffing at the machine
>> level. I'm eager to hear the community's feedback.
>>
>>
>> Thanks
>>
>>
>> PL
>>
>>
>> mir-canon: A new tool for canonicalizing MIR for cleaner diffing.
>>
>>
>> Problem Statement and Context:
>>
>> Diff-tools are regularly used for comparing IR and assembly files. This
>> often involves reasoning through differences that are semantically
>> equivalent and it can be very time consuming for a person to do said
>> reasoning.
>>
>> Specifically in the context of GlobalISel development there are problems
>> of correctness verification. There is a need to compare two programs,
>> compiled from identical IR by two different instruction selectors in a way
>> where the true differences stand out.
>>
>>
>> Proposal:
>>
>> We propose a new tool that we have tentatively named 'mir-canon' that
>> performs canonical transformations on MIR. The goal is for MIR
>> pre-processed with mir-canon to show fewer differences than if it were not
>> pre-processed.
>>
>> At the time of this writing we have a prototype canonicalization tool.
>> We’ve come up with some techniques that show promise and would like to open
>> discussion with the community to get feedback and suggestions on refining
>> said techniques. Currently we think of this as an open ended project.
>>
>>
>> Techniques:
>>
>> Our prototype does the following for each basic block in a Reverse Post
>> Ordering:
>>
>>
>>    -
>>
>>    Canonical instruction ordering is done by moving a given instruction
>>    as close to the nearest use of its def as possible.
>>
>>
>>
>>    -
>>
>>    Next, canonical VReg renaming is done by building a collection of
>>    candidate instructions that can be thought of as sinks in the def-use
>>    graph: they are typically instructions that write to physical registers or
>>    store to memory. These candidates are used as the root of a breadth first
>>    walk over the vreg operand def-use graph that determines a canonical vreg
>>    ordering.
>>
>>
>>
>>    -
>>
>>    Using said canonical vreg ordering we rename monotonically, but
>>    before we do this we skip several vreg values in order to increase the
>>    chance that we land on the same vreg number for two different input MIR
>>    files. We also do this to reduce the chances that a difference in
>>    previously def-use walks will affect the vreg renaming for subsequent
>>    walks. This skipping step could be thought of as a kind of vreg number
>>    reckoning: we skip modulo n vregs so that we are likely to land on the same
>>    vreg for two different files.
>>
>>
>>
>> This approach is completely agnostic of ISA specific semantics so it
>> should work for any target.
>>
>>
>> Current status:
>>
>> At the moment we have a standalone llvm tool that uses a single pass to
>> do the above described transformations. We have test inputs that show
>> promise but we still need a wider set of tests as well as hard metrics.
>>
>> Our approach processes a single file at a time. The primary benefit to
>> this approach is lower complexity in initial implementation and exploration
>> of building such a tool. We are open to other approaches such as an
>> llvm-diff like (two file at a time) approach, but we have not explored that
>> avenue fully yet.
>>
>> We’re eager to hear community feedback and will be ready to share patches
>> for review soon.
>>
>>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.llvm.org/pipermail/llvm-dev/attachments/20170822/680e16a1/attachment.html>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: mir-canon.patch
Type: application/octet-stream
Size: 27540 bytes
Desc: not available
URL: <http://lists.llvm.org/pipermail/llvm-dev/attachments/20170822/680e16a1/attachment.obj>