[llvm-dev] [RFC] mir-canon: A new tool for canonicalizing MIR for cleaner diffing.

Thu Sep 7 11:21:54 PDT 2017

On Wed, Sep 6, 2017 at 3:36 PM, Matthias Braun <mbraun at apple.com> wrote:

> This would be easier to discuss on llvm-commits and phabricator.
>
> On Aug 22, 2017, at 2:09 PM, Puyan Lotfi via llvm-dev <
> llvm-dev at lists.llvm.org> wrote:
>
>
> Patch for review.
>
>
>
>
> On Mon, Aug 21, 2017 at 11:45 PM Puyan Lotfi <puyan.lotfi.llvm at gmail.com>
> wrote:
>
>> Ping.
>>
>> Still working on preparing code for review. Will have a patch for review
>> ready in the coming days.
>>
>> PL
>>
>> On Tue, Aug 15, 2017 at 12:06 PM Puyan Lotfi <puyan.lotfi.llvm at gmail.com>
>> wrote:
>>
>>> Hi,
>>>
>>> My name is Puyan and I've been exploring ways to improve the state of
>>> instruction level diffing using llvm and MIR. Below is a proposal for a new
>>> llvm tool meant to address issues encountered when diffing at the machine
>>> level. I'm eager to hear the community's feedback.
>>>
>>> Thanks
>>>
>>> PL
>>>
>>> mir-canon: A new tool for canonicalizing MIR for cleaner diffing.
>>>
>>>
>>> Problem Statement and Context:
>>>
>>> Diff-tools are regularly used for comparing IR and assembly files. This
>>> often involves reasoning through differences that are semantically
>>> equivalent and it can be very time consuming for a person to do said
>>> reasoning.
>>>
>>> Specifically in the context of GlobalISel development there are problems
>>> of correctness verification. There is a need to compare two programs,
>>> compiled from identical IR by two different instruction selectors in a way
>>> where the true differences stand out.
>>>
>> Sounds interesting!
> I hope some people involved in GlobalISel will chime in the discussion
> about the patch.
>

I'm hoping to get them using it once things get checked in and are
available in all our branches.

>
>>>
>>> Proposal:
>>>
>>> We propose a new tool that we have tentatively named 'mir-canon' that
>>> performs canonical transformations on MIR. The goal is for MIR
>>> pre-processed with mir-canon to show fewer differences than if it were not
>>> pre-processed.
>>>
>>> At the time of this writing we have a prototype canonicalization tool.
>>> We’ve come up with some techniques that show promise and would like to open
>>> discussion with the community to get feedback and suggestions on refining
>>> said techniques. Currently we think of this as an open ended project.
>>>
>>>
>>> Techniques:
>>>
>>> Our prototype does the following for each basic block in a Reverse Post
>>> Ordering:
>>>
>>>
>>>    - Canonical instruction ordering is done by moving a given
>>>    instruction as close to the nearest use of its def as possible.
>>>
>>> From glancing at the patch this only seems to respect vreg dependencies
> but I couldn't see memory / side effect tracking (ScheduleDAGInstrs may be
> the right tool for the job).
>
>
I've currently not gotten to that. At the moment, I treat memory and
physical register side effects as canonicalization barriers. Doing a side
effect tracker sounds like a good next step.

>
>>>
>>>    - Next, canonical VReg renaming is done by building a collection of
>>>    candidate instructions that can be thought of as sinks in the def-use
>>>    graph: they are typically instructions that write to physical registers or
>>>    store to memory. These candidates are used as the root of a breadth first
>>>    walk over the vreg operand def-use graph that determines a canonical vreg
>>>    ordering.
>>>
>>>
>>>
>>>    - Using said canonical vreg ordering we rename monotonically, but
>>>    before we do this we skip several vreg values in order to increase the
>>>    chance that we land on the same vreg number for two different input MIR
>>>    files. We also do this to reduce the chances that a difference in
>>>    previously def-use walks will affect the vreg renaming for subsequent
>>>    walks. This skipping step could be thought of as a kind of vreg number
>>>    reckoning: we skip modulo n vregs so that we are likely to land on the same
>>>    vreg for two different files.
>>>
>>> It may be interesting to extend the .mir file format here to allow
> arbitrary names for vregs.
>
> So instead of
>
> block1:
> %0 = foo
> %1 = bar
>    use %1
>
> block2:
> %100 = baz
>
> you could do something along the lines of:
>
> block1:
>   %1_foo0 = foo
>   %1_bar1 = bar
>          use %1_bar1
>
> block2:
>   %2_baz = baz
>
> That way you could avoid the vreg numbering games and by encoding the
> instruction type you are even robust against insertion/removal of
> instructions of different types.
>
> I'd certainly be open to some mir extension patches.
>

That would actually be pretty great because currently the compile time
suffers a lot from generating all those skipped vregs.
I will certainly follow up with you as a second phase of this in order to
speed things up.

I will send a revised patch to the commits list as soon as I can.

PL

>
> - Matthias
>
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.llvm.org/pipermail/llvm-dev/attachments/20170907/3e0292d5/attachment-0001.html>