<div dir="ltr"><br><div class="gmail_extra"><br><div class="gmail_quote">On Wed, Sep 6, 2017 at 3:36 PM, Matthias Braun <span dir="ltr"><<a href="mailto:mbraun@apple.com" target="_blank">mbraun@apple.com</a>></span> wrote:<br><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"><div style="word-wrap:break-word"><div>This would be easier to discuss on llvm-commits and phabricator.</div><div><br></div><div><span class=""><blockquote type="cite"><div>On Aug 22, 2017, at 2:09 PM, Puyan Lotfi via llvm-dev <<a href="mailto:llvm-dev@lists.llvm.org" target="_blank">llvm-dev@lists.llvm.org</a>> wrote:</div><br class="m_6926327527829593398Apple-interchange-newline"><div><div dir="ltr"><div><br></div>Patch for review.<div><br></div><div><br><div><br></div></div><br><div class="gmail_quote"><div dir="ltr">On Mon, Aug 21, 2017 at 11:45 PM Puyan Lotfi <<a href="mailto:puyan.lotfi.llvm@gmail.com" target="_blank">puyan.lotfi.llvm@gmail.com</a>> wrote:<br></div><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"><div dir="ltr">Ping.<div><br></div><div>Still working on preparing code for review. Will have a patch for review ready in the coming days. </div></div><div dir="ltr"><div><br></div><div>PL</div></div><br><div class="gmail_quote"><div dir="ltr">On Tue, Aug 15, 2017 at 12:06 PM Puyan Lotfi <<a href="mailto:puyan.lotfi.llvm@gmail.com" target="_blank">puyan.lotfi.llvm@gmail.com</a>> wrote:<br></div><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"><div dir="ltr"><span id="m_6926327527829593398m_2148333917850378115m_2452468905233419516gmail-docs-internal-guid-44256f1b-e740-950e-75a8-4282e27328f5"><div style="line-height:1.38;margin-top:0pt;margin-bottom:0pt"><span style="font-size:14pt;font-family:Arial;background-color:transparent;font-weight:700;font-variant-ligatures:normal;font-variant-east-asian:normal;vertical-align:baseline;white-space:pre-wrap"><span style="font-size:14.666666984558105px;font-weight:normal">Hi,</span></span></div><div style="line-height:1.38;margin-top:0pt;margin-bottom:0pt"><span style="font-size:14pt;font-family:Arial;background-color:transparent;font-weight:700;font-variant-ligatures:normal;font-variant-east-asian:normal;vertical-align:baseline;white-space:pre-wrap"><span style="font-size:14.666666984558105px;font-weight:normal"><br></span></span></div><div style="line-height:1.38;margin-top:0pt;margin-bottom:0pt"><span style="font-size:14pt;font-family:Arial;background-color:transparent;font-weight:700;font-variant-ligatures:normal;font-variant-east-asian:normal;vertical-align:baseline;white-space:pre-wrap"><span style="font-size:14.666666984558105px;font-weight:normal">My name is Puyan and </span></span><span style="font-family:Arial;font-size:14.666666984558105px;white-space:pre-wrap;background-color:transparent">I've been exploring ways to improve the state of instruction level diffing using llvm and MIR. Below is a proposal for a new llvm tool meant to address issues encountered when diffing at the machine level. I'm eager to hear the community's feedback.</span></div><div style="line-height:1.38;margin-top:0pt;margin-bottom:0pt"><span style="font-family:Arial;font-size:14.666666984558105px;white-space:pre-wrap;background-color:transparent"><br></span></div><div style="line-height:1.38;margin-top:0pt;margin-bottom:0pt"><span style="font-family:Arial;font-size:14.666666984558105px;white-space:pre-wrap;background-color:transparent">Thanks</span></div><div style="line-height:1.38;margin-top:0pt;margin-bottom:0pt"><span style="font-family:Arial;font-size:14.666666984558105px;white-space:pre-wrap;background-color:transparent"><br></span></div><div style="line-height:1.38;margin-top:0pt;margin-bottom:0pt"><span style="font-family:Arial;font-size:14.666666984558105px;white-space:pre-wrap;background-color:transparent">PL</span></div><div style="line-height:1.38;margin-top:0pt;margin-bottom:0pt"><span style="font-size:14pt;font-family:Arial;background-color:transparent;font-weight:700;font-variant-ligatures:normal;font-variant-east-asian:normal;vertical-align:baseline;white-space:pre-wrap"><br></span></div><div style="line-height:1.38;margin-top:0pt;margin-bottom:0pt"><span style="font-size:14pt;font-family:Arial;background-color:transparent;font-weight:700;font-variant-ligatures:normal;font-variant-east-asian:normal;vertical-align:baseline;white-space:pre-wrap">mir-canon: A new tool for canonicalizing MIR for cleaner diffing.</span></div><br><br><div style="line-height:1.38;margin-top:0pt;margin-bottom:0pt"><span style="font-size:11pt;font-family:Arial;background-color:transparent;font-weight:700;font-variant-ligatures:normal;font-variant-east-asian:normal;vertical-align:baseline;white-space:pre-wrap">Problem Statement and Context:</span></div><br><div style="line-height:1.38;margin-top:0pt;margin-bottom:0pt"><span style="font-size:11pt;font-family:Arial;background-color:transparent;font-variant-ligatures:normal;font-variant-east-asian:normal;vertical-align:baseline;white-space:pre-wrap">Diff-tools are regularly used for comparing IR and assembly files. This often involves reasoning through differences that are semantically equivalent and it can be very time consuming for a person to do said reasoning.</span></div><br><div style="line-height:1.38;margin-top:0pt;margin-bottom:0pt"><span style="font-size:11pt;font-family:Arial;background-color:transparent;font-variant-ligatures:normal;font-variant-east-asian:normal;vertical-align:baseline;white-space:pre-wrap">Specifically in the context of GlobalISel development there are problems of correctness verification. There is a need to compare two programs, compiled from identical IR by two different instruction selectors in a way where the true differences stand out.</span></div></span></div></blockquote></div></blockquote></div></div></div></blockquote></span><div>Sounds interesting!</div><div>I hope some people involved in GlobalISel will chime in the discussion about the patch.</div></div></div></blockquote><div><br></div><div>I'm hoping to get them using it once things get checked in and are available in all our branches.  <br></div><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"><div style="word-wrap:break-word"><div><span class=""><blockquote type="cite"><div><div dir="ltr"><div class="gmail_quote"><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"><div class="gmail_quote"><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"><div dir="ltr"><span id="m_6926327527829593398m_2148333917850378115m_2452468905233419516gmail-docs-internal-guid-44256f1b-e740-950e-75a8-4282e27328f5"><br><br><div style="line-height:1.38;margin-top:0pt;margin-bottom:0pt"><span style="font-size:11pt;font-family:Arial;background-color:transparent;font-weight:700;font-variant-ligatures:normal;font-variant-east-asian:normal;vertical-align:baseline;white-space:pre-wrap">Proposal:</span></div><br><div style="line-height:1.38;margin-top:0pt;margin-bottom:0pt"><span style="font-size:11pt;font-family:Arial;background-color:transparent;font-variant-ligatures:normal;font-variant-east-asian:normal;vertical-align:baseline;white-space:pre-wrap">We propose a new tool that we have tentatively named 'mir-canon' that performs canonical transformations on MIR. The goal is for MIR pre-processed with mir-canon to show fewer differences than if it were not pre-processed.</span></div><br><div style="line-height:1.38;margin-top:0pt;margin-bottom:0pt"><span style="font-size:11pt;font-family:Arial;background-color:transparent;font-variant-ligatures:normal;font-variant-east-asian:normal;vertical-align:baseline;white-space:pre-wrap">At the time of this writing we have a prototype canonicalization tool. We’ve come up with some techniques that show promise and would like to open discussion with the community to get feedback and suggestions on refining said techniques. Currently we think of this as an open ended project.</span></div><br><br><div style="line-height:1.38;margin-top:0pt;margin-bottom:0pt"><span style="font-size:11pt;font-family:Arial;background-color:transparent;font-weight:700;font-variant-ligatures:normal;font-variant-east-asian:normal;vertical-align:baseline;white-space:pre-wrap">Techniques:</span></div><br><div style="line-height:1.38;margin-top:0pt;margin-bottom:0pt"><span style="font-size:11pt;font-family:Arial;background-color:transparent;font-variant-ligatures:normal;font-variant-east-asian:normal;vertical-align:baseline;white-space:pre-wrap">Our prototype does the following for each basic block in a Reverse Post Ordering:</span></div><br><ul style="margin-top:0pt;margin-bottom:0pt"><li dir="ltr" style="list-style-type:disc;font-size:11pt;font-family:Arial;background-color:transparent;font-variant-ligatures:normal;font-variant-east-asian:normal;vertical-align:baseline"><div style="line-height:1.38;margin-top:0pt;margin-bottom:0pt"><span style="font-size:11pt;background-color:transparent;font-variant-ligatures:normal;font-variant-east-asian:normal;vertical-align:baseline;white-space:pre-wrap">Canonical instruction ordering is done by moving a given instruction as close to the nearest use of its def as possible.</span></div></li></ul></span></div></blockquote></div></blockquote></div></div></div></blockquote></span><div>From glancing at the patch this only seems to respect vreg dependencies but I couldn't see memory / side effect tracking (ScheduleDAGInstrs may be the right tool for the job).</div><span class=""><br></span></div></div></blockquote><div><br></div><div>I've currently not gotten to that. At the moment, I treat memory and physical register side effects as canonicalization barriers. Doing a side effect tracker sounds like a good next step. </div><div> </div><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"><div style="word-wrap:break-word"><div><span class=""><blockquote type="cite"><div><div dir="ltr"><div class="gmail_quote"><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"><div class="gmail_quote"><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"><div dir="ltr"><span id="m_6926327527829593398m_2148333917850378115m_2452468905233419516gmail-docs-internal-guid-44256f1b-e740-950e-75a8-4282e27328f5"><br><ul style="margin-top:0pt;margin-bottom:0pt"><li dir="ltr" style="list-style-type:disc;font-size:11pt;font-family:Arial;background-color:transparent;font-variant-ligatures:normal;font-variant-east-asian:normal;vertical-align:baseline"><div style="line-height:1.38;margin-top:0pt;margin-bottom:0pt"><span style="font-size:11pt;background-color:transparent;font-variant-ligatures:normal;font-variant-east-asian:normal;vertical-align:baseline;white-space:pre-wrap">Next, canonical VReg renaming is done by building a collection of candidate instructions that can be thought of as sinks in the def-use graph: they are typically instructions that write to physical registers or store to memory. These candidates are used as the root of a breadth first walk over the vreg operand def-use graph that determines a canonical vreg ordering.</span></div></li></ul><br><ul style="margin-top:0pt;margin-bottom:0pt"><li dir="ltr" style="list-style-type:disc;font-size:11pt;font-family:Arial;background-color:transparent;font-variant-ligatures:normal;font-variant-east-asian:normal;vertical-align:baseline"><div style="line-height:1.38;margin-top:0pt;margin-bottom:0pt"><span style="font-size:11pt;background-color:transparent;font-variant-ligatures:normal;font-variant-east-asian:normal;vertical-align:baseline;white-space:pre-wrap">Using said canonical vreg ordering we rename monotonically, but before we do this we skip several vreg values in order to increase the chance that we land on the same vreg number for two different input MIR files. We also do this to reduce the chances that a difference in previously def-use walks will affect the vreg renaming for subsequent walks. This skipping step could be thought of as a kind of vreg number reckoning: we skip modulo n vregs so that we are likely to land on the same vreg for two different files.</span></div></li></ul></span></div></blockquote></div></blockquote></div></div></div></blockquote></span><div>It may be interesting to extend the .mir file format here to allow arbitrary names for vregs.</div><div><br></div><div>So instead of</div><div><br></div><div>block1:</div><div>%0 = foo</div><div>%1 = bar</div><div>   use %1</div><div><br></div><div>block2:</div><div>%100 = baz</div><div><br></div><div>you could do something along the lines of:</div><div><br></div><div>block1:</div><div>  %1_foo0 = foo</div><div>  %1_bar1 = bar</div><div>         use %1_bar1</div><div><br></div><div>block2:</div><div>  %2_baz = baz</div><div><br></div><div>That way you could avoid the vreg numbering games and by encoding the instruction type you are even robust against insertion/removal of instructions of different types.</div><div><br></div><div>I'd certainly be open to some mir extension patches.</div></div></div></blockquote><div><br></div><div><br></div><div>That would actually be pretty great because currently the compile time suffers a lot from generating all those skipped vregs.</div><div>I will certainly follow up with you as a second phase of this in order to speed things up.</div><div><br></div><div><br></div><div>I will send a revised patch to the commits list as soon as I can. </div><div><br></div><div>PL</div><div> </div><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"><div style="word-wrap:break-word"><div><span class="HOEnZb"><font color="#888888"><div><br></div><div>- Matthias</div><div><br></div></font></span></div></div></blockquote></div><br></div></div>