<div dir="ltr">Oops... pressed the wrong button and sent out early...<div class="gmail_extra"><br><div class="gmail_quote">On Tue, Nov 1, 2016 at 2:01 PM, Dehao Chen <span dir="ltr"><<a href="mailto:dehao@google.com" target="_blank">dehao@google.com</a>></span> wrote:<br><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex"><div dir="ltr">If Hal's proposal is for SamplePGO purposes, let me clarify some design principles of SamplePGO.<div><br></div><div>The profile for SamplePGO uses source location as the key to map the execution count back to IR. This design is based on the principle that we do not want the profile to be tightly coupled with the compiler IR. Instead, the profile is simply an attribute of the source code. We have benefited a lot from this design, in that the profile can easily be reused across different source versions and compiler versions, or even compilers.</div><div><br></div><div>That being said, the design to encode more info into the discriminator does not mean that we will change the profile. The encoded info in the discriminator will be handled by the create_llvm_prof tool, which combines counts from different clones of the same source code and generates the combined profile data. The output profile will not have any cloning/duplication bits at all.
So for the initial example profile I provided, the output profile will be:</div></div></blockquote><div><br></div><div style="font-size:12.8px">#1: 10</div><div><span style="font-size:12.8px">#3: 80</span></div><div><br></div><div>Not:</div><div><br></div><div>#1: 10</div><div>#3.0x400: 70<br></div><div>#3.0x10400: 5</div><div>#3.0x20400: 3</div><div>#3.0x30400: 2<br></div><div><br></div><div>The goal of the proposed change is to make the profile more accurately represent the attribute of the source.</div><div>The non-goal of the proposed change is to provide more context in the profile to represent the behavior of the program in different contexts.</div><div><br></div><div>If we are pursuing a more context-sensitive profile, that would be a very different design. In that design, we would need to read the profile multiple times, and have the profile tightly coupled with the compiler IR/pass manager. That is doable, but I think that probably better suits instrumentation-based PGO's domain.
Comments?</div><div><br></div><div>Thanks,</div><div>Dehao</div><div><br></div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex"><div dir="ltr"><div><br></div></div><div class="gmail-HOEnZb"><div class="gmail-h5"><div class="gmail_extra"><br><div class="gmail_quote">On Tue, Nov 1, 2016 at 1:04 PM, Hal Finkel <span dir="ltr"><<a href="mailto:hfinkel@anl.gov" target="_blank">hfinkel@anl.gov</a>></span> wrote:<br><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex"><div><div style="font-family:arial,helvetica,sans-serif;font-size:10pt;color:rgb(0,0,0)"><br><hr id="gmail-m_22964723159565198m_8666766791862472595zwchr"><blockquote style="border-left:2px solid rgb(16,16,255);margin-left:5px;padding-left:5px;color:rgb(0,0,0);font-weight:normal;font-style:normal;text-decoration:none;font-family:helvetica,arial,sans-serif;font-size:12pt"><b>From: </b>"Paul Robinson" <<a href="mailto:paul.robinson@sony.com" target="_blank">paul.robinson@sony.com</a>><br><b>To: </b>"Dehao Chen" <<a href="mailto:dehao@google.com" target="_blank">dehao@google.com</a>>, "Hal Finkel" <<a href="mailto:hfinkel@anl.gov" target="_blank">hfinkel@anl.gov</a>><br><b>Cc: </b>"Xinliang David Li" <<a href="mailto:davidxl@google.com" target="_blank">davidxl@google.com</a>>, <a href="mailto:llvm-dev@lists.llvm.org" target="_blank">llvm-dev@lists.llvm.org</a><br><b>Sent: </b>Tuesday, November 1, 2016 2:15:38 PM<br><b>Subject: </b>RE: [llvm-dev] (RFC) Encoding code duplication factor in discriminator<span><br><br>
<div class="gmail-m_22964723159565198m_8666766791862472595WordSection1">
<p class="MsoNormal" style="margin-left:1in"><span style="font-family:helvetica,sans-serif;color:black">As illustrated in the above example, it is not like "vectorization has a distinct bit". All different optimizations make clones of code which will
be labeled by UIDs represented by N (e.g. 8) bits. In this way, the space will be capped by the number of clones all optimizations have made, instead of the number of optimizations that have been applied. And it will be capped at 2^N-1. The downside of using a UID is that you
will not know if a clone is coming from vectorization or unroll or loop distribution.</span></p>
<p class="MsoNormal" style="margin-left:0.5in"><span style="font-size:10pt;font-family:arial,sans-serif;color:black">Okay, but that kind of semantic mapping is important. How should we encode/recover that information? To be clear, I'm not saying that we
need to implement that up front, but there needs to be a clear path to an implementation, because I don't want to have two disjoint schemes.</span></p>
<p class="MsoNormal"> </p>
<p class="MsoNormal" style="margin-left:5.35pt">You mean that you want to know which optimization created the clone? How would you use that info? It looks to me like this would expose compiler implementation details in debug info.</p>
<p class="MsoNormal"> </p>
<p class="MsoNormal" style="margin-left:5.35pt">This is still doable: assuming we have 15 interesting optimizations to track, we can use 4 bits to encode the optimization type that created the clone. But this becomes nasty if a clone is created by more than
one optimization. In that case, the discriminator may not be a good fit for this purpose.</p>
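<p class="MsoNormal" style="margin-left:5.35pt">As a rough sketch of the 4-bit idea, assuming a purely hypothetical layout in which the optimization type sits above an 8-bit clone UID (the field widths, type codes, and the names encode/decode are illustrative assumptions, not the encoding proposed in the RFC):</p>

```python
# Hypothetical layout, not the RFC's actual encoding:
# a 4-bit optimization type stored above an 8-bit clone UID.
OPT_BITS = 4
UID_BITS = 8

# Illustrative type codes; 0 is reserved for "not a clone",
# which leaves 15 usable codes in 4 bits.
OPT_NONE, OPT_UNROLL, OPT_VECTORIZE, OPT_DISTRIBUTE = range(4)


def encode(opt_type, uid):
    """Pack one optimization type and one clone UID into an integer."""
    if opt_type not in range(2 ** OPT_BITS):
        raise ValueError("optimization type does not fit in 4 bits")
    if uid not in range(2 ** UID_BITS):
        raise ValueError("clone UID does not fit in 8 bits")
    return opt_type * 2 ** UID_BITS + uid


def decode(value):
    """Return (opt_type, uid) from a packed value."""
    return divmod(value, 2 ** UID_BITS)


packed = encode(OPT_VECTORIZE, 5)
print(decode(packed))  # prints (2, 5)
```

<p class="MsoNormal" style="margin-left:5.35pt">The single type field is where it gets nasty: a clone that was vectorized and then unrolled has only one slot, so one of the two optimizations would have to be dropped or overwritten.</p>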
<p class="MsoNormal"><span style="font-size:11pt;font-family:calibri,sans-serif;color:rgb(31,73,125)"> </span></p>
<p class="MsoNormal"><a name="m_22964723159565198_m_8666766791862472595__MailEndCompose"><span style="font-size:11pt;font-family:calibri,sans-serif;color:rgb(31,73,125)">My understanding was that the encoding scheme would allow the profiling analysis to correctly map execution data back to the original
source construct, while preserving the property that each distinct basic block would have its own discriminator value. That is, the execution data would be attributed back to the original source construct, not whatever each individual optimization had done
to it, and the data for the original source construct would correctly reflect the execution (e.g. profiling says you got 82 hits on the original loop, rather than reporting 20 hits on the unrolled-by-4 loop plus 1 each on 2 of the trailing copies).</span></a></p>
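<p class="MsoNormal"><span style="font-size:11pt;font-family:calibri,sans-serif;color:rgb(31,73,125)">A minimal sketch of that arithmetic, assuming each clone carries a sampled count and a duplication factor (the function name source_count and the tuple layout are hypothetical, chosen only for illustration):</span></p>

```python
def source_count(clones):
    """Attribute clone counts back to the original source construct.

    clones: iterable of (sampled_count, duplication_factor) pairs,
    one pair per clone of the same source location.
    """
    return sum(count * factor for count, factor in clones)


# The unrolled-by-4 loop body saw 20 hits and stands for 4 original
# iterations per hit; two trailing copies saw 1 hit each.
clones = [(20, 4), (1, 1), (1, 1)]
total = source_count(clones)
print(total)  # prints 82
```

<p class="MsoNormal"><span style="font-size:11pt;font-family:calibri,sans-serif;color:rgb(31,73,125)">Only the combined figure is attributed to the source loop; the per-clone entries do not need to survive into the output profile.</span></p>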
<p class="MsoNormal"><span style="font-size:11pt;font-family:calibri,sans-serif;color:rgb(31,73,125)"> </span></p>
<p class="MsoNormal"><span style="font-size:11pt;font-family:calibri,sans-serif;color:rgb(31,73,125)">It sounds like Hal is thinking that the per-discriminator execution info would be preserved down to the point where an individual optimization could look at
the profile for each piece, and make decisions on that basis.</span></p>
<p class="MsoNormal"><span style="font-size:11pt;font-family:calibri,sans-serif;color:rgb(31,73,125)"> </span></p>
<p class="MsoNormal"><span style="font-size:11pt;font-family:calibri,sans-serif;color:rgb(31,73,125)">It's not clear to me how that would be possible, as the optimization would have to first do the transform (or predict how it would do the transform) in order to see
which individual-discriminator counts mapped to which actual blocks, and then make some kind of decision about whether to do the transform differently based on that information. And if the optimization did choose to do the transform differently, that
leaves the IR in a state where the individual discriminators *cannot* map back to it. (Say you unroll by 2 instead of 4; then you have only 1 trailing copy, not 3, and a discriminator that maps to the second trailing copy now maps to nothing. The individual-discriminator
data becomes useless.)</span></p>
<p class="MsoNormal"><span style="font-size:11pt;font-family:calibri,sans-serif;color:rgb(31,73,125)"> </span></p>
<p class="MsoNormal"><span id="gmail-m_22964723159565198m_8666766791862472595DWT9585" style="font-size:11pt;font-family:calibri,sans-serif;color:rgb(31,73,125)">Am I expressing this well enough to show that what Hal is looking for is not feasible?</span></p></div></span></blockquote>Yes, it will need to predict how the transformation would affect the blocks produced. That does not seem problematic (at least at a coarse level). Yes, if transformations made earlier in the pipeline make different decisions, then that will invalidate later fine-grained data (at least potentially). I don't see how any of this makes this infeasible. We just need a way for the profiling counts, per discriminator, to remain available, and for the transformations themselves to know which discriminators (loop ids, or whatever) to consider.<br><br> -Hal<br><blockquote style="border-left:2px solid rgb(16,16,255);margin-left:5px;padding-left:5px;color:rgb(0,0,0);font-weight:normal;font-style:normal;text-decoration:none;font-family:helvetica,arial,sans-serif;font-size:12pt"><div class="gmail-m_22964723159565198m_8666766791862472595WordSection1"><p class="MsoNormal"><span style="font-size:11pt;font-family:calibri,sans-serif;color:rgb(31,73,125)"></span></p>
<p class="MsoNormal"><span style="font-size:11pt;font-family:calibri,sans-serif;color:rgb(31,73,125)">--paulr</span></p>
</div>
</blockquote><span><br><br><br>-- <br><div><span name="x"></span>Hal Finkel<br>Lead, Compiler Technology and Programming Languages<br>Leadership Computing Facility<br>Argonne National Laboratory<span name="x"></span><br></div></span></div></div></blockquote></div><br></div>
</div></div></blockquote></div><br></div></div>