[PATCH] D12781: PGO IR-level instrumentation infrastructure

Wed Sep 23 17:49:00 PDT 2015

On Wed, Sep 23, 2015 at 4:51 PM, Xinliang David Li <davidxl at google.com>
wrote:

>
>
> On Wed, Sep 23, 2015 at 4:39 PM, Sean Silva <chisophugis at gmail.com> wrote:
>
>>
>>
>> On Wed, Sep 23, 2015 at 3:28 PM, Rong Xu <xur at google.com> wrote:
>>
>>>
>>>
>>> On Tue, Sep 22, 2015 at 6:09 PM, Sean Silva <chisophugis at gmail.com>
>>> wrote:
>>>
>>>> silvas added a reviewer: kcc.
>>>> silvas added a comment.
>>>>
>>>> This should not be called "late instrumentation". That is
>>>> clang-specific terminology. Think from the perspective of a random frontend
>>>> (say, one that doesn't have any other PGO mechanism). There should be no
>>>> mention of "FE instrumentation", "late instrumentation" "IR-level
>>>> instrumentation", etc. This is just "edge profiling" or something like that.
>>>>
>>>
>>> these terms are mainly for the reviewers to distinguish the two
>>> instrumentation. I'll clean up the code and comments to use "edge
>>> profiling".
>>>
>>
>> Thanks.
>>
>>
>>>
>>>> A couple other high-level questions/comments:
>>>>
>>>> - is there a reason that this is breaking critical edges itself instead
>>>> of using break-crit-edges pass? Is break-crit-edges dead? SanitizerCoverage
>>>> doesn't seem to use it either.
>>>>
>>>
>>> As I mentioned in the email to Kostya: this is to avoid unnecessary
>>> split. We only need to split a small set of the critical edges.
>>>
>>>
>>>> - why is this a module pass? It operates entirely on a
>>>> function-by-function basis.
>>>>
>>> Good question. When I started the implementation, I thought I would
>>> create global functions and global vars that needed to be a module pass.
>>> But now it seems function pass should work. I'll change this in a later
>>> patch.
>>>
>>> - can we use GraphTraits for the MST algorithm, so that it is reusable?
>>>> (could probably be a separate patch if so).
>>>>
>>> This can be done in the future in a later patch. Not sure what other
>>> pass would use MST.
>>>
>>
>> At the very least, it may help to be able to unittest the MST algorithm.
>> Your current patch has no testing for this; how are you sure that your MST
>> algorithm is correct? (actually, this patch has no tests at all; please add
>> testing to the patch once we have converged on the final design).
>>
>>
>>>
>>>
>>>> - could we break the PGO data reading into a separate analysis? Also
>>>> could the pass be split into more fine-grained passes? (I think the answer
>>>> is yes to both questions)
>>>>
>>>
>>> I would prefer to put instrumentation and reading profile (read the
>>> count out and populate to each BB) in the same pass. This two passes need
>>> to see the matched IR. Separating them will create more changes of
>>> mismatches.
>>>
>>
>> The problem is that now we have a pass that really is two separate
>> passes. If we look in PGOLateInstrumentationFunc::
>> PGOLateInstrumentationFunc, we can see that there is a shared block of code:
>>
>>     setFuncName(F);
>>     buildEdges();
>>     sortEdges();
>>     computeMinimumSpanningTree();
>>     computeCFGHash();
>>     dumpEdges();
>>
>> Then the pass does two radically different things:
>>
>>     if (!IsUse) {
>>       instrumentCFG();
>>       return;
>>     }
>>
>>     if (readCounters(Filename)) {
>>       populateCounters();
>>       setBranchWeights();
>>       dumpEdges(true);
>>     }
>>
>> The shared block of code seems like it could be pulled out into a helper
>> class that can be shared by the two passes.
>>
>> Precisely because the two different operations are so tightly coupled, we
>> have to clearly separate the single point of truth that they both rely on.
>> Otherwise it is difficult to reason about this close coupling. This piece
>> of functionality should be clearly separated (separate class/function; a
>> method is insufficient isolation because then there can be accidental
>> interdependence due to member variables).
>>
>>
> Moving CfgMst analysis into its own class as utility/help class, and make
> Instrumentation and Profile Annotation two distinct passes sounds
> reasonable.
>
>
>
>> Also, in order to fully benefit from the MST, we will need to be able to
>> read the counters earlier (but if I understand correctly, don't need to set
>> the branch weight metadata). This indicates that, roughly speaking,
>> `populateCounters()` should be clearly separated.
>>
>
> Not sure what you mean here.  populateCounters depend on readCounters ..
>

I'm not sure at what exact boundary things need to be split (hence
"roughly"), but besides the general readability benefits, it's clear that
if we want to eventually support using existing profile data to inform the
MST at some point, this will need to be separated. In other words, with
profile-guided MST, the functionality of roughly `populateCounters()` is
actually shared between Instrumentation and Profile Annotation.

Also, notice that the current patch opens the profile file O(#functions)
times which is unacceptable, so readCounters will need to be heavily
refactored to occur somewhere else anyway.

-- Sean Silva

>
> David
>
>
>>
>> Even if we don't expose two different passes (for the reason you
>> described), internally in the implementation there has to be a clear
>> separation. I'm still not convinced that we shouldn't expose two different
>> passes, but it should be easy to make that choice later once the internal
>> implementation is cleanly factored.
>>
>>
>>> But I could move some of the code, like setting function attributes into
>>> a separated pass.
>>>
>>>
>>>> - We should directly accept a buffer containing the profile feedback
>>>> data instead of a file name. That lets the frontend do the error handling,
>>>> and is more flexible for usage as a library.
>>>>
>>>
>>> Good suggestion. I'll work on this.
>>>
>>
>> Now that I look at the code again, instead of a raw buffer, it probably
>> makes sense for the frontend to pass in an IndexedInstrProfReader directly.
>> The idea is that as many "error conditions" as possible should be pushed up
>> into the frontend.
>>
>> -- Sean Silva
>>
>>
>>>
>>>
>>>>
>>>> Adding Kostya, as this has a lot of similarities to sanitizer coverage;
>>>> also just for general knowledge of instrumentation passes.
>>>>
>>>>
>>>> ================
>>>> Comment at: lib/Transforms/Instrumentation/PGOLateInstr.cpp:122-126
>>>> @@ +121,7 @@
>>>> +
>>>> +  // Threshold of the hot functions.
>>>> +  const float HotFunctionThreshold = 0.01;
>>>> +
>>>> +  // Threshold of the cold functions.
>>>> +  const float ColdFunctionThreshold = 0.0002;
>>>> +
>>>> ----------------
>>>> Use integers.
>>>>
>>>> ================
>>>> Comment at: lib/Transforms/Instrumentation/PGOLateInstr.cpp:301-302
>>>> @@ +300,4 @@
>>>> +
>>>> +// Compute the CRC32 of unsigned integers.
>>>> +unsigned PGOLateInstrumentationFunc::crc32UnsignedBits(unsigned Chksum,
>>>> +                                                       unsigned Value,
>>>> ----------------
>>>> Why crc32?
>>>>
>>>> ================
>>>> Comment at: lib/Transforms/Instrumentation/PGOLateInstr.cpp:313-314
>>>> @@ +312,4 @@
>>>> +
>>>> +// Compute Hash value for the CFG: the lower 32 bits are CRC32 of the
>>>> index
>>>> +// value of each BB in the CFG. The higher 32 bits record the number
>>>> of edges.
>>>> +void PGOLateInstrumentationFunc::computeCFGHash() {
>>>> ----------------
>>>> This seems like a very poor use of the bits of the hash. Why not use
>>>> md5 like clang's `PGOHash`?
>>>>
>>>>
>>>> http://reviews.llvm.org/D12781
>>>>
>>>>
>>>>
>>>>
>>>
>>
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.llvm.org/pipermail/llvm-commits/attachments/20150923/154ed033/attachment.html>