[PATCH] D12781: PGO IR-level instrumentation infrastructure

Wed Sep 23 18:02:20 PDT 2015

On Wed, Sep 23, 2015 at 4:39 PM, Sean Silva <chisophugis at gmail.com> wrote:

>
>
> On Wed, Sep 23, 2015 at 3:28 PM, Rong Xu <xur at google.com> wrote:
>
>>
>>
>> On Tue, Sep 22, 2015 at 6:09 PM, Sean Silva <chisophugis at gmail.com>
>> wrote:
>>
>>> silvas added a reviewer: kcc.
>>> silvas added a comment.
>>>
>>> This should not be called "late instrumentation". That is clang-specific
>>> terminology. Think from the perspective of a random frontend (say, one that
>>> doesn't have any other PGO mechanism). There should be no mention of "FE
>>> instrumentation", "late instrumentation" "IR-level instrumentation", etc.
>>> This is just "edge profiling" or something like that.
>>>
>>
>> these terms are mainly for the reviewers to distinguish the two
>> instrumentation. I'll clean up the code and comments to use "edge
>> profiling".
>>
>
> Thanks.
>
>
>>
>>> A couple other high-level questions/comments:
>>>
>>> - is there a reason that this is breaking critical edges itself instead
>>> of using break-crit-edges pass? Is break-crit-edges dead? SanitizerCoverage
>>> doesn't seem to use it either.
>>>
>>
>> As I mentioned in the email to Kostya: this is to avoid unnecessary
>> split. We only need to split a small set of the critical edges.
>>
>>
>>> - why is this a module pass? It operates entirely on a
>>> function-by-function basis.
>>>
>> Good question. When I started the implementation, I thought I would
>> create global functions and global vars that needed to be a module pass.
>> But now it seems function pass should work. I'll change this in a later
>> patch.
>>
>> - can we use GraphTraits for the MST algorithm, so that it is reusable?
>>> (could probably be a separate patch if so).
>>>
>> This can be done in the future in a later patch. Not sure what other pass
>> would use MST.
>>
>
> At the very least, it may help to be able to unittest the MST algorithm.
> Your current patch has no testing for this; how are you sure that your MST
> algorithm is correct? (actually, this patch has no tests at all; please add
> testing to the patch once we have converged on the final design).
>
>
>>
>>
>>> - could we break the PGO data reading into a separate analysis? Also
>>> could the pass be split into more fine-grained passes? (I think the answer
>>> is yes to both questions)
>>>
>>
>> I would prefer to put instrumentation and reading profile (read the count
>> out and populate to each BB) in the same pass. This two passes need to see
>> the matched IR. Separating them will create more changes of mismatches.
>>
>
> The problem is that now we have a pass that really is two separate passes.
> If we look in PGOLateInstrumentationFunc:: PGOLateInstrumentationFunc, we
> can see that there is a shared block of code:
>
>     setFuncName(F);
>     buildEdges();
>     sortEdges();
>     computeMinimumSpanningTree();
>     computeCFGHash();
>     dumpEdges();
>
> Then the pass does two radically different things:
>
>     if (!IsUse) {
>       instrumentCFG();
>       return;
>     }
>
>     if (readCounters(Filename)) {
>       populateCounters();
>       setBranchWeights();
>       dumpEdges(true);
>     }
>
> The shared block of code seems like it could be pulled out into a helper
> class that can be shared by the two passes.
>
> Precisely because the two different operations are so tightly coupled, we
> have to clearly separate the single point of truth that they both rely on.
> Otherwise it is difficult to reason about this close coupling. This piece
> of functionality should be clearly separated (separate class/function; a
> method is insufficient isolation because then there can be accidental
> interdependence due to member variables).
>
> Also, in order to fully benefit from the MST, we will need to be able to
> read the counters earlier (but if I understand correctly, don't need to set
> the branch weight metadata). This indicates that, roughly speaking,
> `populateCounters()` should be clearly separated.
>
> Even if we don't expose two different passes (for the reason you
> described), internally in the implementation there has to be a clear
> separation. I'm still not convinced that we shouldn't expose two different
> passes, but it should be easy to make that choice later once the internal
> implementation is cleanly factored.
>
>
>> But I could move some of the code, like setting function attributes into
>> a separated pass.
>>
>>
>>> - We should directly accept a buffer containing the profile feedback
>>> data instead of a file name. That lets the frontend do the error handling,
>>> and is more flexible for usage as a library.
>>>
>>
>> Good suggestion. I'll work on this.
>>
>
> Now that I look at the code again, instead of a raw buffer, it probably
> makes sense for the frontend to pass in an IndexedInstrProfReader directly.
> The idea is that as many "error conditions" as possible should be pushed up
> into the frontend.
>

Also, the current patch is reading the profile file O(#functions) times
(and reading the file is O(#functions) work, leading to quadratic run
time). Clearly this has to be avoided, so merely passing in the contents of
the profile file is an insufficient solution. Something like passing in an
IndexedInstrProfReader (or possibly your own class, which does further
preprocessing on the data from an IndexedInstrProfReader) will be necessary.

-- Sean Silva

>
> -- Sean Silva
>
>
>>
>>
>>>
>>> Adding Kostya, as this has a lot of similarities to sanitizer coverage;
>>> also just for general knowledge of instrumentation passes.
>>>
>>>
>>> ================
>>> Comment at: lib/Transforms/Instrumentation/PGOLateInstr.cpp:122-126
>>> @@ +121,7 @@
>>> +
>>> +  // Threshold of the hot functions.
>>> +  const float HotFunctionThreshold = 0.01;
>>> +
>>> +  // Threshold of the cold functions.
>>> +  const float ColdFunctionThreshold = 0.0002;
>>> +
>>> ----------------
>>> Use integers.
>>>
>>> ================
>>> Comment at: lib/Transforms/Instrumentation/PGOLateInstr.cpp:301-302
>>> @@ +300,4 @@
>>> +
>>> +// Compute the CRC32 of unsigned integers.
>>> +unsigned PGOLateInstrumentationFunc::crc32UnsignedBits(unsigned Chksum,
>>> +                                                       unsigned Value,
>>> ----------------
>>> Why crc32?
>>>
>>> ================
>>> Comment at: lib/Transforms/Instrumentation/PGOLateInstr.cpp:313-314
>>> @@ +312,4 @@
>>> +
>>> +// Compute Hash value for the CFG: the lower 32 bits are CRC32 of the
>>> index
>>> +// value of each BB in the CFG. The higher 32 bits record the number of
>>> edges.
>>> +void PGOLateInstrumentationFunc::computeCFGHash() {
>>> ----------------
>>> This seems like a very poor use of the bits of the hash. Why not use md5
>>> like clang's `PGOHash`?
>>>
>>>
>>> http://reviews.llvm.org/D12781
>>>
>>>
>>>
>>>
>>
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.llvm.org/pipermail/llvm-commits/attachments/20150923/f4731725/attachment.html>