[PATCH] D42212: [ThinLTO] Add call edges' relative block frequency to per-module summary.

Teresa Johnson via llvm-commits llvm-commits at lists.llvm.org
Fri Jan 19 20:50:28 PST 2018


On Fri, Jan 19, 2018 at 5:42 PM, Easwaran Raman <eraman at google.com> wrote:

>
>
> On Fri, Jan 19, 2018 at 5:33 PM, Teresa Johnson <tejohnson at google.com>
> wrote:
>
>>
>>
>> On Fri, Jan 19, 2018, 4:28 PM Easwaran Raman <eraman at google.com> wrote:
>>
>>> On Fri, Jan 19, 2018 at 4:23 PM, Teresa Johnson <tejohnson at google.com>
>>> wrote:
>>>
>>>>
>>>>
>>>> On Fri, Jan 19, 2018 at 4:15 PM, Easwaran Raman <eraman at google.com>
>>>> wrote:
>>>>
>>>>>
>>>>>
>>>>> On Fri, Jan 19, 2018 at 4:08 PM, Teresa Johnson via Phabricator <
>>>>> reviews at reviews.llvm.org> wrote:
>>>>>
>>>>>> tejohnson added inline comments.
>>>>>>
>>>>>>
>>>>>> ================
>>>>>> Comment at: include/llvm/IR/ModuleSummaryIndex.h:58
>>>>>>    HotnessType Hotness = HotnessType::Unknown;
>>>>>> +  uint64_t RelBlockFreq = 0;
>>>>>>
>>>>>> ----------------
>>>>>> eraman wrote:
>>>>>> > tejohnson wrote:
>>>>>> > > Do we need 64 bits in practice? Especially since these are scaled
>>>>>> down by the entry freq. Concerned about the size increase for large indices
>>>>>> that have huge numbers of edges. Also, since we'll never have both the
>>>>>> Hotness and the RelBlockFreq, it would be great if we could union them,
>>>>>> although that would require a flag on the index and I suppose we might want
>>>>>> to support mixing modules built with and without profile data? Perhaps some
>>>>>> other way to combine them. I.e. use a single field and reserve the lowest 3
>>>>>> bits for the hotness, shifting the relative frequency into the upper n-3
>>>>>> bits...
>>>>>> > From our offline discussions I thought the use of 64 bits doesn't
>>>>>> matter as they get compressed. Is the concern about in-memory size? I can
>>>>>> reduce the size to, say, 29 bits (uint32_t with 3 bits reserved for the
>>>>>> enum). Another thing to note is that only the per-module summary has this
>>>>>> information.  Is there a way to make use of this?
>>>>>> Right - concerned about in-memory size. Why does only the per-module
>>>>>> summary contain this? Isn't that just temporary - I thought the idea was to
>>>>>> propagate this into the backends?
>>>>>>
>>>>> We want to propagate the synthetic function entry counts to the
>>>>> backends. This can be done by building a global graph whose edges are
>>>>> annotated with the relative BFI and propagating the counts on this graph.
>>>>>
>>>>
>>>> Can you clarify - to get them to the backends the updated propagated
>>>> counts need to be written into the index (which is serialized out for
>>>> distributed backends, and passed in-memory for in-process backends). So I
>>>> would imagine this field would be used for that purpose?
>>>>
>>> Yes, the propagated function entry counts needs to be written into the
>>> index, but that is  *per-function*  and not *per-edge*. On the other hand,
>>> relative BFI is per call edge.
>>>
>>
>> Ah ok. But at least initially, they will need to be on the edges in the
>> combined index?. I.e. we will read in and merge all of the summary indices
>> into a combined index at the start of the thin link, and that will have the
>> relative frequencies on the edges, then you will do a graph propagation,
>> right? I'm still not sure why they would only be on the per module index in
>> memory.
>>
> You're right. And depending on how the callgraph is maintained, the
> frequencies might be kept in two places in memory (global summary and the
> callgraph) which only supports your point about memory usage.
>

The Index actually contains a full call graph, we should avoid building a
new data structure with a copy of the info. I see in your patch D42311 that
we would be building a separate CallGraph data structure. It would be good
to use the summaries as nodes in a graph so we don't have to duplicate the
information. See the approach in D36311 (see D36850 for a use that operates
on SCCs). Can you use that instead?


>
>>
>>>>
>>>>>
>>>>>>
>>>>>>
>>>>>> ================
>>>>>> Comment at: lib/Analysis/ModuleSummaryAnalysis.cpp:63
>>>>>> +cl::opt<bool> WriteRelBFToSummary(
>>>>>> +    "write-relbf-to-summary", cl::Hidden, cl::init(false),
>>>>>> +    cl::desc("Write relative block frequency to function summary "));
>>>>>> ----------------
>>>>>> eraman wrote:
>>>>>> > tejohnson wrote:
>>>>>> > > Suggest moving this to the bitcode writer and always adding
>>>>>> RelBlockFreq into the in-memory index here.
>>>>>> > >
>>>>>> > > Is there any reason to default off? Why not just default on from
>>>>>> the get-go?
>>>>>> > Until this information actually gets used (first propagate the
>>>>>> counts and then use the counts in the inliner), this just poses extra
>>>>>> compile time and increased summary size if not turned off. Alternatively, I
>>>>>> can drop this patch and then send it as part of a larger patch that builds
>>>>>> the callgraph and propagates the counts.
>>>>>> How long do you expect until the other patches to be sent? If not too
>>>>>> long I don't know that it matters too much since we are going to incur the
>>>>>> extra space/memory/time in the near term anyway. Better to have it tested.
>>>>>>
>>>>>>
>>>>>> Repository:
>>>>>>   rL LLVM
>>>>>>
>>>>>> https://reviews.llvm.org/D42212
>>>>>>
>>>>>>
>>>>>>
>>>>>>
>>>>>
>>>>
>>>>
>>>> --
>>>> Teresa Johnson |  Software Engineer |  tejohnson at google.com |
>>>> 408-460-2413 <(408)%20460-2413>
>>>>
>>>
>


-- 
Teresa Johnson |  Software Engineer |  tejohnson at google.com |  408-460-2413
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.llvm.org/pipermail/llvm-commits/attachments/20180119/7576b012/attachment.html>


More information about the llvm-commits mailing list