[PATCH] D42212: [ThinLTO] Add call edges' relative block frequency to per-module summary.

Easwaran Raman via llvm-commits llvm-commits at lists.llvm.org
Fri Jan 19 17:42:20 PST 2018


On Fri, Jan 19, 2018 at 5:33 PM, Teresa Johnson <tejohnson at google.com>
wrote:

>
>
> On Fri, Jan 19, 2018, 4:28 PM Easwaran Raman <eraman at google.com> wrote:
>
>> On Fri, Jan 19, 2018 at 4:23 PM, Teresa Johnson <tejohnson at google.com>
>> wrote:
>>
>>>
>>>
>>> On Fri, Jan 19, 2018 at 4:15 PM, Easwaran Raman <eraman at google.com>
>>> wrote:
>>>
>>>>
>>>>
>>>> On Fri, Jan 19, 2018 at 4:08 PM, Teresa Johnson via Phabricator <
>>>> reviews at reviews.llvm.org> wrote:
>>>>
>>>>> tejohnson added inline comments.
>>>>>
>>>>>
>>>>> ================
>>>>> Comment at: include/llvm/IR/ModuleSummaryIndex.h:58
>>>>>    HotnessType Hotness = HotnessType::Unknown;
>>>>> +  uint64_t RelBlockFreq = 0;
>>>>>
>>>>> ----------------
>>>>> eraman wrote:
>>>>> > tejohnson wrote:
>>>>> > > Do we need 64 bits in practice? Especially since these are scaled
>>>>> down by the entry freq. Concerned about the size increase for large indices
>>>>> that have huge numbers of edges. Also, since we'll never have both the
>>>>> Hotness and the RelBlockFreq, it would be great if we could union them,
>>>>> although that would require a flag on the index and I suppose we might want
>>>>> to support mixing modules built with and without profile data? Perhaps some
other way to combine them, e.g. use a single field and reserve the lowest 3
>>>>> bits for the hotness, shifting the relative frequency into the upper n-3
>>>>> bits...
>>>>> > From our offline discussions I thought the use of 64 bits doesn't
>>>>> matter as they get compressed. Is the concern about in-memory size? I can
>>>>> reduce the size to, say, 29 bits (uint32_t with 3 bits reserved for the
>>>>> enum). Another thing to note is that only the per-module summary has this
>>>>> information. Is there a way to make use of this?
>>>>> Right - concerned about in-memory size. Why does only the per-module
>>>>> summary contain this? Isn't that just temporary - I thought the idea was to
>>>>> propagate this into the backends?
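[Editorial illustration: the single-field encoding floated above — lowest 3 bits for the hotness enum, relative frequency shifted into the remaining 29 bits of a `uint32_t` — could look roughly like the sketch below. This is not the code from the patch; the names (`packEdgeInfo`, `HotnessBits`, etc.) are hypothetical.]

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical sketch of packing a 3-bit hotness enum and a scaled
// relative block frequency into one 32-bit field, as suggested in the
// review. Field and helper names are illustrative, not LLVM's.
enum class HotnessType : uint8_t { Unknown = 0, Cold, None, Hot, Critical };

constexpr uint32_t HotnessBits = 3;
constexpr uint32_t HotnessMask = (1u << HotnessBits) - 1;
constexpr uint32_t MaxRelFreq = (1u << (32 - HotnessBits)) - 1; // 29 bits

inline uint32_t packEdgeInfo(HotnessType H, uint32_t RelBlockFreq) {
  // Saturate the frequency so it fits in the upper 29 bits.
  if (RelBlockFreq > MaxRelFreq)
    RelBlockFreq = MaxRelFreq;
  return (RelBlockFreq << HotnessBits) | static_cast<uint32_t>(H);
}

inline HotnessType unpackHotness(uint32_t Packed) {
  return static_cast<HotnessType>(Packed & HotnessMask);
}

inline uint32_t unpackRelBlockFreq(uint32_t Packed) {
  return Packed >> HotnessBits;
}
```

[Since the two values are never present at the same time in practice (hotness comes from profile data, relative frequency from static BFI), this packing trades a little decode work for halving the per-edge footprint.]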
>>>>>
>>>> We want to propagate the synthetic function entry counts to the
>>>> backends. This can be done by building a global graph whose edges are
>>>> annotated with the relative BFI and propagating the counts on this graph.
>>>>
>>>
>>> Can you clarify - to get them to the backends the updated propagated
>>> counts need to be written into the index (which is serialized out for
>>> distributed backends, and passed in-memory for in-process backends). So I
>>> would imagine this field would be used for that purpose?
>>>
>> Yes, the propagated function entry counts need to be written into the
>> index, but those are *per-function*, not *per-edge*. On the other hand,
>> relative BFI is per call edge.
>>
>
> Ah ok. But at least initially, they will need to be on the edges in the
> combined index, right? I.e. we will read in and merge all of the summary
> indices into a combined index at the start of the thin link, and that will
> have the relative frequencies on the edges; then you will do a graph
> propagation. I'm still not sure why they would only be on the per-module
> index in memory.
>
You're right. And depending on how the callgraph is maintained, the
frequencies might be kept in two places in memory (the global summary and
the callgraph), which only reinforces your point about memory usage.
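[Editorial illustration: the propagation scheme described above — annotate global call-graph edges with relative BFI, then derive synthetic per-function entry counts from them — might be sketched as below. This is a deliberately simplified single pass over an acyclic graph in topological order; the actual patches would need to handle SCCs, scaling, and the real summary data structures. All names here are hypothetical.]

```cpp
#include <cassert>
#include <map>
#include <string>
#include <vector>

// Hypothetical edge: caller -> callee annotated with relative block
// frequency, i.e. (frequency of the block containing the call) divided
// by the caller's entry frequency. Doubles are used for clarity; the
// summary would store scaled integers.
struct CallEdge {
  std::string Caller;
  std::string Callee;
  double RelBlockFreq;
};

// Propagate synthetic entry counts along edges. Assumes Functions is
// already in callers-before-callees (topological) order and the graph
// is acyclic -- a real implementation must also handle cycles.
std::map<std::string, double>
propagateCounts(const std::vector<std::string> &Functions,
                const std::vector<CallEdge> &Edges,
                const std::map<std::string, double> &InitialCounts) {
  std::map<std::string, double> Counts = InitialCounts;
  for (const auto &F : Functions)
    for (const auto &E : Edges)
      if (E.Caller == F)
        // Expected number of calls into the callee from this edge:
        // caller's entry count times the edge's relative frequency.
        Counts[E.Callee] += Counts[F] * E.RelBlockFreq;
  return Counts;
}
```

[This also makes the point in the exchange concrete: the propagated quantity (entry count) lives on nodes, while the relative frequencies live on edges of the combined graph.]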


>
>>>
>>>>
>>>>>
>>>>>
>>>>> ================
>>>>> Comment at: lib/Analysis/ModuleSummaryAnalysis.cpp:63
>>>>> +cl::opt<bool> WriteRelBFToSummary(
>>>>> +    "write-relbf-to-summary", cl::Hidden, cl::init(false),
>>>>> +    cl::desc("Write relative block frequency to function summary "));
>>>>> ----------------
>>>>> eraman wrote:
>>>>> > tejohnson wrote:
>>>>> > > Suggest moving this to the bitcode writer and always adding
>>>>> RelBlockFreq into the in-memory index here.
>>>>> > >
>>>>> > > Is there any reason to default off? Why not just default on from
>>>>> the get-go?
>>>>> > Until this information actually gets used (first to propagate the
>>>>> counts, and then to use the counts in the inliner), it just adds extra
>>>>> compile time and summary size if not turned off. Alternatively, I can
>>>>> drop this patch and send it as part of a larger patch that builds the
>>>>> callgraph and propagates the counts.
>>>>> How long do you expect before the other patches are sent? If not too
>>>>> long, I don't know that it matters much, since we are going to incur the
>>>>> extra space/memory/time in the near term anyway. Better to have it tested.
>>>>>
>>>>>
>>>>> Repository:
>>>>>   rL LLVM
>>>>>
>>>>> https://reviews.llvm.org/D42212
>>>>>
>>>>>
>>>>>
>>>>>
>>>>
>>>
>>>
>>> --
>>> Teresa Johnson |  Software Engineer |  tejohnson at google.com |
>>> 408-460-2413
>>>
>>
