[LLVMdev] [Proposal] Parallelize post-IPO stage.

Tue Jul 16 11:14:34 PDT 2013

On 7/16/13 10:40 AM, Evan Cheng wrote:
> On Jul 16, 2013, at 10:37 AM, Shuxin Yang <shuxin.llvm at gmail.com> wrote:
>
>> On 7/16/13 5:23 AM, Evan Cheng wrote:
>>> Thanks for the proposal. This is important work which is one step towards making LTO more applicable for large applications. Some comments inline.
>>>
>>> On Jul 12, 2013, at 3:49 PM, Shuxin Yang <shuxin.llvm at gmail.com> wrote:
>>>
>>>> 3.1.1 Figure out Partition scheme
>>>> ----------------------------------
>>>>   we randomly pick up some function and put them in a partition.
>>>> It would be nice to perform some optimization at this moment. One opt
>>>> in my mind is to reorder functions in order to reduce working-set and
>>>> improve locality.
>>>>
>>>>   Unfortunately, this opt seems to be bit blind at this time, because
>>>>     - CallGraph is not annotated with estimated or profiled frequency.
>>>>     - some linkers don't respect the order. It seems they just
>>>>       remembers the function order of the pristine input obj/fake-obj,
>>>>       and enforce this order at final link (link-exec/shared-lib) stage.
>>>>
>>>>   Anyway, I try to ignore all these problems, and try to perform partition
>>>> via following steps. Maybe we have some luck on some platforms:
>>>>
>>>>   o. DFS the call-graph, ignoring the self-resursive edges, if freq is
>>>>      available, prioritizing the edges (i.e. corresponding to call-sites)
>>>>      such that frequent edges are visited first.
>>>>
>>>>   o. Cut the DFS spanning tree obtained from the previous step bottom-up,
>>>>      Each cut/partition contains reasonable # of functions, and the aggregate
>>>>      size of the functions of the partition should not exceeds predefined
>>>>      threshold.
>>> I'd like to see more details about this step. How do you determine the "reasonable # of functions"? How do you define the threshold? It has to be the same for a given target / platform regardless of the configuration of the build machine, right?
>> Say, each module should not contains :
>>     - no more than 100 functions,  and
>>     - the total size of the functions in a partition should not exceed the pre-defined threshold,
>>
>> These threshold can be override by command line.
> But how do you come about the thresholds? And are they fixed thresholds or determined based on analysis of function size, etc.?

Yes, just count the total # of instructions.
I don't know which threshold is "comfortable" at this moment. We can 
keep changing it
until we feel comfortable with it.