[llvm] r189297 - Add new API lto_codegen_compile_parallel().

Tue Sep 3 14:55:57 PDT 2013

On 9/3/13 9:07 AM, Eric Christopher wrote:
> On Tue, Aug 27, 2013 at 10:42 AM, Shuxin Yang <shuxin.llvm at gmail.com> wrote:
>> Reverted in 189386.  Once again, I apologize that I didn't follow the
>> canonical procedure.
>> I personally think Nick's proposal is clean enough for our system, and I
>> took it for granted that the community would like it.
>>
> It's not necessarily bad, but the lto library is a bit funky and
> perhaps a new lto library is what we need :)

Basically, I think the LTO lib is in bad shape compared to the other
LTO/IPO/IPA infrastructure I have worked on before.

We certainly need a new LTO. But we have to get to that point one step at
a time.

    I don't mind that LTO is provided as a lib. But I'm really
uncomfortable with its interface.
I take it for granted that the linker should be completely decoupled from
the compiler (i.e. the LTO lib), and that the linker and compiler should
talk through symbol-related functions instead of LTO-control-related
functions (like the lto_xxx() APIs). Unfortunately, people on this list
think my point is pretty crappy :-).

   I could be wrong, but I constantly feel the pain, and will keep feeling
it as long as the linker and compiler communicate through the lto_xxx()
APIs: we constantly need to add new APIs while maintaining the old ones,
the linker side needs to change, the compiler side needs to change as
well, ...

   Anyway, this is far beyond my scope. I hope linker folks & other 
awesome folks will
figure out a more comfortable way to go.

>
>> I will not initiate a discussion for now. I'd like to cool things down for a
>> while. (maybe postpone indefinitely).
>>
>> As with most infrastructure-related projects, partitioning is unglamorous
>> and painstaking work.
>> I stepped forward to take it on just because we have almost no way to
>> debug or investigate LTO.
>>
> Absolutely.
>
>> For those who are curious about how much speedup we can get from
>> partitioning: unfortunately, I can't tell yet,
>> as the project is not completely done. My rudimentary (quite stupid,
>> actually)
>> implementation using the make utility speeds up the command
>> "clang++ Xalancbmk/*.o -flto"
>> by 39% (35s vs 21s; Xalancbmk has 700+ inputs).  That is a bit of a shame
>> for partitioning. But at the very least, each partition
>> is under human control.  On the other hand, post-IPO scalar optimization
>> is not yet parallelized
>> in my rudimentary implementation (i.e. so far only the codegen part is
>> parallelized). Surprisingly,
>> the result is very consistent with what Xiaofei achieved via
>> multi-threaded code-gen.  As far
>> as I can recall, he got a speedup of some 2.9x. In my case, it takes
>> about 13s before code-gen starts,
>> meaning the speedup of code-gen is about (35-13)/(21-13) = 2.75x.
>> (Code-gen plus the linker's post-processing takes 35-13 = 22s.)
>>
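The arithmetic quoted above can be checked with a small Python sketch. It
assumes, as the quoted text does, that the roughly 13s spent before code-gen
starts is the same in both runs; the function name `speedups` is made up for
this illustration. With the rounded timings it gives an overall speedup of
about 1.67x and a code-gen speedup of 2.75x.

```python
def speedups(total_before, total_after, serial_prefix):
    """Return (overall speedup, speedup of the parallelized part).

    serial_prefix is the time spent before code-gen starts; it is
    assumed (not measured here) to be identical in both runs.
    """
    overall = total_before / total_after
    parallel_part = (total_before - serial_prefix) / (total_after - serial_prefix)
    return overall, parallel_part


# Rounded timings quoted in the message: 35s baseline, 21s partitioned,
# about 13s before code-gen starts.
overall, codegen = speedups(35.0, 21.0, 13.0)
print(f"overall: {overall:.2f}x, code-gen: {codegen:.2f}x")
```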
> Interesting. What's the difference (or your opinion) here between,
> say, parallelizing codegen/post-ipo passes and splitting the module?
> Why go for the second rather than the first?
>
>

     Parallelizing CodeGen using multi-threading has lots of limitations:
    1) Currently, the CodeGen implementation is not necessarily
thread-safe.
    2) It makes the pass manager even more complicated.
    3) It works for CodeGen only. It's not applicable to Scalar-Opt (i.e.
IR opt), which
        needs to deal with SCCs in the call graph.

    In contrast, partitioning has lots of advantages:
     1) It divides the program into human-controllable partitions, which
significantly eases
         LTO troubleshooting.
     2) It can parallelize the *entire* post-IPO compiler pipeline,
including CodeGen,
         Scalar-Opt, auto-par/loop-nest-opt, etc.
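
To illustrate the partition idea in the abstract, here is a minimal Python
sketch. It is not the implementation discussed in this thread and uses no
LLVM API: the function sizes, the greedy balancing heuristic, and the names
`partition`, `compile_partition` and `compile_all` are all assumptions made
for the example. A thread pool stands in for separate compiler processes
(e.g. one make job per partition), and `compile_partition` is a placeholder
for running scalar-opt and codegen on one partition.

```python
from concurrent.futures import ThreadPoolExecutor


def partition(functions, num_partitions):
    """Split {name: size} into buckets of roughly equal total size.

    Greedy heuristic (an assumption for this sketch): take the largest
    function first and put it into the currently lightest bucket.
    """
    buckets = [[] for _ in range(num_partitions)]
    loads = [0] * num_partitions
    for name, size in sorted(functions.items(), key=lambda kv: (-kv[1], kv[0])):
        lightest = loads.index(min(loads))
        buckets[lightest].append(name)
        loads[lightest] += size
    return buckets


def compile_partition(names):
    # Placeholder for the whole post-IPO pipeline on one partition.
    # Here it only returns the names it was given, in sorted order.
    return sorted(names)


def compile_all(functions, num_partitions):
    """Process every partition independently and collect the results."""
    parts = partition(functions, num_partitions)
    # Threads stand in for independent processes; partitions share no state.
    with ThreadPoolExecutor(max_workers=num_partitions) as pool:
        return list(pool.map(compile_partition, parts))


# Hypothetical function sizes, purely for illustration.
print(compile_all({"a": 5, "b": 3, "c": 3, "d": 1}, 2))
```

Because each partition is an explicit list, a misbehaving partition can be
rerun or inspected on its own, which is the trouble-shooting benefit
mentioned in item 1.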