[LLVMdev] [Proposal] Parallelize post-IPO stage.

Tue Jul 16 13:49:30 PDT 2013

I had actually come up with the 3 approaches to building the post-IPO
object independently.

The "3rd approach" here is the 1st solution in my original proposal.
Almost all of my coworkers said it sucks :-)
Now I accept that criticism, because it has no way to be adaptive.

Consider the scenario where we compile the LLVM compiler itself. We use
"make -j16" on a computer with 8 processors, and each make job invokes a
compiler which may blindly spawn 16 threads!
So we end up with 16*16 = 256 threads.

Being adaptive makes it possible to pick the right parallelism factor
judiciously.
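To make the arithmetic concrete: with "make -j16" on 8 processors, up to 16 compiler instances run at once, and if each blindly spawns 16 threads, 256 threads contend for 8 cores. Below is a minimal sketch of one possible adaptive policy, assuming the compiler is somehow told how many sibling jobs are running. `pick_backend_threads` and its `concurrent_jobs` parameter are hypothetical illustrations, not an existing LLVM option.

```python
import os

def pick_backend_threads(num_partitions, concurrent_jobs=1, cpus=None):
    """Hypothetical heuristic: share the machine's CPUs among the
    concurrently running compiler jobs, never use more threads than
    there are partitions, and always use at least one thread."""
    if cpus is None:
        cpus = os.cpu_count() or 1  # os.cpu_count() may return None
    per_job = max(1, cpus // max(1, concurrent_jobs))
    return max(1, min(per_job, num_partitions))

# Blind policy from the scenario above: 16 make jobs x 16 threads each.
blind_total = 16 * 16  # 256 threads on an 8-processor machine

# Adaptive policy: 8 CPUs shared by 16 jobs leaves 1 thread per job.
adaptive = pick_backend_threads(num_partitions=16, concurrent_jobs=16, cpus=8)
```

How the compiler would learn `concurrent_jobs` (a driver flag, an environment variable, or cooperation with make's jobserver) is left open here.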

In any case, I will support this approach (i.e. the 3rd approach you
mentioned), at the very least at the beginning.
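The third approach, as David describes it in the quoted message below, has the partitioner write partition bitcode files plus action records and leaves scheduling to a separate driver tool. A minimal sketch of the Makefile-emitting variant (his option 2), assuming a deliberately trivial action record of one (bitcode, object) pair per partition; the record format and the `emit_makefile` name are hypothetical, while `llc -filetype=obj` is the standard way to turn a bitcode file into an object file.

```python
def emit_makefile(actions, llc="llc"):
    """Turn hypothetical (bitcode, object) action records into a
    Makefile whose per-partition rules `make -jN` can run in parallel."""
    objects = [obj for _, obj in actions]
    lines = [".PHONY: all", "all: " + " ".join(objects), ""]
    for bc, obj in actions:
        lines.append(f"{obj}: {bc}")
        # Recipe lines in a Makefile must start with a tab character.
        lines.append(f"\t{llc} -filetype=obj {bc} -o {obj}")
        lines.append("")
    return "\n".join(lines)

actions = [(f"part{i}.bc", f"part{i}.o") for i in range(3)]
makefile_text = emit_makefile(actions)
```

Writing `makefile_text` to a file and running `make -j8 -f` on it would compile the partitions in parallel; emitting BUILD files or driving an in-process thread pool instead would change only this tool, not the partitioner.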

On 7/16/13 1:35 PM, Xinliang David Li wrote:
> A third approach is to decouple the backend compilation and
> parallelism strategy from the partitioning.  The partitioning can
> spit out partition BC files and some action records in some standard
> format. All of this can be fed into some driver tools that convert
> the compilation action file into make/build file of the underlying
> build system of your choice:
>
> 1) it can simply be a compiler driver that does thread-level parallelism;
> 2) or a tool that generates Makefiles which are fed into parallel make
> to exploit single-node parallelism;
> 3) or a tool that generates BUILD files that are fed into a distributed
> build system (such as Google's blaze:
> http://google-engtools.blogspot.com/2011/08/build-in-cloud-how-build-system-works.html)
>
> Another benefit is that it will make compiler debugging easier.
>
> thanks,
>
> David
>
> On Sun, Jul 14, 2013 at 5:56 PM, Andrew Trick <atrick at apple.com> wrote:
>> On Jul 12, 2013, at 3:49 PM, Shuxin Yang <shuxin.llvm at gmail.com> wrote:
>>
>> 3.2 Compile partitions independently
>> --------------------------------------
>>
>>    There are two camps: one camp advocates compiling partitions via
>> multiple processes, the other favors multiple threads.
>>
>>   Inside the Apple compiler teams, I'm the only one who belongs to the 1st
>> camp. I think that while multi-proc sounds a bit red-neck, it has its
>> advantages for this purpose, and while multi-thread is certainly more
>> eye-popping, it has its advantages as well.
>>
>>   The advantages of multi-proc are:
>>   1) It is easier to implement: each process runs in its own address space,
>>     so we don't need to worry that they can interfere with each other.
>>
>>   2) A huge, if not unlimited, address space.
>>
>>    The disadvantage is that it's expensive. But I guess the cost is
>>   almost negligible compared to the overall IPO compilation.
>>
>>   The advantages of multi-thread I can imagine are:
>>    1) it sounds fancy
>>    2) it is light-weight
>>    3) inter-thread communication is easier than IPC.
>>
>>   Its disadvantages are:
>>    1) Oftentimes we will come across race conditions, and it takes an
>>       awfully long time to figure them out. While the code is supposed
>>       to be multi-thread safe, we might miss some tricky case.
>>       Troubleshooting race conditions is a nightmare.
>>
>>    2) Small address space. This is a big problem if the compiler
>>       is built 32-bit. In that case, the compiler is not able to bring
>>       lots of stuff into memory even if the HW does provide ample
>>       memory.
>>
>>    3) The thread-safe run-time lib is more expensive.
>>       I once linked a compiler using -lpthread (I did not have to) on a
>>       UNIX platform, and saw the compiler slow down by about 1/3.
>>
>>     I'm not able to convince the folks in the other camp, nor are they
>> able to convince me, so I decided to implement both. Fortunately, this
>> part is not difficult; it seems rather easy to crank one out within a
>> short period of time. It would be interesting to compare them side by side
>> and see which camp loses :-). On the other hand, if we run into
>> race-condition problems, we can fall back to the multi-proc version.
>>
>>
>> While I am a self-proclaimed multi-process red-neck, in this case I would
>> prefer to see a multi-threaded implementation because I want to verify that
>> LLVMContext can be used as advertised. I'm sure some extra care will be
>> needed to report failures/diagnostics, but we should start with the
>> assumption that this approach is not significantly harder than multi-process
>> because that's how we advertise the design.
>>
>> If any of the multi-threaded disadvantages you point out are real, I would
>> like to find out about it.
>>
>> 1. Race Conditions: We should be able to verify that the thread-parallel vs.
>> sequential or multi-process compilation generate the same result. If they
>> diverge, we would like to know about the bug so it can be fixed--independent
>> of LTO.
>>
>> 2. Small Address Space with LTO. We don't need to design around this
>> hypothetical case.
>>
>> 3. Expensive thread-safe runtime lib. We should not speculate that platforms
>> that we, as the LLVM community, care about have this problem. Let's assume
>> that our platforms are well implemented unless we have data to the contrary.
>> (Personally, I would even love to use TLS in the compiler to vastly simplify
>> API design in the backend, but I am not going to be popular for saying so).
>>
>> We should be able to decompose each step of compilation for debugging. So
>> the multi-process "implementation" should just be a degenerate form of
>> threading with a bit of driver magic if you want to automate it.
>>
>> -Andy
>>
>>
>> _______________________________________________
>> LLVM Developers mailing list
>> LLVMdev at cs.uiuc.edu         http://llvm.cs.uiuc.edu
>> http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev
>>
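Andrew's first point in the quoted reply suggests a simple regression check: compile the same partitions sequentially and in parallel, and require identical output. A minimal sketch of such a harness, assuming a hypothetical `compile_partition` that stands in for running the backend on one partition in its own context; here it only hashes its input, so it illustrates the harness, not LLVM itself. Because the driver receives only an executor, a multi-process run is the same code with a process pool, which is one reading of his "degenerate form of threading" remark.

```python
import hashlib
from concurrent.futures import ThreadPoolExecutor

def compile_partition(bitcode: bytes) -> bytes:
    """Hypothetical stand-in for running the backend on one partition
    in its own, unshared context; here it just digests its input."""
    return hashlib.sha256(bitcode).digest()

def compile_all(partitions, executor=None):
    """Compile every partition; with no executor, run sequentially.
    Passing a ThreadPoolExecutor (or a ProcessPoolExecutor) changes
    only how the work is scheduled, not what is computed."""
    if executor is None:
        return [compile_partition(p) for p in partitions]
    # Executor.map yields results in input order, so the object list is
    # comparable regardless of which worker finished first.
    return list(executor.map(compile_partition, partitions))

partitions = [f"partition {i}".encode() for i in range(8)]
sequential = compile_all(partitions)
with ThreadPoolExecutor(max_workers=4) as pool:
    threaded = compile_all(partitions, pool)

# With a real backend, any divergence would point at a thread-safety bug.
same_output = sequential == threaded
```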