[LLVMdev] [Proposal] Parallelize post-IPO stage.
shuxin.llvm at gmail.com
Tue Jul 16 13:49:30 PDT 2013
I actually came up with 3 approaches to build the post-IPO object
The "3rd approach" here is the 1st solution in my original proposal.
Almost all my coworkers say it sucks :-)
Now I accept that, because it has no way to be adaptive.
Consider the scenario where we compile the LLVM compiler. We use "make -j16" on a
computer with 8 processors, and each make job invokes a compiler which may
blindly spawn 16 threads!
So we end up with 16*16 threads.
Being adaptive makes it possible to pick the right factor
judiciously.
In any case, I will support this approach (i.e. the 3rd approach you
mentioned), at the very least at the beginning.
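
A minimal sketch of what "adaptive" could mean here, assuming the compiler looks at the machine's load before deciding how many backend threads to start. The function name `pick_worker_count` and the load-average heuristic are made up for illustration; neither is part of the proposal.

```python
import os


def pick_worker_count(requested, cpu_count=None, load_avg=None):
    """Return how many backend workers to start, between 1 and `requested`.

    Hypothetical heuristic: use only the processors that look idle
    according to the 1-minute load average.
    """
    if cpu_count is None:
        cpu_count = os.cpu_count() or 1
    if load_avg is None:
        try:
            load_avg = os.getloadavg()[0]  # 1-minute load average
        except (AttributeError, OSError):
            load_avg = 0.0  # not available on this platform: assume idle
    idle = cpu_count - int(round(load_avg))
    return max(1, min(requested, idle))
```

With 8 processors and a load average of 16 (the "make -j16" case above), a request for 16 threads is cut down to 1 instead of adding another 16 threads to a busy machine.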
On 7/16/13 1:35 PM, Xinliang David Li wrote:
> A third approach is to decouple the backend compilation and
> parallelism strategy from the partitioning. The partitioner can
> spit out partition BC files and some action records in some standard
> format. All of this can be fed into a driver tool that converts
> the compilation action file into a make/build file for the underlying
> build system of your choice:
> 1) it can simply be a compiler driver that does thread-level parallelism;
> 2) or a tool that generates Makefiles which are fed into parallel make
> to exploit single-node parallelism;
> 3) or a tool that generates BUILD files that feed into distributed
> build system (such as Google's blaze:
> Another benefit is it will make compiler debugging easier.
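
To make option 2) concrete, here is a rough sketch of a tool that turns action records into a Makefile. The record layout (input BC file, output object, command) and the file names are assumptions for illustration only; no standard format has been defined in this thread.

```python
def actions_to_makefile(actions):
    """Build Makefile text from (input_bc, output_obj, command) records.

    The record layout is hypothetical; a real partitioner would define
    its own action format.
    """
    outputs = " ".join(obj for _, obj, _ in actions)
    lines = [".PHONY: all", f"all: {outputs}", ""]
    for bc, obj, cmd in actions:
        lines.append(f"{obj}: {bc}")  # each object depends on its partition
        lines.append(f"\t{cmd}")      # recipe lines must start with a tab
        lines.append("")
    return "\n".join(lines)


# Hypothetical partitions, each compiled by llc into an object file.
example_actions = [
    ("part0.bc", "part0.o", "llc -filetype=obj part0.bc -o part0.o"),
    ("part1.bc", "part1.o", "llc -filetype=obj part1.bc -o part1.o"),
]
makefile_text = actions_to_makefile(example_actions)
```

Writing `makefile_text` to a file and running it under parallel make leaves the degree of parallelism to make rather than to the compiler.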
> On Sun, Jul 14, 2013 at 5:56 PM, Andrew Trick <atrick at apple.com> wrote:
>> On Jul 12, 2013, at 3:49 PM, Shuxin Yang <shuxin.llvm at gmail.com> wrote:
>> 3.2 Compile partitions independently
>> There are two camps: one camp advocates compiling partitions via
>> multi-process, the other one favors multi-thread.
>> Inside the Apple compiler teams, I'm the only one who belongs to the 1st camp. I
>> think that while multi-proc sounds a bit red-neck, it has its advantages for this purpose,
>> and while multi-thread is certainly more eye-popping, it has its advantages
>> as well.
>> The advantages of multi-proc are:
>> 1) It is easier to implement: each process runs in its own address space,
>> so we don't need to worry about them interfering with each other.
>> 2) Huge, if not unlimited, address space.
>> The disadvantage is that it's expensive. But I guess the cost is
>> almost negligible compared to the overall IPO compilation.
>> The advantages of multi-thread that I can imagine are:
>> 1) it sounds fancy
>> 2) it is light-weight
>> 3) inter-thread communication is easier than IPC.
>> Its disadvantages are:
>> 1) Oftentimes we will come across a race condition, and it takes
>> an awfully long time to figure it out. While the code is supposed
>> to be multi-thread safe, we might miss some tricky case.
>> Troubleshooting race conditions is a nightmare.
>> 2) Small address space. This is a big problem if the compiler
>> is built as 32-bit. In that case, the compiler is not able to bring
>> lots of stuff into memory even if the HW does
>> provide ample memory.
>> 3) The thread-safe run-time lib is more expensive.
>> I once linked a compiler using -lpthread (I did not have to) on a
>> UNIX platform, and saw the compiler slow down by about 1/3.
>> I'm not able to convince the folks in the other camp, and neither are they
>> able to convince me. I have decided to implement both. Fortunately, this
>> part is not difficult; it seems to be rather easy to crank one out within
>> a short period of time. It would be interesting to compare them side by side
>> and see which camp loses :-). On the other hand, if we run into race-condition
>> problems, we can choose the multi-proc version as a fall-back.
>> While I am a self-proclaimed multi-process red-neck, in this case I would
>> prefer to see a multi-threaded implementation because I want to verify that
>> LLVMContext can be used as advertised. I'm sure some extra care will be
>> needed to report failures/diagnostics, but we should start with the
>> assumption that this approach is not significantly harder than multi-process
>> because that's how we advertise the design.
>> If any of the multi-threaded disadvantages you point out are real, I would
>> like to find out about it.
>> 1. Race Conditions: We should be able to verify that thread-parallel and
>> sequential or multi-process compilation generate the same result. If they
>> diverge, we would like to know about the bug so it can be fixed--independent
>> of LTO.
>> 2. Small Address Space with LTO. We don't need to design around this
>> hypothetical case.
>> 3. Expensive thread-safe runtime lib. We should not speculate that platforms
>> that we, as the LLVM community, care about have this problem. Let's assume
>> that our platforms are well implemented unless we have data to the contrary.
>> (Personally, I would even love to use TLS in the compiler to vastly simplify
>> API design in the backend, but I am not going to be popular for saying so).
>> We should be able to decompose each step of compilation for debugging. So
>> the multi-process "implementation" should just be a degenerate form of
>> threading with a bit of driver magic if you want to automate it.
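
The check described in point 1 can be sketched roughly as follows. This is only an illustration: `compile_partition` is a stand-in (a hash of the input) and not a real backend, and all names here are hypothetical.

```python
import hashlib
from concurrent.futures import ThreadPoolExecutor


def compile_partition(source):
    """Stand-in for a backend: any deterministic function of its input."""
    return hashlib.sha256(source.encode()).hexdigest()


def compile_sequential(partitions):
    return [compile_partition(p) for p in partitions]


def compile_threaded(partitions, workers=4):
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # pool.map returns results in input order, so lists are comparable.
        return list(pool.map(compile_partition, partitions))


def diverging_partitions(partitions):
    """Return indices of partitions whose two results differ."""
    seq = compile_sequential(partitions)
    par = compile_threaded(partitions)
    return [i for i, (a, b) in enumerate(zip(seq, par)) if a != b]
```

An empty list from `diverging_partitions` means both modes agree; any index it returns points at a partition worth investigating for a thread-safety bug.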