[LLVMdev] [Proposal] Parallelize post-IPO stage.

Xinliang David Li xinliangli at gmail.com
Tue Jul 16 14:04:45 PDT 2013

On Tue, Jul 16, 2013 at 1:49 PM, Shuxin Yang <shuxin.llvm at gmail.com> wrote:
> I have actually come up with 3 approaches to build the post-IPO objects
> independently.
> The "3rd approach" here is the 1st solution in my original proposal. Almost
> all my coworkers say it sucks :-)
> Now I accept that, because it has no way to be adaptive.
> Consider the scenario where we compile the LLVM compiler. We use "make -j16"
> on a computer with 8 processors, and each make job invokes a compiler which
> may blindly spawn 16 threads!
> So we end up with 16*16 threads.

Determining the right parallelism is not the job of the compiler
(builtin) nor that of a developer -- the underlying build system
should take care of the scheduling :)
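
To put numbers on the scenario quoted above, here is a minimal sketch. The helper names (`total_threads`, `per_invocation_budget`) are hypothetical and not part of LLVM or make, and the even-split heuristic is only one assumption about what "adaptive" could mean, not something proposed in the thread.

```python
def total_threads(make_jobs: int, threads_per_compiler: int) -> int:
    """Worst-case number of live compiler threads under `make -jN`."""
    return make_jobs * threads_per_compiler


def per_invocation_budget(cores: int, make_jobs: int) -> int:
    """Hypothetical heuristic: split the cores evenly, never below one thread."""
    return max(1, cores // make_jobs)


# Scenario from the thread: make -j16 on an 8-processor machine,
# with each compiler blindly spawning 16 threads.
print(total_threads(16, 16))         # 256
print(total_threads(16, 16) / 8)     # 32.0 threads per processor
print(per_invocation_budget(8, 16))  # 1
```

Whether such a budget is computed inside the compiler or handed down by the build system is exactly the point under discussion here.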


> Being adaptive makes it possible to pick the right factor judiciously
> and adaptively.
> In any case, I will support this approach (i.e. the 3rd approach you
> mentioned), at the very least at the beginning.
> On 7/16/13 1:35 PM, Xinliang David Li wrote:
>> A third approach is to decouple the backend compilation and
>> parallelism strategy from the partitioning.  The partitioning can
>> spit out partition BC files and some action records in some standard
>> format. All of this can be fed into a driver tool that converts
>> the compilation action file into a make/build file for the underlying
>> build system of your choice:
>> 1) it can simply be a compiler driver that does thread-level parallelism;
>> 2) or a tool that generates Makefiles which are fed into parallel make
>> to exploit single-node parallelism;
>> 3) or a tool that generates BUILD files that feed into a distributed
>> build system (such as Google's blaze:
>> http://google-engtools.blogspot.com/2011/08/build-in-cloud-how-build-system-works.html)
>> Another benefit is that it will make compiler debugging easier.
>> thanks,
>> David
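
Option 2 in the list above (action records turned into a Makefile) could look roughly like the sketch below. The thread does not define an action-record format, so the dictionary fields (`input`, `output`, `flags`) and the function name `actions_to_makefile` are assumptions made for illustration; `llc` with `-filetype=obj` is used only as a plausible backend command.

```python
from typing import Dict, List

# Hypothetical action record: one dict per partition.
# The field names are assumptions; the thread does not specify a format.
Action = Dict[str, str]


def actions_to_makefile(actions: List[Action], compiler: str = "llc") -> str:
    """Render action records as a Makefile so `make -jN` does the scheduling."""
    objects = " ".join(a["output"] for a in actions)
    lines = [f"all: {objects}", ""]
    for a in actions:
        parts = [compiler, a.get("flags", ""), a["input"], "-o", a["output"]]
        command = " ".join(p for p in parts if p)  # skip empty flags
        lines.append(f"{a['output']}: {a['input']}")
        lines.append("\t" + command)  # recipe lines must start with a tab
        lines.append("")
    return "\n".join(lines)


actions = [
    {"input": "part0.bc", "output": "part0.o", "flags": "-filetype=obj"},
    {"input": "part1.bc", "output": "part1.o", "flags": "-filetype=obj"},
]
print(actions_to_makefile(actions))
```

Because each partition becomes an independent target, the degree of parallelism is chosen by whoever invokes make, and a single failing partition can be rebuilt by hand, which is the debugging benefit mentioned above.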
>> On Sun, Jul 14, 2013 at 5:56 PM, Andrew Trick <atrick at apple.com> wrote:
>>> On Jul 12, 2013, at 3:49 PM, Shuxin Yang <shuxin.llvm at gmail.com> wrote:
>>> 3.2 Compile partitions independently
>>> --------------------------------------
>>>    There are two camps: one camp advocates compiling partitions via
>>> multi-process, the other favors multi-thread.
>>>   Inside the Apple compiler teams, I'm the only one who belongs to the 1st
>>> camp. I think that while multi-proc sounds a bit red-neck, it has its
>>> advantages for this purpose, and while multi-thread is certainly more
>>> eye-popping, it has its advantages as well.
>>>   The advantages of multi-proc are:
>>>   1) It is easier to implement: each process runs in its own address space,
>>>     so we don't need to worry about them interfering with each other.
>>>   2) A huge, if not unlimited, address space.
>>>    The disadvantage is that it's expensive. But I guess the cost is
>>>   almost negligible compared to the overall IPO compilation.
>>>   The advantages of multi-thread that I can imagine are:
>>>    1) it sounds fancy
>>>    2) it is light-weight
>>>    3) inter-thread communication is easier than IPC.
>>>   Its disadvantages are:
>>>    1) Oftentimes we will come across race conditions, and it takes
>>>       an awfully long time to figure them out. While the code is supposed
>>>       to be multi-thread safe, we might miss some tricky cases.
>>>       Troubleshooting a race condition is a nightmare.
>>>    2) Small address space. This is a big problem if the compiler
>>>       is built as 32-bit. In that case, the compiler is not able to bring
>>>       lots of stuff into memory even if the HW does
>>>       provide ample memory.
>>>    3) The thread-safe run-time lib is more expensive.
>>>       I once linked a compiler using -lpthread (I did not have to) on a
>>>       UNIX platform, and saw the compiler slow down by about 1/3.
>>>     I'm not able to convince the folks in the other camp, and neither are
>>> they able to convince me. I decided to implement both. Fortunately, this
>>> part is not difficult; it seems rather easy to crank one out within a
>>> short period of time. It would be interesting to compare them side by side,
>>> and see which camp loses :-). On the other hand, if we run into
>>> race-condition problems, we choose the multi-proc version as a fall-back.
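
The multi-process variant described above can be sketched as follows. This is only an illustration under stated assumptions: the child process is a stand-in Python one-liner rather than a real backend, and the names `compile_in_process` and `compile_all` are hypothetical. Each partition runs in its own process, and so in its own address space, while a small thread pool in the driver caps how many children are alive at once.

```python
import subprocess
import sys
from concurrent.futures import ThreadPoolExecutor
from typing import List

# Stand-in for a real backend invocation: a child Python process that
# upper-cases its argument. A real driver would run the compiler here.
CHILD = "import sys; print(sys.argv[1].upper())"


def compile_in_process(partition: str) -> str:
    """Run one partition in its own process, isolated from the others."""
    done = subprocess.run(
        [sys.executable, "-c", CHILD, partition],
        capture_output=True, text=True, check=True,
    )
    return done.stdout.strip()


def compile_all(partitions: List[str], jobs: int) -> List[str]:
    """Keep at most `jobs` children alive; results stay in input order."""
    # The pool threads only wait on children; the work happens in the processes.
    with ThreadPoolExecutor(max_workers=jobs) as pool:
        return list(pool.map(compile_in_process, partitions))


if __name__ == "__main__":
    print(compile_all(["part0", "part1", "part2"], jobs=2))
```

With `check=True`, a crashing child surfaces as an exception in the driver instead of corrupting its siblings, which is the isolation argument made for this camp.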
>>> While I am a self-proclaimed multi-process red-neck, in this case I would
>>> prefer to see a multi-threaded implementation because I want to verify
>>> that LLVMContext can be used as advertised. I'm sure some extra care will
>>> be needed to report failures/diagnostics, but we should start with the
>>> assumption that this approach is not significantly harder than
>>> multi-process because that's how we advertise the design.
>>> If any of the multi-threaded disadvantages you point out are real, I
>>> would like to find out about it.
>>> 1. Race Conditions: We should be able to verify that the thread-parallel
>>> vs. sequential or multi-process compilations generate the same result. If
>>> they diverge, we would like to know about the bug so it can be
>>> fixed--independent of LTO.
>>> 2. Small Address Space with LTO. We don't need to design around this
>>> hypothetical case.
>>> 3. Expensive thread-safe runtime lib. We should not speculate that
>>> platforms that we, as the LLVM community, care about have this problem.
>>> Let's assume that our platforms are well implemented unless we have data
>>> to the contrary.
>>> (Personally, I would even love to use TLS in the compiler to vastly
>>> simplify API design in the backend, but I am not going to be popular for
>>> saying so).
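
The check suggested in point 1 (thread-parallel output must match sequential output) can be sketched as below. The backend here is a deterministic stand-in, not LLVM, and `diverging_partitions` is a hypothetical helper name; a real harness would compare the object files produced for each partition.

```python
import hashlib
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, List


def digest(blob: bytes) -> str:
    """Fingerprint one compiled partition."""
    return hashlib.sha256(blob).hexdigest()


def diverging_partitions(
    partitions: List[bytes],
    compile_one: Callable[[bytes], bytes],
    threads: int,
) -> List[int]:
    """Return indices whose threaded output differs from the sequential one."""
    sequential = [digest(compile_one(p)) for p in partitions]
    with ThreadPoolExecutor(max_workers=threads) as pool:
        parallel = [digest(out) for out in pool.map(compile_one, partitions)]
    return [i for i, (s, p) in enumerate(zip(sequential, parallel)) if s != p]


# Stand-in "backend": deterministic, so no partition should diverge.
def fake_backend(bitcode: bytes) -> bytes:
    return bitcode[::-1]


print(diverging_partitions([b"part0", b"part1", b"part2"], fake_backend, threads=3))  # []
```

A non-empty result points at a partition whose output depends on scheduling, which is the kind of bug worth fixing regardless of LTO.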
>>> We should be able to decompose each step of compilation for debugging. So
>>> the multi-process "implementation" should just be a degenerate form of
>>> threading with a bit of driver magic if you want to automate it.
>>> -Andy
