[LLVMdev] [Proposal] Parallelize post-IPO stage.
Xinliang David Li
xinliangli at gmail.com
Tue Jul 16 14:04:45 PDT 2013
On Tue, Jul 16, 2013 at 1:49 PM, Shuxin Yang <shuxin.llvm at gmail.com> wrote:
> I had actually come up with the 3 approaches to building the post-IPO object
> independently.
>
> The "3rd approach" here is the 1st solution in my original proposal. Almost
> all my coworkers said it sucks :-)
> Now I accept it, because it has no way to be adaptive.
>
> Consider the scenario where we compile the LLVM compiler. We use "make -j16"
> on a computer with 8 processors; each make job invokes a compiler, which may
> blindly spawn 16 threads!
> So we end up with 16*16 threads.
>
Determining the right parallelism is not the job of the compiler
(builtin) nor that of a developer -- the underlying build system
should take care of the scheduling :)
David
> Being adaptive would make it possible to pick the right factor
> judiciously.
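The oversubscription in this scenario is simple arithmetic. The sketch below assumes a hypothetical adaptive policy that splits the machine's cores evenly among concurrent make jobs. How a compiler would learn the job count is left open here, and, as David argues, that decision may belong to the build system instead.

```python
def total_threads(make_jobs: int, threads_per_compile: int) -> int:
    """Threads alive at once when every make job spawns a fixed thread count."""
    return make_jobs * threads_per_compile


def oversubscription(cores: int, make_jobs: int, threads_per_compile: int) -> float:
    """Runnable threads per core."""
    return total_threads(make_jobs, threads_per_compile) / cores


def adaptive_threads_per_compile(cores: int, make_jobs: int) -> int:
    """Hypothetical adaptive policy: share the cores evenly, keeping at least one thread."""
    return max(1, cores // make_jobs)


# "make -j16" on 8 processors, each compile blindly spawning 16 threads:
print(total_threads(16, 16))                # 256
print(oversubscription(8, 16, 16))          # 32.0
print(adaptive_threads_per_compile(8, 16))  # 1
# With "make -j2" on the same machine, each compile could use 4 threads.
print(adaptive_threads_per_compile(8, 2))   # 4
```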
>
> In any case, I will support this approach (i.e. the 3rd approach you
> mentioned), at the very least at the beginning.
>
>
>
> On 7/16/13 1:35 PM, Xinliang David Li wrote:
>>
>> A third approach is to decouple the backend compilation and
>> parallelism strategy from the partitioning. The partitioning can
>> spit out partition BC files and some action records in some standard
>> format. All of this can be fed into driver tools that convert
>> the compilation action file into make/build files for the underlying
>> build system of your choice:
>>
>> 1) it can simply be a compiler driver that does thread-level parallelism;
>> 2) or a tool that generates Makefiles which are fed into parallel make
>> to exploit single-node parallelism;
>> 3) or a tool that generates BUILD files that feed into a distributed
>> build system (such as Google's blaze:
>>
>> http://google-engtools.blogspot.com/2011/08/build-in-cloud-how-build-system-works.html)
>>
>> Another benefit is that it will make compiler debugging easier.
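A minimal sketch of the Makefile-generating driver from option 2 is shown here. It assumes the partitioner writes its action records as JSON, with one `bitcode`/`obj` pair per partition; the record format and the `CompileAction` name are hypothetical. Each partition becomes an independent `llc -filetype=obj` rule, so `make -jN` alone decides how many backend jobs run at once.

```python
import json
from dataclasses import dataclass


@dataclass
class CompileAction:
    """Hypothetical action record for one partition, emitted by the partitioner."""
    bitcode: str  # partition bitcode file, e.g. "part0.bc"
    obj: str      # object file the backend should produce


def to_makefile(actions: list[CompileAction], llc: str = "llc") -> str:
    """Render the actions as a Makefile; `make -jN` then schedules the backend jobs."""
    objs = " ".join(a.obj for a in actions)
    lines = [".PHONY: all", f"all: {objs}", ""]
    for a in actions:
        lines.append(f"{a.obj}: {a.bitcode}")
        # Recipe lines in a Makefile must start with a tab character.
        lines.append(f"\t{llc} -filetype=obj {a.bitcode} -o {a.obj}")
        lines.append("")
    return "\n".join(lines)


records = json.loads('[{"bitcode": "part0.bc", "obj": "part0.o"},'
                     ' {"bitcode": "part1.bc", "obj": "part1.o"}]')
makefile = to_makefile([CompileAction(**r) for r in records])
print(makefile)
```

Writing `makefile` to disk and running `make -j8` would compile the partitions in parallel. Emitting BUILD files for a distributed system (option 3) would only require a different rendering function.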
>>
>> thanks,
>>
>> David
>>
>> On Sun, Jul 14, 2013 at 5:56 PM, Andrew Trick <atrick at apple.com> wrote:
>>>
>>> On Jul 12, 2013, at 3:49 PM, Shuxin Yang <shuxin.llvm at gmail.com> wrote:
>>>
>>> 3.2 Compile partitions independently
>>> --------------------------------------
>>>
>>> There are two camps: one camp advocates compiling partitions via
>>> multi-process, the other one favors multi-thread.
>>>
>>> Inside the Apple compiler teams, I'm the only one who belongs to the 1st
>>> camp. I think that while multi-proc sounds a bit red-neck, it has its
>>> advantages for this purpose, and while multi-thread is certainly more
>>> eye-popping, it has its advantages as well.
>>>
>>> The advantages of multi-proc are:
>>> 1) easier to implement: each process runs in its own address space,
>>> so we don't need to worry about them interfering with each other.
>>>
>>> 2) huge, if not unlimited, address space.
>>>
>>> The disadvantage is that it's expensive. But I guess the cost is
>>> almost negligible compared to the overall IPO compilation.
>>>
>>> The advantages of multi-threading I can imagine are:
>>> 1) it sounds fancy
>>> 2) it is light-weight
>>> 3) inter-thread communication is easier than IPC.
>>>
>>> Its disadvantages are:
>>> 1) Oftentimes we will come across race conditions, and it takes an
>>> awfully long time to figure them out. While the code is supposed
>>> to be multi-thread safe, we might miss some tricky cases.
>>> Troubleshooting race conditions is a nightmare.
>>>
>>> 2) Small address space. This is a big problem if the compiler
>>> is built 32-bit. In that case, the compiler is not able to bring
>>> lots of stuff into memory even if the HW does provide ample memory.
>>>
>>> 3) The thread-safe run-time lib is more expensive.
>>> I once linked a compiler using -lpthread (I did not have to) on a
>>> UNIX platform, and saw the compiler slow down by about 1/3.
>>>
>>> I'm not able to convince the folks in the other camp, nor are they
>>> able to convince me, so I have decided to implement both. Fortunately,
>>> this part is not difficult; it seems rather easy to crank one out within
>>> a short period of time. It would be interesting to compare them side by
>>> side and see which camp loses :-). On the other hand, if we run into
>>> race-condition problems, we can fall back to the multi-proc version.
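To compare the two camps side by side, both strategies can sit behind the same driver-level interface. The sketch below is a hypothetical Python driver, not LLVM code. The thread version calls an in-process compile function; in the real design each thread would own its own LLVMContext. The process version launches one command per partition, such as an `llc` invocation. `compile_fn` and `make_cmd` are placeholders supplied by the caller.

```python
import subprocess
import sys
from concurrent.futures import ThreadPoolExecutor
from typing import Callable


def run_multi_thread(parts: list[str], compile_fn: Callable[[str], str],
                     jobs: int) -> list[str]:
    """Compile each partition on a worker thread inside this process."""
    with ThreadPoolExecutor(max_workers=jobs) as pool:
        return list(pool.map(compile_fn, parts))  # results keep input order


def run_multi_process(parts: list[str],
                      make_cmd: Callable[[str], list[str]]) -> list[str]:
    """Compile each partition in its own process (own address space), then wait for all."""
    procs = [subprocess.Popen(make_cmd(p), stdout=subprocess.PIPE, text=True)
             for p in parts]
    outputs = []
    for proc in procs:
        out, _ = proc.communicate()
        if proc.returncode != 0:
            raise RuntimeError(f"backend failed: {proc.args}")
        outputs.append(out.strip())
    return outputs


# Demo with stand-ins for the backend: both map "x.bc" to "x.o".
parts = ["part0.bc", "part1.bc"]
print(run_multi_thread(parts, lambda p: p.replace(".bc", ".o"), jobs=2))
# ['part0.o', 'part1.o']
print(run_multi_process(parts, lambda p: [sys.executable, "-c",
                                          f"print({p!r}.replace('.bc', '.o'))"]))
# ['part0.o', 'part1.o']
```

Because the per-partition step is the same in both paths, switching to `run_multi_process` as the fallback does not touch the compile step itself. This sketch starts every process at once; a real driver would cap them at the chosen parallelism.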
>>>
>>>
>>> While I am a self-proclaimed multi-process red-neck, in this case I would
>>> prefer to see a multi-threaded implementation because I want to verify
>>> that LLVMContext can be used as advertised. I'm sure some extra care will
>>> be needed to report failures/diagnostics, but we should start with the
>>> assumption that this approach is not significantly harder than
>>> multi-process because that's how we advertise the design.
>>>
>>> If any of the multi-threaded disadvantages you point out are real, I
>>> would like to find out about it.
>>>
>>> 1. Race Conditions: We should be able to verify that thread-parallel,
>>> sequential, and multi-process compilation all generate the same result.
>>> If they diverge, we would like to know about the bug so it can be fixed,
>>> independent of LTO.
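The check described in point 1 can be automated by running the same partitions sequentially and on a thread pool, then comparing the outputs byte for byte. In this sketch `compile_partition` is a hypothetical stand-in (a SHA-256 of the partition text) for a real backend run. Any divergence it reports would point at a thread-safety bug.

```python
import hashlib
from concurrent.futures import ThreadPoolExecutor


def compile_partition(source: str) -> bytes:
    """Hypothetical stand-in for a backend run; a correct backend is a pure function of its input."""
    return hashlib.sha256(source.encode()).digest()


def compile_sequential(parts: list[str]) -> list[bytes]:
    return [compile_partition(p) for p in parts]


def compile_threaded(parts: list[str], jobs: int = 4) -> list[bytes]:
    # pool.map yields results in input order, so they line up with compile_sequential.
    with ThreadPoolExecutor(max_workers=jobs) as pool:
        return list(pool.map(compile_partition, parts))


def divergent_partitions(parts: list[str]) -> list[int]:
    """Indices of partitions whose threaded output differs from the sequential one."""
    seq = compile_sequential(parts)
    par = compile_threaded(parts)
    return [i for i, (a, b) in enumerate(zip(seq, par)) if a != b]


parts = [f"define i32 @f{i}() {{ ret i32 {i} }}" for i in range(8)]
print(divergent_partitions(parts))  # [] when the two modes agree
```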
>>>
>>> 2. Small Address Space with LTO. We don't need to design around this
>>> hypothetical case.
>>>
>>> 3. Expensive thread-safe runtime lib. We should not speculate that
>>> platforms that we, as the LLVM community, care about have this problem.
>>> Let's assume that our platforms are well implemented unless we have data
>>> to the contrary. (Personally, I would even love to use TLS in the compiler
>>> to vastly simplify API design in the backend, but I am not going to be
>>> popular for saying so).
>>>
>>> We should be able to decompose each step of compilation for debugging. So
>>> the multi-process "implementation" should just be a degenerate form of
>>> threading with a bit of driver magic if you want to automate it.
>>>
>>> -Andy
>>>
>>>
>>> _______________________________________________
>>> LLVM Developers mailing list
>>> LLVMdev at cs.uiuc.edu http://llvm.cs.uiuc.edu
>>> http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev
>>>
>