[LLVMdev] [LLVM Dev] [Discussion] Function-based parallel LLVM backend code generation

Tue Jul 16 11:02:52 PDT 2013

In addition to the concerns Chandler pointed out,
I'm curious about:
     the execution time of pristine llc vs. the modified llc with -thd=1, and
     the execution time of pristine clang vs. clang linked with the modified llc.
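
As a rough sketch of how such a comparison might be collected (this is not part of the patch): the Python helper below runs each command several times and reports the median wall-clock time. The function names `time_command` and `compare` are made up for illustration, and the binary paths in the usage note are hypothetical.

```python
import statistics
import subprocess
import time


def time_command(cmd, runs=5):
    """Run cmd `runs` times and return the median wall-clock time in seconds."""
    samples = []
    for _ in range(runs):
        start = time.perf_counter()
        # check=True raises if the command fails, so a broken build is not timed silently.
        subprocess.run(cmd, check=True,
                       stdout=subprocess.DEVNULL, stderr=subprocess.DEVNULL)
        samples.append(time.perf_counter() - start)
    return statistics.median(samples)


def compare(baseline_cmd, candidate_cmd, runs=5):
    """Return (baseline seconds, candidate seconds, candidate/baseline ratio)."""
    base = time_command(baseline_cmd, runs)
    cand = time_command(candidate_cmd, runs)
    return base, cand, cand / base
```

For example, `compare(["./pristine/bin/llc", "big.bc", "-o", "/dev/null"], ["./modified/bin/llc", "-thd=1", "big.bc", "-o", "/dev/null"])`, where the paths and `big.bc` are placeholders for the local builds and input. A ratio close to 1.0 would suggest the -thd=1 path adds little overhead.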

Thanks


On 7/16/13 3:46 AM, Chandler Carruth wrote:
> While I think the end goal you're describing is close to the correct 
> one, I see the high-level strategy for getting there somewhat 
> differently:
>
> 1) The code generators are only one collection of function passes that 
> might be parallelized. Many others might also be parallelized 
> profitably. The design for parallelism within LLVM's pass management 
> infrastructure should be sufficiently generic to express all of these 
> use cases.
>
> 2) The idea of having multiple pass managers necessitates (unless I 
> misunderstand) duplicating a fair amount of state. For example, the 
> caches in immutable analysis passes would no longer be shared, etc. I 
> think that is really unfortunate, and would prefer instead to use 
> parallelizing pass managers that are in fact responsible for the 
> scheduling of passes.
>
> 3) It doesn't provide a strategy for parallelizing the leaves of a 
> CGSCC pass manager which is where a significant portion of the 
> potential parallelism is available within the middle end.
>
> 4) It doesn't deal with the (numerous) parts of LLVM that are not 
> actually thread safe today. They may happen to work with the code 
> generators you're happening to test, but there is no guarantee. 
> Notable things to think about here are computing new types, the 
> use-def lists of globals, commandline flags, and static state 
> variables. While our intent has been to avoid problems with the last 
> two that could preclude parallelism, it seems unlikely that we have 
> succeeded without thorough testing to this point. Instead, I fear we 
> have leaned heavily on the crutch of one-thread-per-LLVMContext.
>
> 5) It adds more complexity onto the poorly designed pass manager 
> infrastructure. Personally, I think that cleanups to the design and 
> architecture of the pass manager should be prioritized above adding 
> new functionality like parallelism. However, so far no one has really 
> had time to do this (including myself). While I would like to have 
> time in the future to do this, as with everything else in OSS, it 
> won't be real until the patches start flowing.
>
>
> On Tue, Jul 16, 2013 at 3:33 AM, Wan, Xiaofei <xiaofei.wan at intel.com> wrote:
>
>     Hi, community:
>
>     To meet a business need, I want to enable "function-based
>     parallel code generation" to speed up the compilation of a single
>     module. Please see the details of the design below and provide
>     your feedback on the following points, thanks!
>     1. Is this idea a proper solution for my requirement?
>     2. The new feature is enabled by llc -thd=N and has no impact on
>     the original llc when -thd=1.
>     3. Can this new llc feature be accepted by the community and
>     merged into the LLVM code tree?
>
>     Patches
>     The patch is divided into four separate parts; the all-in-one
>     patch can be found here:
>     http://llvm-reviews.chandlerc.com/D1152
>
>     Design
>     https://docs.google.com/document/d/1QSkP6AumMCAVpgzwympD5pI3btPJt4SRgjY-vhyfySg/edit?usp=sharing
>
>
>     Background
>     1. Our business needs to compile C/C++ source files into LLVM IR
>     and link them into one big BC file; the big BC file is then
>     compiled into binary code on different arch/target devices.
>     2. Backend code generation is a time-consuming activity that
>     happens on the target device, which makes it important to the
>     user experience.
>     3. make -j or file-based parallelism can't help here since there
>     is only one big BC file; function-based parallel LLVM backend code
>     generation is a good way to improve compilation time by fully
>     utilizing multiple cores.
>
>     Overall design strategy and goal
>     1. Generate exactly the same binary as the single-threaded output
>     2. No impact on single-threaded performance and conformance
>     3. Little impact on the LLVM code infrastructure
>
>     Current status and test result
>     1. Parallel llc generates the same code as single-threaded llc,
>     as checked with "objdump -d"; it passes a 10-hour stress test
>     across all performance benchmarks
>     2. Parallel llc gives a ~2.9X performance gain on a Xeon server
>     with 4 threads
>
>
>     Thanks
>     Wan Xiaofei
>
>     _______________________________________________
>     LLVM Developers mailing list
>     LLVMdev at cs.uiuc.edu <mailto:LLVMdev at cs.uiuc.edu>
>     http://llvm.cs.uiuc.edu
>     http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev
>
