[cfe-dev] my gsoc proposal: llvm/clang distcc

Chris Lattner clattner at apple.com
Mon Apr 7 15:56:02 PDT 2008


On Apr 7, 2008, at 7:34 AM, Peter Neumark wrote:
> Hi,
> here is my proposal:

Hi Peter,

Sorry for the delay, I know time is running short.

> Synopsis
>     The main purpose of this project is to implement a clang/llvm
>     based distcc.
>     Distcc means distributed compiler. It can be used as a
>     replacement for gcc.

It would be useful to say that 'distcc' here means a general  
distributed compiler tool, not an extension to the existing distcc tool.

>     Clang distcc will support distributed (over the network)
>     compilation for any architecture supported by llvm. The main
>     advantages of the llvm/clang distcc over the original gcc-based
>     distcc are performance (lower memory use, shorter compile times)
>     and customizability. All of these benefits come from llvm and
>     clang.
>
>     The new distcc will be a compiler driver (or frontend) built from
>     the clang and llvm libraries. The driver will have two usage
>     modes: gcc option mode and clang option mode, so it can be used
>     as a drop-in replacement for gcc, and it will handle all
>     distribution and caching tasks.
>     There will also be an admin daemon, which will handle
>     configuration and distribute incoming requests to nodes. Each
>     node will run a distcc daemon that handles incoming tasks and
>     does the compilation work.
>     The clang distcc will support the languages clang supports, so
>     currently C and Objective-C.

One nice thing about the existing distcc is that it works with other
random compilers: it can work with GCC as well as (say) ICC or
llvm-gcc.  It would be nice to have the option to support these, by
using preprocessed .i files as the common medium.
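To make the .i-file approach concrete, here is a minimal Python sketch (not taken from distcc or clang) of how a driver might split one gcc-style compile command into a local preprocessing step and a remote compilation step. The function name `split_command` and the small set of preprocessor-only options it recognizes are assumptions for illustration; a real driver would have to handle many more flags.

```python
import os

# Hypothetical, non-exhaustive list of preprocessor-only options that take a
# separate argument (e.g. "-I include").
PP_OPTS_WITH_ARG = {"-D", "-U", "-I", "-include", "-isystem"}
# Prefixes of the joined forms, e.g. "-DFOO=1" or "-Iinclude".
PP_PREFIXES = ("-D", "-U", "-I")
SOURCE_EXTS = (".c", ".m")


def split_command(argv):
    """Split e.g. ['gcc', '-O2', '-DFOO', '-c', 'foo.c', '-o', 'foo.o'] into
    (local preprocess argv, remote compile argv)."""
    compiler, source, output = argv[0], None, None
    pp_opts, cc_opts = [], []
    i = 1
    while i < len(argv):
        arg = argv[i]
        if arg == "-o":
            output = argv[i + 1]
            i += 2
        elif arg == "-c":
            i += 1  # re-added to the remote command below
        elif arg in PP_OPTS_WITH_ARG:
            pp_opts += [arg, argv[i + 1]]
            i += 2
        else:
            if arg.startswith(PP_PREFIXES):
                pp_opts.append(arg)
            elif arg.endswith(SOURCE_EXTS):
                source = arg
            else:
                cc_opts.append(arg)
            i += 1
    if source is None:
        raise ValueError("no C or Objective-C source file on the command line")
    stem, ext = os.path.splitext(source)
    # gcc's conventional suffixes for preprocessed C and Objective-C.
    preprocessed = stem + (".mi" if ext == ".m" else ".i")
    if output is None:
        output = os.path.basename(stem) + ".o"
    # Code-generation options also go to the preprocessor, because some of
    # them (e.g. -O2) predefine macros.
    local = [compiler, "-E", *pp_opts, *cc_opts, source, "-o", preprocessed]
    # The remote node needs no include paths or defines: the .i is self-contained.
    remote = [compiler, *cc_opts, "-c", preprocessed, "-o", output]
    return local, remote
```

Since only `argv[0]` differs between compilers, the same split should in principle work for any compiler that accepts gcc-style options, which is what makes .i files a convenient common medium.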

> Functionality details
>     Usage:
>         + set up the network:
>             - start the admin daemon on one node
>             - start a distcc daemon on each node
>             - register the nodes with the admin daemon
>
>         + set up the local machine:
>             - configure distcc (set the admin node address);
>               this will generate a config file.
>
>         + use, e.g.: make CC=distcc

Ok
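The "configure distcc" step could be as simple as writing the admin node's address to an INI-style file. The sketch below assumes a hypothetical `[admin]` section with `host` and `port` keys; neither the file format nor the port number comes from the proposal (3632 is simply the default port of the existing distccd).

```python
import configparser
import io


def render_config(host, port):
    """Produce the text of a hypothetical client config file."""
    cfg = configparser.ConfigParser()
    cfg["admin"] = {"host": host, "port": str(port)}
    buf = io.StringIO()
    cfg.write(buf)
    return buf.getvalue()


def parse_config(text):
    """Read the admin node address back as a (host, port) tuple."""
    cfg = configparser.ConfigParser()
    cfg.read_string(text)
    return cfg["admin"]["host"], cfg.getint("admin", "port")
```

For example, `parse_config(render_config("build-admin.example.com", 3632))` returns `("build-admin.example.com", 3632)`.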

>
> Implementation details
>     The development will be done incrementally, from the simplest
>     solution to more complex ones.

yay! :)

>     In the simplest solution, all source parsing is done locally and
>     the resulting AST is sent to a node for optimization and code
>     generation; the node sends the result back when it is done.
>     A more advanced solution uses a file-sharing protocol to share
>     local source files, so that parsing is done on the node side and
>     included files are fetched via the file-sharing protocol.
>     An even more advanced solution caches built ASTs in a central
>     database to avoid parsing and building them every time. This is
>     especially useful for header files.
>     So there will be these standalone programs:
>         + distcc, supporting gcc options
>         + distcc, supporting clang options
>         + distcc daemon for nodes (the network is composed of nodes,
>           which do the compilation work)
>         + distcc admin daemon (stores information about the distcc
>           node network)

I think that this is too much to be realistically accomplished in a  
summer.  I think it would be reasonable to incrementally develop this  
with the following milestones:

1. The first major useful milestone is a "new distcc".  Implement  
exactly what distcc does, but better.  Building this involves the main  
driver, and the 'node daemon'.  The intermediate files passed over the  
network would be .i files.
2. Once #1 is basically working, add an 'admin daemon' that is a
centralized process on the machine running 'make' which handles
communication with the remote nodes.  This allows intelligent load
balancing, and allows preprocessor caching as well.
3. Once #2 is working well, there are a variety of things in clang  
that could be done to make the preprocessing faster and more  
efficient.  Everything from using PCH effectively, to dynamically  
detecting PCH, to other intelligent caching of token strings, to  
implementing -fdirectives-only [à la gcc] can be considered.
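As a rough illustration of the load balancing the admin daemon in milestone #2 could do, here is a hypothetical least-loaded scheduler in Python. The `NodePool` class and the idea of per-node job slots are assumptions; the proposal does not specify a scheduling policy.

```python
class NodePool:
    """Hypothetical admin-daemon scheduler: each job goes to the node with the
    lowest ratio of running jobs to job slots."""

    def __init__(self, slots):
        # slots: node name -> positive number of jobs it may run in parallel
        self.slots = dict(slots)
        self.running = {name: 0 for name in self.slots}

    def acquire(self):
        """Reserve a slot and return the node name, or None if all are busy."""
        # Lowest load ratio wins; ties are broken by node name.
        name = min(self.running,
                   key=lambda n: (self.running[n] / self.slots[n], n))
        if self.running[name] >= self.slots[name]:
            # The least-loaded node is full, so every node is full; the caller
            # could wait or fall back to compiling locally.
            return None
        self.running[name] += 1
        return name

    def release(self, name):
        """Free the slot when the node sends its result back."""
        self.running[name] -= 1
```

Dividing by the slot count lets a 2-core and an 8-core machine fill up proportionally. A real daemon would also need locking, since several make jobs would talk to it at once.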

I think that the first two and part of #3 are a full summer's worth of
work.  Maybe next summer (when clang is farther along) we can talk
about using serialized ASTs as the distribution medium, and/or using a
network file system to distribute files, etc.  It isn't clear whether
these are a significant win, though.

>     All necessary software components are available in the llvm/clang
>     sources, except for network handling, so there will be a thin
>     network layer implemented for Unix and Windows platforms.
>     The new distcc driver will be placed in the clang/Driver directory.

Sounds good.
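The thin network layer could start out as length-prefixed messages over TCP. The following sketch assumes a hypothetical wire format (a 4-byte big-endian length followed by the payload) and sends a job as two frames: the remote command line and the preprocessed source. It uses only Python's standard `socket` and `struct` modules, which are available on both Unix and Windows; the function names are illustrative.

```python
import socket
import struct

# Hypothetical frame header: payload length as a 4-byte big-endian integer.
HEADER = struct.Struct("!I")


def send_frame(sock, payload):
    sock.sendall(HEADER.pack(len(payload)) + payload)


def _recv_exact(sock, n):
    """recv() may return fewer bytes than asked for, so loop until done."""
    chunks = []
    while n > 0:
        chunk = sock.recv(n)
        if not chunk:
            raise ConnectionError("peer closed the connection mid-frame")
        chunks.append(chunk)
        n -= len(chunk)
    return b"".join(chunks)


def recv_frame(sock):
    (length,) = HEADER.unpack(_recv_exact(sock, HEADER.size))
    return _recv_exact(sock, length)


def send_job(sock, argv, preprocessed):
    # Frame 1: the remote command line, NUL-separated; frame 2: the .i contents.
    send_frame(sock, "\0".join(argv).encode("utf-8"))
    send_frame(sock, preprocessed)


def recv_job(sock):
    argv = recv_frame(sock).decode("utf-8").split("\0")
    return argv, recv_frame(sock)
```

The node daemon would answer the same way, with a frame for the exit status and one for the object file.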

>     For caching, a cached AST can be identified by an MD5 sum of the
>     source file, the MD5 sums of its included files, and the options
>     used in parsing (defines).

Ok, this provides something like 'ccache'?
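A minimal sketch of such a cache key, using MD5 from Python's `hashlib`: the options, the main file and every included file are folded into one digest. The function `cache_key` and its argument layout are hypothetical. Options are length-prefixed so that, for example, `['-DA', 'B']` and `['-DAB']` produce different keys.

```python
import hashlib


def _digest(data):
    """MD5 digest (16 raw bytes) of one file's contents."""
    return hashlib.md5(data).digest()


def cache_key(options, source, includes):
    """Hypothetical cache key: options (list of str), source (bytes of the
    main file), includes (list of bytes, one per included file, in order)."""
    h = hashlib.md5()
    # Prefix the option count and each option's length so that different
    # option lists can never serialize to the same byte string.
    h.update(len(options).to_bytes(4, "big"))
    for opt in options:
        encoded = opt.encode("utf-8")
        h.update(len(encoded).to_bytes(4, "big"))
        h.update(encoded)
    h.update(_digest(source))
    # Fold in the MD5 of every included file; digests are fixed-size, so no
    # separators are needed.
    for inc in includes:
        h.update(_digest(inc))
    return h.hexdigest()
```

ccache-style tools typically also mix the compiler's identity into the key, since a compiler upgrade can change the output for identical input; that is left out here.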

>
> Development methodology
>     The work will be done via svn. I'll need a clang branch for my
>     work, but it is not required: a standalone svn repository will
>     work too.
>     I use Ubuntu Linux (Gutsy Gibbon). I'll send a weekly report on
>     the project.
>     I'll write user and developer documentation (HTML or PDF).

Ok

This is very exciting: there is a huge community of people who could  
benefit from a better 'distcc' tool.  I'm looking forward to seeing  
this make progress!

-Chris


