[cfe-dev] my gsoc proposal: llvm/clang distcc

Devang Patel dpatel at apple.com
Mon Apr 7 12:53:41 PDT 2008


Peter,

On Apr 7, 2008, at 7:34 AM, Peter Neumark wrote:

>     The development will be done incrementally, from the simplest  
> solution to more complex.
>     The simplest solution is when all source parsing task is done  
> locally and the built AST
>     is distributed to Node for optimization and code generation and  
> it sends the result back when its done.

This is indeed a good first step.

However, there are a couple of points to note when designing a new
distributed build system from scratch.

In general, preprocessing source files consumes a significant portion
of the total compile time. If the local host is tasked with
preprocessing all source files, it becomes a bottleneck.

GCC uses a PCH mechanism to reduce compile time locally. If a
distributed build system distributes GCC PCHes, it is likely to flood
the network (because a GCC PCH is significantly larger than the source
files it covers), which may hurt scalability.

The ideal solution 1) does not impose significant compilation-related
duties on the local host, 2) does not incur huge network traffic
during job distribution, and 3) lets the local host focus on efficient
distribution of tasks and collection of results.

>     An advanced solution is when a file sharing protocol is used to  
> share local source files (for including)
>     and then parsing is done in Node side and file including is done  
> via the file sharing protocol.

IMO, such a setup would work well in an environment where the
availability of Nodes is stable.

One variation of this advanced solution would be to distribute just
the build instructions (command line flags etc.), source file names,
and the project source repository revision number (e.g. svn rev.
number) to the Nodes, and let the Nodes get the project source files
from the repository directly. This would free the local host from the
duty of distributing files and take advantage of the existing
bandwidth provided by the source code repository server.
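
To make the size argument concrete, here is a minimal sketch of what
such a job message might look like. It is only an illustration: the
CompileJob type, its field names, and the JSON encoding are all
assumptions of mine, not part of distcc, clang, or the proposal.

```python
import json
from dataclasses import dataclass, asdict


@dataclass(frozen=True)
class CompileJob:
    """Hypothetical job descriptor; every field name is an assumption."""
    repo_url: str      # repository the Node fetches sources from
    revision: int      # e.g. an svn revision number
    source_file: str   # path relative to the repository root
    flags: tuple = ()  # command line flags, passed through unchanged


def encode_job(job: CompileJob) -> bytes:
    """Serialize a job to a small JSON message for sending to a Node."""
    return json.dumps(asdict(job), sort_keys=True).encode("utf-8")


def decode_job(payload: bytes) -> CompileJob:
    """Rebuild the job on the Node side."""
    data = json.loads(payload.decode("utf-8"))
    data["flags"] = tuple(data["flags"])  # JSON has no tuples, only lists
    return CompileJob(**data)
```

The message stays small no matter how large the source file or its
headers are, which is the point: the local host ships a description
of the work, not the inputs.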

Yet another advanced step would be to build a pyramid scheme to  
distribute incremental link time optimization work.
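
One way to read "pyramid" here is as a pairwise tree reduction:
partial results are merged two at a time, round by round, until a
single result remains. The sketch below only computes the merge
schedule. The function name and the string labels are hypothetical,
and no real link time optimization is performed.

```python
def pyramid_rounds(units):
    """Return the pairwise merge schedule, one list of groups per round.

    Each group in a round could be handed to a different Node. A unit
    left without a partner is carried to the next round unchanged.
    """
    rounds = []
    level = list(units)
    while len(level) > 1:
        groups = [tuple(level[i:i + 2]) for i in range(0, len(level), 2)]
        rounds.append(groups)
        # Label each merged result by joining the labels of its inputs.
        level = ["+".join(group) for group in groups]
    return rounds
```

With N units this takes about log2(N) rounds, and all merges within
a round are independent of each other.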

>     A more advanced solution is when we caching built ASTs in a  
> central database to prevent
>     parsing and building each time. This is useful in header files  
> case.

The trick here is to validate already cached centralized ASTs very
cheaply. This is not cheap in the current GCC implementation. Steve
Naroff did such an implementation in a standalone preprocessor to
speed up local compilation in the early 1990s.
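
As a rough illustration of one possible validation scheme (not what
GCC does, and not a description of the preprocessor mentioned above),
a cache entry could be keyed by a digest of the header contents plus
the macro definitions in effect. The AstCache class and cache_key
function are hypothetical names. Note that hashing still reads the
whole file, so this may not be cheap enough in practice.

```python
import hashlib


def cache_key(header_bytes: bytes, macro_defs: dict) -> str:
    """Digest of the header text and the macros that can change its meaning."""
    h = hashlib.sha256()
    h.update(header_bytes)
    for name in sorted(macro_defs):  # sorted so the key is order-independent
        h.update(b"\0" + name.encode() + b"=" + macro_defs[name].encode())
    return h.hexdigest()


class AstCache:
    """In-memory stand-in for a central (or per-Node) AST store."""

    def __init__(self):
        self._store = {}

    def lookup(self, header_bytes, macro_defs):
        # Returns the cached blob, or None if the inputs changed.
        return self._store.get(cache_key(header_bytes, macro_defs))

    def insert(self, header_bytes, macro_defs, ast_blob):
        self._store[cache_key(header_bytes, macro_defs)] = ast_blob
```

A lookup with the same header text but different macro definitions
misses, which is the behaviour wanted when a -D flag changes what
the header expands to.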

I won't be surprised if distributed AST caches do well compared to a
centralized AST cache.


>     So there will be these standalone programs:
>         + distcc, supports gcc options
>         + distcc, supports clang options
>         + distcc daemon for Nodes (network is composed from Nodes,  
> what will do the compilation work)
>         + distcc admin daemon (stores information from the distcc  
> Node network)

-
Devang

[ I do not imply that all suggested ideas should be covered in this
GSoC proposal. ]


