[cfe-dev] my gsoc proposal: llvm/clang distcc
Devang Patel
dpatel at apple.com
Mon Apr 7 12:53:41 PDT 2008
Peter,
On Apr 7, 2008, at 7:34 AM, Peter Neumark wrote:
> The development will be done incrementally, from the simplest
> solution to more complex.
> The simplest solution is when all source parsing task is done
> locally and the built AST
> is distributed to Node for optimization and code generation and
> it sends the result back when its done.
This is indeed a good first step.
However, there are a couple of points to note when designing a new
distributed build system from scratch.
In general, preprocessing source files consumes a significant portion
of total compile time. If the local host is tasked with preprocessing
all source files, it becomes a bottleneck.
GCC uses a PCH mechanism to reduce compile time locally. If a
distributed build system distributes GCC PCHes, it is likely to flood
the network (because a GCC PCH is significantly larger than the source
files), which may hurt scalability.
The ideal solution 1) does not impose significant compilation-related
duties on the local host, 2) does not incur huge network traffic
during job distribution and 3) lets the local host focus on efficient
distribution of tasks and collection of results.
> An advanced solution is when a file sharing protocol is used to
> share local source files (for including)
> and then parsing is done in Node side and file including is done
> via the file sharing protocol.
IMO, such a setup would work well in an environment where the
availability of Nodes is stable.
One variation of this advanced solution would be to distribute just
the build instructions (command line flags etc.), source file names
and the project source repository revision number (e.g. svn rev.
number) to the Nodes and let the Nodes get the project source files
from the repository directly. This would free the local host from the
duty of distributing files and take advantage of the existing
bandwidth provided by the source code repository server.
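A minimal sketch of what such a job message might look like, assuming
a JSON wire format. The CompileJob name, its fields and the repository
URL are hypothetical and only illustrate that the message stays small
because no file contents travel with it.

```python
import json
from dataclasses import dataclass, asdict


@dataclass(frozen=True)
class CompileJob:
    """Everything a Node needs to fetch and build one source file itself."""
    repo_url: str      # hypothetical repository location
    revision: int      # e.g. an svn revision number
    source_file: str   # path relative to the repository root
    flags: tuple = ()  # compiler command line flags, order preserved

    def to_wire(self) -> bytes:
        """Serialize the job as JSON for sending to a Node."""
        return json.dumps(asdict(self), sort_keys=True).encode("utf-8")

    @staticmethod
    def from_wire(data: bytes) -> "CompileJob":
        """Rebuild the job on the Node side."""
        fields = json.loads(data.decode("utf-8"))
        fields["flags"] = tuple(fields["flags"])  # JSON turns tuples into lists
        return CompileJob(**fields)


job = CompileJob(
    repo_url="svn://example.invalid/project/trunk",  # placeholder URL
    revision=12345,
    source_file="lib/Sema/Sema.cpp",
    flags=("-O2", "-Iinclude"),
)
assert CompileJob.from_wire(job.to_wire()) == job
```

The Node would then check out the given revision and run the compiler
with the listed flags. How it talks to the repository is left open
here.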
Yet another advanced step would be to build a pyramid scheme to
distribute incremental link time optimization work.
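One possible reading of the pyramid, sketched under the assumption
that it means a tree-shaped merge: Nodes combine small groups of units,
then groups of those results, until one unit remains. The
pyramid_rounds function and the fan_in parameter are hypothetical names
for illustration.

```python
def pyramid_rounds(units, fan_in=2):
    """Return the merge groups for each level of the pyramid, bottom first."""
    if fan_in < 2:
        raise ValueError("fan_in must be at least 2")
    rounds = []
    level = list(units)
    while len(level) > 1:
        # Split the current level into groups of at most fan_in units.
        groups = [tuple(level[i:i + fan_in])
                  for i in range(0, len(level), fan_in)]
        rounds.append(groups)
        # Each group becomes one merged unit, handled by a single Node.
        level = ["+".join(group) for group in groups]
    return rounds


for depth, groups in enumerate(pyramid_rounds(["a.o", "b.o", "c.o", "d.o"])):
    print(depth, groups)
```

All groups within one level are independent, so they can run on
different Nodes at the same time.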
> A more advanced solution is when we caching built ASTs in a
> central database to prevent
> parsing and building each time. This is useful in header files
> case.
The trick here is to validate already cached centralized ASTs very
cheaply. This is not cheap in the current GCC implementation. Steve
Naroff did such an implementation in a standalone preprocessor to
boost local compilation time in the early 1990s.
I won't be surprised if distributed AST caches do well compared to a
centralized AST cache.
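To illustrate the cheap-validation point, here is a minimal sketch
that keys each cached AST by a hash of the header contents and the
flags. This is an assumption for illustration, not how GCC or the
preprocessor mentioned above validates anything. AstCache and
cache_key are hypothetical names, and a real key would also have to
cover nested includes and macro state.

```python
import hashlib


def cache_key(header_text: bytes, flags: tuple) -> str:
    """Key a cached AST by header contents and the flags used to parse it."""
    h = hashlib.sha256()
    for flag in flags:  # flag order is kept, since -I order can matter
        h.update(flag.encode("utf-8") + b"\0")
    h.update(b"\0")
    h.update(header_text)
    return h.hexdigest()


class AstCache:
    """Toy in-memory cache; a real one would store serialized ASTs."""

    def __init__(self):
        self._entries = {}

    def lookup(self, header_text: bytes, flags: tuple):
        """Return the cached AST blob, or None if nothing matches."""
        return self._entries.get(cache_key(header_text, flags))

    def store(self, header_text: bytes, flags: tuple, ast_blob: bytes) -> None:
        self._entries[cache_key(header_text, flags)] = ast_blob


cache = AstCache()
cache.store(b"int f();", ("-DX=1",), b"<ast blob>")
assert cache.lookup(b"int f();", ("-DX=1",)) == b"<ast blob>"
assert cache.lookup(b"int f();", ("-DX=2",)) is None  # flags changed
```

With this scheme, validation costs one hash of the header instead of a
reparse, and a changed header or flag simply misses the cache.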
> So there will be these standalone programs:
> + distcc, supports gcc options
> + distcc, supports clang options
> + distcc daemon for Nodes (network is composed from Nodes,
> what will do the compilation work)
> + distcc admin daemon (stores information from the distcc
> Node network)
-
Devang
[ I do not imply that all suggested ideas should be covered in this
GSoC proposal. ]