[cfe-dev] distcc implementation
Holger Schurig
holgerschurig at gmail.com
Wed Feb 17 00:02:04 PST 2010
> 2. What stages of the compilation are worth parallelizing (at
> least for a first step)?
There are benchmarks out there that show how much time the
compiler spends in each phase (preprocessing, parsing, code
generation).
You should also spend some time understanding how distcc (or
ccache) for gcc works. AFAIK it goes this way: the source code
gets run through the preprocessor on the source machine. The
preprocessor reads all the *.h files on the source machine and
generates one huge file. The benefit: the other machines that
help compile the code don't need the same headers installed.
They just get one file to process. They parse it, compile it,
and produce a *.o file.
Please note that distcc 3 adds a new mode ("pump" mode), where
you don't preprocess the sources locally. This makes the
distribution process faster, but you need identical system
headers on all boxes. But it's an optional mode. See
http://distcc.org for more info.
The resulting *.o file gets transferred back to the source
machine, which can then do the linking once all *.o files have
arrived.
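
To make the split between the two machines concrete, here is a
minimal sketch of the two command lines involved. It assumes a
gcc-compatible driver (-E, -c and -o behave the same in gcc and
clang); the function names are made up for this illustration and
are not part of distcc.

```python
from pathlib import Path


def preprocess_command(compiler, source, preprocessor_args=()):
    """argv for the LOCAL step: run only the preprocessor (-E).

    Header search paths and macro definitions (-I, -D) matter here,
    because this is the only step that reads the *.h files.
    """
    preprocessed = str(Path(source).with_suffix(".i"))
    return [compiler, "-E", *preprocessor_args, source, "-o", preprocessed]


def remote_compile_command(compiler, preprocessed, codegen_args=()):
    """argv for the REMOTE step: compile the preprocessed file to *.o.

    A ".i" file is treated as already-preprocessed C, so the remote
    box does not need any of the project's headers.
    """
    obj = str(Path(preprocessed).with_suffix(".o"))
    return [compiler, "-c", *codegen_args, preprocessed, "-o", obj]


if __name__ == "__main__":
    print(preprocess_command("clang", "foo.c", ["-Iinclude", "-DNDEBUG"]))
    print(remote_compile_command("clang", "foo.i", ["-O2"]))
```

Only the second command has to run on the helper machine; the
first one and the final link stay on the source machine.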
ccache works similarly: it computes a hash over the preprocessed
code and stores the resulting *.o in a database with this hash
as key. If a *.o with the same hash already exists, it quickly
hands that *.o file back, short-cutting the
parsing/code-generation.
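
The caching idea can be sketched in a few lines. This is only an
illustration of the principle, not how ccache is implemented: the
in-memory dict stands in for its on-disk store, SHA-256 is an
arbitrary choice of hash, and mixing the compiler arguments into
the key is my own assumption (different flags should not share a
cached *.o). ObjectCache and compile_fn are hypothetical names.

```python
import hashlib


class ObjectCache:
    """Maps hash(preprocessed source + args) to a compiled object."""

    def __init__(self):
        self._store = {}   # stand-in for an on-disk database
        self.hits = 0
        self.misses = 0

    @staticmethod
    def key(preprocessed, args=()):
        h = hashlib.sha256()
        for arg in args:
            h.update(arg.encode())
            h.update(b"\0")   # separator so ["ab"] != ["a", "b"]
        h.update(b"\0")       # marks the end of the argument list
        h.update(preprocessed)
        return h.hexdigest()

    def get_or_compile(self, preprocessed, args, compile_fn):
        k = self.key(preprocessed, args)
        if k in self._store:
            self.hits += 1            # short-cut: no parsing, no codegen
            return self._store[k]
        self.misses += 1
        obj = compile_fn(preprocessed, args)   # the expensive part
        self._store[k] = obj
        return obj


if __name__ == "__main__":
    cache = ObjectCache()
    fake_compile = lambda src, args: b"OBJ:" + src   # placeholder compiler
    cache.get_or_compile(b"int x;", ["-O2"], fake_compile)
    cache.get_or_compile(b"int x;", ["-O2"], fake_compile)
    print(cache.hits, cache.misses)
```

Because the key is computed after preprocessing, a change in any
included header changes the hash and forces a real compile.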
> 4. Are there any examples of code(preferably in real-world
> projects) which would lend themselves to parallel compilation
> which come to mind?
Almost all "big" source code bases. If you have a small code base
with only 4 *.c files, it's hardly worth going via distcc. But
if you have 1000 *.c files, it makes a difference :-) E.g.
compiling LLVM with distcc can greatly speed up the compilation,
but the same is true for Qt, some KDE programs, Mozilla,
OpenOffice etc.
I use ccache and distcc also when cross-compiling, with the
OpenEmbedded.org build environment.
> 5. Where should I start? :). Obviously this is a pretty large
> undertaking, but is there any documentation that I should look
> at? Any particular source files that would be relevant?
I'd reuse most of distcc's work, e.g. learn about their
protocol.
Then I would start with the preprocessed (the non-pump) method.
Learn where you can intercept the preprocessed stream. That
should be easy enough, because there's a compiler switch (-E)
that does this.
Now you need to intercept this preprocessed stuff, transport it
to the remote site, and compile it there. For this you'll need
to write an llvm-distcc-daemon. You also need to transport the
*.o back. As the real distcc has already found a solution for
this, you don't need to reinvent the wheel.
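
Just to show the shape of the transport problem, here is a
minimal sketch of sending one blob (the preprocessed source one
way, the *.o the other way) over a socket. The 4-byte
length-prefix framing is an assumption made up for this example;
it is NOT distcc's wire format, so read their protocol
documentation before building anything real. send_blob and
recv_blob are hypothetical helper names.

```python
import socket
import struct

# Hypothetical framing: 4-byte big-endian length, then the payload.
HEADER = struct.Struct(">I")


def send_blob(sock, payload):
    """Send one length-prefixed message."""
    sock.sendall(HEADER.pack(len(payload)) + payload)


def recv_exact(sock, n):
    """Read exactly n bytes; recv() may return fewer than asked for."""
    chunks = []
    while n > 0:
        chunk = sock.recv(n)
        if not chunk:
            raise ConnectionError("peer closed the connection mid-message")
        chunks.append(chunk)
        n -= len(chunk)
    return b"".join(chunks)


def recv_blob(sock):
    """Receive one length-prefixed message."""
    (length,) = HEADER.unpack(recv_exact(sock, HEADER.size))
    return recv_exact(sock, length)


if __name__ == "__main__":
    local, remote = socket.socketpair()   # stands in for a TCP connection
    send_blob(local, b"int main(void) { return 0; }")
    source = recv_blob(remote)            # daemon side gets the source
    send_blob(remote, b"fake object bytes for " + source)
    print(recv_blob(local))               # driver side gets the "*.o"
    local.close()
    remote.close()
```

A real daemon would also have to carry the compiler arguments,
the exit status and the compiler's stderr back to the driver.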
The "driver" on the local box could simply block while the remote
compiles stuff, so someone can run "make -j10" when he has 10
remote boxes (or 5 remote boxes with dual-cores).
Hey, but the fun of such a project is to make a plan by yourself.
Otherwise it's dumb coding of other people's ideas :-)
--
http://www.holgerschurig.de