[cfe-dev] Question about clang distcc

Matthieu Monrocq matthieu.monrocq at gmail.com
Sun Jun 17 02:32:44 PDT 2012

On Sat, Jun 16, 2012 at 9:14 PM, Manuel Klimek <klimek at google.com> wrote:

> On Sat, Jun 16, 2012 at 8:36 PM, Matthieu Monrocq <
> matthieu.monrocq at gmail.com> wrote:
>> On Sat, Jun 16, 2012 at 3:19 PM, Lyu Mitnick <mitnick.lyu at gmail.com> wrote:
>>> Hello Douglas,
>>> I have read all the posts carefully. According to the discussion, what we
>>> can do better than
>>> the original distcc is as follows:
>>> 1) The intermediate files passed over the network would be serialized AST
>>> 2) The intermediate files passed over the network would be LLVM IR
>>> 3) Centralized admin daemon
>>> 4) Use PCH
>>> To address the issues above, we can extend the original distcc.
>>> However, Chris Lattner
>>> mentioned that the first milestone of the clang distributed build project is
>>> re-implementing a new
>>> distcc.
>>> http://lists.cs.uiuc.edu/pipermail/cfe-dev/2008-April/001357.html
>>> I am wondering why, and what the desired design of the clang
>>> distributed build
>>> project is for the cfe community.
>>> Mitnick
>>> >
>>> > No, this project has not been accomplished. I don't think there was
>>> any real progress on it since that discussion.
>>> >
>>> >        - Doug
>> We have been using distcc (on top of gcc) for a while at work, and it
>> does work, somewhat, but it has a big issue: preprocessing is a huge part
>> of compilation, and not being able to get rid of it creates a bottleneck that
>> inhibits true scalability. Given that we are using 24-core servers, we
>> could push to about 40/60 parallel compilations (interesting when an error
>> occurs in a header and the local machine has to compile those 40/60 files
>> locally to display the errors); any further and we would not observe any
>> significant progress in compilation time: the local machine became the
>> bottleneck.
>> We experimented for a time with a solution where we streamed the raw
>> unprocessed files and had a filter in place to push only "local" files,
>> with replicas of the 3rd party headers kept on the distcc hosts. It worked
>> quite well, not much gain but slightly faster... as long as the distcc host
>> had the up-to-date collection of 3rd party headers & the local directory
>> hierarchy was similar; had a few issues with it (maintenance) so we fell
>> back to the traditional distcc.
>> Honestly, we got much more performance boost from ccache than from distcc.
>> I am unsure how to work around the local preprocessing issue, and I am
>> afraid that no significant progress will be made as long as it stands in
>> the way. I would be glad to hear some folks have ideas to get around it,
>> these days I put more hope in a "persistent" process that would cache
>> various stages of the compilation pipeline (maybe using the daemon Chandler
>> was talking about ?).
> clangd is not about speeding up distributed compilations. I'm not sure I
> understand what problems you ran into with your distributed build
> that pushed the "raw" files, but with enough caching you can save
> considerable time and processing power that way [1]. Hopefully modules will
> pave the way for an even better distributed C++ build. Well, and better
> linkers...
> [1]
> http://google-engtools.blogspot.de/2011/09/build-in-cloud-distributing-build-steps.html
My point was that maybe distributed compilation is not what we should be
aiming for. Module-based languages such as Java don't have such issues
because they don't spend their time lexing/preprocessing/parsing the same
headers over and over; they are amenable to saving up the evaluated AST of
a module. C++ developers have often dreamt, even without modules, that
perhaps a sufficiently smart compiler process could manage this for header
files... the myth of the C++ compilation server.
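
To make the "compilation server" idea concrete, here is a minimal sketch in
Python of a persistent process that parses each distinct header text once and
reuses the result. Every name in it (HeaderCache, toy_parse) is hypothetical,
and the "parse" step is only a placeholder. It also ignores that a real
header's meaning depends on the macros defined at the point of inclusion,
which is a large part of why this remains hard for C++.

```python
import hashlib


class HeaderCache:
    """Toy stand-in for a persistent compile server's header cache."""

    def __init__(self, parse):
        self._parse = parse      # hypothetical expensive "lex + parse" step
        self._parsed = {}        # content digest -> parsed result
        self.parse_calls = 0     # how often the expensive step really ran

    def get(self, text):
        # Key on the header's content, so identical text is parsed only once.
        key = hashlib.sha256(text.encode("utf-8")).hexdigest()
        if key not in self._parsed:
            self.parse_calls += 1
            self._parsed[key] = self._parse(text)
        return self._parsed[key]


def toy_parse(text):
    # Placeholder for real parsing: just split the text on whitespace.
    return text.split()


cache = HeaderCache(toy_parse)
header = "struct Foo { int x; };"

# Three translation units include the same header; only the first one pays.
for _ in range(3):
    cache.get(header)
print(cache.parse_calls)
```

The cache only helps while the process stays alive between compilations,
which is why a long-running daemon is the natural place for it.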

Anyway, I think that clangd could be part of the solution. When using Java
with Eclipse, the files are compiled in the background, so you have to wait
less. We could imagine the same possibility with clangd: an option so that
when the file passes -fsyntax-only, the daemon generates the associated .o.

It's not the same idea as distributed compilation, but distributed
compilation is more about rebuilding from scratch I think, and people kinda
expect a build from scratch to take long. It's the incremental
re-compilation that is a pain (when you are working) and I believe clangd
would be amenable for speeding this up... though it's maybe a little early.

-- Matthieu