[cfe-dev] final distributed clang patch

Ted Kremenek kremenek at apple.com
Tue Jul 8 16:27:10 PDT 2008

Hi Peter,

I agree with the comments that Eli and Chris made; the code  
duplication is something we want to avoid.  Eli brought up an  
excellent point that key pieces of the driver should be factored off  
to a separate library, and I too have felt this way for some time. I  
think that even resolving all the various preprocessor and compiler  
options (e.g., -I, -D, etc.) needed to instantiate a  
Preprocessor should also be factored out of clang.cpp into a separate  
library.

I also agree with Chris's comments that separating the "distcc" driver  
from a regular clang driver is a good idea.  That keeps the distcc  
implementation simpler, and potentially allows it to be used with  
multiple compilers (not just clang).  I myself was fine with  
integrating the distcc support directly into the clang driver for a  
first pass, but because the distcc driver will not use all of the same  
functionality as the regular clang driver (and obviously do a few  
things that the regular clang driver does not), the better long term  
approach is to factor key components of the clang driver into  
libraries, make clang and distcc-clang separate executables, and  
simplify the logic for both.

One thing that hasn't emerged in this discussion is whether the  
clang distcc should interoperate with the traditional distcc  
implementation, and (a different but related issue) whether we  
should require that the compiler itself be clang.  One advantage of a  
clang-based distcc, independent of using clang to perform compilation,  
is that clang-distcc can do the source preprocessing itself without  
forking off a separate process (which is what the traditional distcc  
implementation does).  This seems more like a good step one: build a  
distcc client that just takes care of preprocessing in-process, and  
see what kind of speedups you get over forking and preprocessing.   
Ultimately we're interested in speed and scalability, and small steps  
like these help guide the design.

Interoperability with other compilers doesn't mean we should limit the  
design of clang-distcc.  We can certainly implement special  
functionality when multiple compiler "workers" are based on clang  
(e.g., serializing ASTs, special caching, etc.).

I like the concept of the NetSession class, although the issue of  
interoperability with existing distcc implementations is something  
that is worth discussing.  Chris is right that the system-specific  
APIs, such as the use of sockets, should not be in header files.  A  
PIMPL approach, like what we use for FileManager, would probably work  
well (where the system-specific stuff only appears in the .cpp file).
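As a rough sketch of what that PIMPL layout might look like for NetSession (the class and member names here are hypothetical, not from the patch), the header declares only an opaque Impl, and the .cpp is the only file that would pull in the socket headers:

```cpp
#include <string>

// --- NetSession.h: no system-specific headers appear here.
class NetSession {
public:
  NetSession();
  ~NetSession();
  bool connect(const std::string &host, int port);

private:
  struct Impl;   // opaque; fully defined only in NetSession.cpp
  Impl *pImpl;
};

// --- NetSession.cpp: the only translation unit that would include
// <sys/socket.h>, <netinet/in.h>, etc.  The socket call is stubbed
// out here so the sketch stays self-contained.
struct NetSession::Impl {
  int fd;                  // would hold the real socket descriptor
  std::string lastHost;
  Impl() : fd(-1) {}
};

NetSession::NetSession() : pImpl(new Impl) {}
NetSession::~NetSession() { delete pImpl; }

bool NetSession::connect(const std::string &host, int port) {
  pImpl->lastHost = host;  // a real version would call socket()/connect()
  pImpl->fd = (port > 0) ? 0 : -1;
  return pImpl->fd >= 0;
}
```

Clients of NetSession.h then recompile without ever seeing the platform APIs, which is exactly the property FileManager gets from this idiom.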

As for the clang server, both pthreads and sockets are system-specific  
APIs.  We'll want a design that keeps the threading model separate  
from the code that processes a unit of work.  This will allow us to  
tailor the implementation to use the best parallel computing  
primitives that are available on a specific architecture.
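To illustrate that separation (all names here are invented for the example, not proposed API), the unit-of-work code can be written against an abstract scheduler, with pthreads or whatever primitives a platform offers confined to one concrete subclass:

```cpp
#include <functional>
#include <string>

// Hypothetical sketch: the work-processing logic knows nothing about
// threads; parallelism lives behind an abstract Scheduler interface.

struct CompileJob {
  std::string input;
  // Stand-in for "process one unit of work" (preprocess, compile, etc.).
  std::string run() const { return "compiled:" + input; }
};

class Scheduler {
public:
  virtual ~Scheduler() {}
  virtual void submit(std::function<void()> task) = 0;
};

// Trivial serial implementation; a pthreads-backed version would
// subclass Scheduler the same way, in a platform-specific file.
class SerialScheduler : public Scheduler {
public:
  void submit(std::function<void()> task) { task(); }
};
```

Swapping in a different Scheduler subclass then changes the parallelism strategy without touching the code that processes a unit of work.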

I'm also a little confused with the overall design.  It looks like a  
client (a 'clang' process) connects to a server, sends the  
preprocessed source to the server, waits for the server to chew on the  
file, gets the processed output from the server, and then writes the  
output to disk.  It appears that the client attempts to connect to  
different servers in a serial fashion, and then picks the first  
available server.  Is this how traditional distcc works?  (I actually  
don't know.)  It's a simple design, but it doesn't lend itself well to  
good load balancing or to reducing the latency of firing off  
compilation jobs (a bunch of connection attempts in serial fashion  
seems potentially disastrous for performance).  This particular point  
isn't a criticism of your patch; what's there is fine to get things  
started.  I'm not a distributed computing expert, but something akin  
to the Google MapReduce system (which has workers and controllers)  
seems more flexible for fault tolerance, load balancing, and so  
forth.  This is certainly something worth discussing in a higher-level  
discussion of the overall design of the system.

A few comments inline.

On Jul 7, 2008, at 9:13 AM, Peter Neumark wrote:

> Here is the final patch for clang to support network distributed  
> compilation. (clang.patch file)
> There is also the server part attached. (the tar.gz file)

Like the client, the server shouldn't have so much code copied from  
the Driver, and it certainly doesn't need to use all of the  
ASTConsumers in the regular Clang driver.  General work (by anyone who  
is interested) on modularizing the driver will help make this much  
easier.

> There are 3 new files added to the Driver directory:  
> PrintPreprocessedOutputBuffer.cpp, which is a modification of  
> PrintPreprocessedOutput.cpp to support printing text to a std::ostream.

I'm not certain why a separate version of PrintPreprocessedOutput was  
necessary.  iostreams are slow, and writing to sockets using the FILE*  
abstraction is perfectly acceptable (via fdopen()).
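For instance (a sketch only; a pipe stands in for the socket here, and the helper name is made up), fdopen() wraps a descriptor in a FILE*, so the existing FILE*-based printing code could write to a socket unchanged:

```cpp
#include <cstdio>
#include <string>
#include <unistd.h>

// POSIX sketch: write through a FILE* obtained from fdopen() on one
// end of a pipe, then read the bytes back from the other end.
std::string writeThroughFdopen(const std::string &text) {
  int fds[2];
  if (pipe(fds) != 0)
    return "";

  FILE *out = fdopen(fds[1], "w");  // the same call works on a socket fd
  fputs(text.c_str(), out);         // e.g. the preprocessed source
  fclose(out);                      // flushes and closes fds[1]

  char buf[256] = {0};
  ssize_t n = read(fds[0], buf, sizeof(buf) - 1);
  close(fds[0]);
  return n > 0 ? std::string(buf, n) : "";
}
```

The printer never needs to know whether the FILE* is backed by a disk file or a network connection.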

> Other new files: NetSession.h and NetSession.cpp which handles and  
> contains all networking code (portable thin networking code).
> There are some files changed, mostly to support saving their output to  
> a std::ostream. I've used that approach to pass clang ASTConsumer data  
> to another computer via the network.
> There are 3 new options added to clang. The basic one is -distribute,  
> which enables distributed compilation. The other two are  
> -dist-preprocesslocally and -dist-serializelocally.
> If the first one is enabled then clang sends a preprocessed file to  
> clangserver (a process on another machine) to compile. In the  
> second case the lexing and parsing are done locally and the built and  
> serialized AST is sent to clangserver.
> You can play with this using -dist-preprocesslocally because it is  
> working.

Overall, I think this is a good start!  I think the next logical  
steps would be to look at both the overall design as well as issues of  
code structure (addressing the comments on modularity, isolating  
various implementation details, etc.).  Getting a few interesting  
performance timings would also be extremely useful to help shape some  
of those design decisions.

Incidentally, how well does the code work when the two processes  
(client and server) are actually on two different machines?  Right  
now, the client always connects to "localhost".  Getting performance  
timings when both client and server are on the same and different  
machines is also interesting to see how much things like network  
latency, etc., are a factor in the design.  There may also be some  
correctness issues that are masked by having the client and server on  
the same machine.
