[cfe-dev] my gsoc proposal: llvm/clang distcc

Eli Friedman eli.friedman at gmail.com
Mon Apr 7 11:58:16 PDT 2008


On Mon, Apr 7, 2008 at 7:34 AM, Peter Neumark <peter.neumark at gmail.com> wrote:
> Hi,
> here is my proposal:

Have you submitted an application to Google? The deadline is today, so
I'd suggest doing it sooner rather than later.  (AFAIK, you can tweak
it later.)

> Implementation details
>     The development will be done incrementally, from the simplest solution
> to the more complex.
>      The simplest solution is when all source parsing is done locally
> and the built AST
>     is distributed to a Node for optimization and code generation; the Node
> sends the result back when it's done.

So you're planning to use the regular clang codepath through Sema,
then use the AST serialization to send it across the network to
another host?  That's not a bad approach, since by that point all the
dependencies on installed headers are gone, and you don't have to
worry about errors (besides bugs in clang/LLVM).
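
To make the "send it across the network" step concrete, here is a rough
sketch of length-prefixed framing for an opaque serialized blob. The
4-byte header and the function names are assumptions for illustration;
they are not distcc's protocol or anything in clang, and the AST bytes
are a stand-in.

```python
import io
import struct

# Hypothetical wire format: 4-byte big-endian length, then the payload bytes.
HEADER = struct.Struct(">I")


def encode_frame(payload: bytes) -> bytes:
    """Prefix the payload with its length so the receiver knows where it ends."""
    return HEADER.pack(len(payload)) + payload


def read_exact(stream, n: int) -> bytes:
    """Read exactly n bytes, or raise if the stream ends early."""
    chunks = []
    remaining = n
    while remaining > 0:
        chunk = stream.read(remaining)
        if not chunk:
            raise EOFError("stream closed mid-frame")
        chunks.append(chunk)
        remaining -= len(chunk)
    return b"".join(chunks)


def decode_frame(stream) -> bytes:
    """Read one frame: the length header first, then that many payload bytes."""
    (length,) = HEADER.unpack(read_exact(stream, HEADER.size))
    return read_exact(stream, length)


# Stand-in for a serialized AST; the real bytes would come from clang.
fake_ast = b"\x00serialized-ast\xff"
wire = io.BytesIO(encode_frame(fake_ast) + encode_frame(b"object-code"))
assert decode_frame(wire) == fake_ast
assert decode_frame(wire) == b"object-code"
```

The same two functions would work for the reply carrying the generated
object code, since the framing does not care what the payload is.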

>     An advanced solution is when a file sharing protocol is used to share
> local source files (for including)
>      and then parsing is done on the Node side, with file inclusion done via
> the file sharing protocol.

Right... that reduces the load on the host, but it might increase the
compile time due to the round-trip time for the requests. It also
significantly complicates the protocol.  It'll be interesting to see
which approach performs better.
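
As a back-of-the-envelope illustration of the round-trip concern: if
the Node fetches headers one at a time, the waiting time grows linearly
with the number of requests. The function name and the figures below
are made up for the sake of the example, not measurements.

```python
def extra_latency_ms(num_requests: int, rtt_ms: float) -> float:
    """Lower bound on time spent waiting if requests are issued one at a time."""
    return num_requests * rtt_ms


# Hypothetical: 300 header fetches for one source file, at three
# different network round-trip times (in milliseconds).
for rtt_ms in (0.2, 1.0, 20.0):
    print(f"rtt={rtt_ms} ms -> at least {extra_latency_ms(300, rtt_ms)} ms waiting")
```

Batching or pipelining the requests would reduce this, at the cost of
the more complicated protocol mentioned above.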

>     A more advanced solution is caching built ASTs in a central
> database to avoid
>     parsing and building each time. This is useful for header files.

I'm not sure it's practical to cache headers in that way.  The exact
way a header parses depends on the code before it, and solving
dependencies on other headers seems like more trouble than it's worth.
If you can come up with something here, that would be cool, though.

Caching whole files would be possible, but not too important, since
someone could just run "ccache distcc".

>      So there will be these standalone programs:
>         + distcc, supports gcc options
>         + distcc, supports clang options
>         + distcc daemon for Nodes (the network is composed of Nodes, which
> will do the compilation work)
>          + distcc admin daemon (stores information about the distcc Node
> network)

So the way that the client discovers distcc nodes is through the admin
daemon?  I'm not too familiar with distcc's architecture.

>     All necessary software components are available in llvm/clang sources,
> except network handling.
>     So there will be a thin network layer implemented for unix and windows
> platforms.
>      The new distcc driver will be placed in clang/Driver directory.
>
>     For caching, the cached AST can be identified by an MD5 sum of
> the source file, together with the MD5 sums of the included
>     files and the options used in parsing (defines).
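
A minimal sketch of such a key, assuming MD5 is the intended checksum
and that the full list of included files is already known. The
function names and the separator bytes are hypothetical. This keys a
whole translation unit, so it sidesteps the problem that a single
header can parse differently depending on the code before it.

```python
import hashlib
from typing import Dict, List


def md5_hex(data: bytes) -> str:
    """Hex MD5 digest of a byte string."""
    return hashlib.md5(data).hexdigest()


def ast_cache_key(source: bytes, included: Dict[str, bytes], defines: List[str]) -> str:
    """Combine the source digest, each included file's digest, and the defines."""
    h = hashlib.md5()
    h.update(md5_hex(source).encode("ascii"))
    # Sort by path so the key does not depend on dict ordering.
    for path in sorted(included):
        h.update(b"\0" + path.encode("utf-8") + b"\0")
        h.update(md5_hex(included[path]).encode("ascii"))
    # Keep defines in command-line order, since the order can matter.
    for define in defines:
        h.update(b"\0D" + define.encode("utf-8"))
    return h.hexdigest()


key = ast_cache_key(b"int main() { return 0; }", {"a.h": b"#define A 1"}, ["NDEBUG"])
changed = ast_cache_key(b"int main() { return 0; }", {"a.h": b"#define A 2"}, ["NDEBUG"])
assert key != changed  # editing an included file invalidates the cached AST
```

Any change to the source, to an included file, or to the defines gives
a different key, so a stale AST is never looked up.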
>
> Development methodology
>     The work will be done via svn. I'll need a clang branch for my work, but
> it is not required; a standalone svn
>     repository will work too.
>     I use ubuntu linux (gutsy gibbon). I'll send a weekly project report.
>      I'll write user and developer documentation (html or pdf).
>
> Project Schedule
>     Before the midterm gsoc evaluation the simplest method will be
> implemented. The file sharing protocol and
>     caching will be done in the second part of soc, but they can be worked
> out in depth during the first part, once
>      the simplest solution is ready.
>
> Bio
>     I'm a 23-year-old student, studying at the Budapest University of
> Technology and Economics. I started programming 7 years
>     ago, and I've been using the C language for 6 years, and the C++
> language for 5 years. I've been using opensource software for 7 years.
>      Compilers are one of my passions. I like efficient and clean
> solutions. I like nice, clean, and well-documented APIs,
>     like Qt, Ogre3D, llvm, clang. I have solid knowledge of OOP and
> software engineering.
>      I very much like reusable, clean, easy-to-understand code.
>     I'm familiar with the following programming languages:
>         - C (6 years)
>         - C++ (5 years)
>         - python (3 years)
>         - java (4 years)
>          - SML (1 year)
>         - Prolog (1 year)
>         - lua, squirrel
>         - haskell (actual passion)
>     I've been tracking llvm and clang development since last gsoc, because I
> noticed llvm in the gsoc projects list.
>      I have had an svn copy of llvm and clang since October 2007. I always
> compile it. I've read all the docs available from llvm and clang.
>     I also know the source code structure and its functionality.
