[cfe-dev] Getting involved with Clang refactoring

Manuel Klimek klimek at google.com
Fri May 25 08:43:13 PDT 2012


On Fri, May 25, 2012 at 5:19 PM, James K. Lowden
<jklowden at schemamania.org>wrote:

> On Fri, 25 May 2012 07:42:46 +0200
> Manuel Klimek <klimek at google.com> wrote:
>
> > > Changes to a symbol normally occur in a header
> > > file, affecting the dependency graph in just the way that makes make
> > > useful.  To solve the problem you describe -- to have Clang
> > > recompile "all files that use a specific symbol"  -- one could touch
> > > (1) all such files and then run make.
> >
> > Not all build systems work on system time. Some work on content
> > digests.
>
> I don't know what you mean by "content digests", unless you're talking
> about some kind of fingerprint (e.g. MD5) to determine if the content
> has changed.  That's clearly insufficient if a header file changes.
>

Yes, I mean a fingerprint like MD5.
I don't understand how this is different from timestamps when it comes to
header files - the headers need to be in the dependency graph anyway.
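To make the point concrete, here is a minimal sketch of how a digest-based build system might decide whether a target is dirty. All names (`is_dirty`, `stored_digests`) are illustrative, not taken from any real build tool; the point is that headers sit in the dependency graph exactly as they do for timestamp-based make, only the "changed?" predicate differs:

```python
import hashlib

def digest(path):
    """Return the MD5 fingerprint of a file's contents."""
    h = hashlib.md5()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(8192), b""):
            h.update(chunk)
    return h.hexdigest()

def is_dirty(source, deps, stored_digests):
    """A target is dirty if its source or any header it depends on
    changed content-wise, regardless of file timestamps."""
    for path in [source] + deps:
        if stored_digests.get(path) != digest(path):
            return True
    return False
```

So a `touch` on a header would trigger no rebuild here, but an actual edit to its contents would, via exactly the same dependency edges make uses.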


> > Even if this worked, the number of stat calls that build systems
> > perform in ReallyLargeCodeBases to figure out what to build can be a
> > huge cost, and can actually dwarf the time it takes to reparse the 3
> > or 4 files you care about.
>
> I doubt that.  Could you point me to research supporting that claim?
>

No. I can point you to my anecdotal experience with networked file systems
and code bases that are so large that you don't want to check out
everything at once.


> Experience in pkgsrc shows that recursive make is very slow over
> large trees, and that the problem is the process overhead, not solving
> the graph.
>
> If you haven't read it, I recommend Peter Miller's "Recursive Make
> Considered Harmful" (http://aegis.sourceforge.net/auug97.pdf). It's a
> trove of information about misinformation around make.
>
> > One more reason: dependency order introduces artificial
> > serialization.
>
> Are you saying the dependencies of static analysis are lesser than for
> compilation?  I would think that's ultimately untrue, given that
> eventually static analysis would have to be applied to the whole
> program, even if that's not the case today.
>

But that doesn't mean you can't apply it in parallel. We've had great
success using a MapReduce over ~100 MLOC of code to do very fast parallel
static analysis.
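The shape of that is simple because per-file analysis is independent: map an analysis over translation units, then reduce the per-file findings into one report. The sketch below is not the actual MapReduce setup described above - `analyze` is a stand-in for a real Clang-based check, and it uses a thread pool for simplicity:

```python
from concurrent.futures import ThreadPoolExecutor

def analyze(path):
    """Map step: analyze one translation unit independently and
    return a list of (path, line_number) findings. The TODO check
    is a placeholder for a real AST-based analysis."""
    findings = []
    with open(path) as f:
        for lineno, line in enumerate(f, 1):
            if "TODO" in line:
                findings.append((path, lineno))
    return findings

def run_analysis(paths, jobs=4):
    """Reduce step: run the map in parallel and merge the
    per-file results into a single flat report."""
    with ThreadPoolExecutor(max_workers=jobs) as ex:
        per_file = list(ex.map(analyze, paths))
    return [f for findings in per_file for f in findings]
```

Note there is no dependency ordering here at all: no file's analysis waits on another's, which is exactly the artificial serialization a make-style dependency graph would impose.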


> If you mean make doesn't compile in parallel when it could, I would be
> interested to see an example, because afaik that's a solved problem
> theoretically.
>
> > It seems to me like most innovations look like this at first. I might
> > be wrong, but at least let me try ;)
>
> Nothing against you trying; I'm hoping at least one of us will learn
> something.  :-)
>
> You may have heard, "Scientists stand on one another's shoulders;
> computer scientists stand on one another's toes."  A lot of innovations
> aren't, and it's not hard to find projects (and products) re-solving
> things using new terminology, seemingly unaware how old and well trod
> is the ground they're covering.  It pays to be skeptical.
>

Well, obviously I think I am skeptical myself, but I'm aware that's
confirmation bias ;)

Cheers,
/Manuel

