[cfe-dev] Getting involved with Clang refactoring

Tue May 29 09:50:11 PDT 2012

On Tue, May 29, 2012 at 6:40 PM, James K. Lowden
<jklowden at schemamania.org> wrote:

> On Fri, 25 May 2012 17:43:13 +0200
> Manuel Klimek <klimek at google.com> wrote:
>
> > On Fri, May 25, 2012 at 5:19 PM, James K. Lowden
> > <jklowden at schemamania.org> wrote:
> >
> > > On Fri, 25 May 2012 07:42:46 +0200
> > > Manuel Klimek <klimek at google.com> wrote:
> > >
> > > > > To solve the problem you describe -- to have Clang
> > > > > recompile "all files that use a specific symbol"  -- one could
> > > > > touch(1) all such files and then run make.
> > > >
> > > > Not all build systems work on system time. Some work on content
> > > > digests.
> > >
> > > That's clearly insufficient if a header file changes.
> >
> > I don't understand how this is different to timestamps when it comes
> > to header files - the headers need to be in the dependency graph
> > anyways.
>
> I see.  There are two orthogonal issues: the dependency graph, and
> determining which files in that graph have changed.  No matter how
> "change" is measured/defined, anything affected by it -- anything
> further down the graph -- needs recompilation or relinking.
>
> With make, we express the dependency graph directly in make's
> target:source syntax, and "change" is defined in terms of the files'
> mtimes.  make is quite good at converting its source language into a
> DAG, but to determine what's changed it must rely on the underlying
> system.
>
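
For illustration, the mtime rule described above can be sketched in a few lines of Python. This is a simplified sketch, not make's actual implementation: the name is_stale is hypothetical, and real make also handles phony targets, implicit rules, and recursion through the graph.

```python
import os

def is_stale(target, sources):
    """Return True if target is missing or older than any of its sources.

    This mirrors the timestamp rule only; it costs one stat() per file.
    """
    try:
        target_mtime = os.stat(target).st_mtime_ns
    except FileNotFoundError:
        return True  # a missing target always needs to be built
    # "Changed" is defined purely by the filesystem's modification times.
    return any(os.stat(src).st_mtime_ns > target_mtime for src in sources)
```

A call such as is_stale("a.o", ["a.c", "a.h"]) answers the "is changed" question for one target; the dependency graph decides which targets are asked.
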
> > > > Even if this worked, the amount of stats that build systems
> > > > do in ReallyLargeCodeBases to figure out what to build can be a
> > > > huge cost, and can actually dwarf the time it takes to reparse 3
> > > > or 4 files you care about.
> > >
> > > I doubt that.  Could you point me to research supporting that claim?
> >
> > No. I can point you to my anecdotal experience with networked file
> > systems and code bases that are so large that you don't want to check
> > out everything at once.
>
> I think I see where you're going.
>
> If the source control system (or, say, compilation database) maintains
> fingerprints, it's possible to determine what's changed a priori,
> without checking out the files to a local working directory.  If it
> furthermore represents the dependency graph, it's also possible to
> check out only "what's changed" prior to rebuilding, and to federate
> that rebuilding. In effect, the fingerprints represent change directly,
> whereas inode mtime is only a proxy, and not one that scales well along
> a network.
>
> Each time make runs, it must stat the whole tree, meaning that its
> evaluation time is O(n) in the number of files.  A database must also
> make as many comparisons, but those comparisons are orders of magnitude
> faster than a system call.  If n is 100 or even 1000, none of that
> matters very much, but if n is 1,000,000 it will be measurable.
>
> It's interesting to think of make divorced from the filesystem, to
> imagine a make implementation that used a database of fingerprints to
> answer the "is changed" question.  I'm guessing that is, in a way, what
> you've done or are doing, except that it isn't as general as make.
>
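
To make the idea of a fingerprint-driven "is changed" check concrete, here is a minimal Python sketch. It assumes the fingerprints are SHA-256 content digests held in plain dictionaries and that the dependency graph is a mapping from each node to the nodes that depend on it. The names fingerprint, changed_files, and affected are hypothetical and do not describe any particular build system.

```python
import hashlib

def fingerprint(data: bytes) -> str:
    """Content digest used in place of an mtime."""
    return hashlib.sha256(data).hexdigest()

def changed_files(recorded, current):
    """Compare two {path: digest} maps without touching the filesystem.

    Returns the paths that are new or whose digest differs.
    """
    return {path for path, digest in current.items()
            if recorded.get(path) != digest}

def affected(changed, dependents):
    """Return the changed nodes plus everything further down the graph.

    dependents maps a node to the nodes that are built from it.
    """
    seen = set(changed)
    stack = list(changed)
    while stack:
        node = stack.pop()
        for dep in dependents.get(node, ()):
            if dep not in seen:
                seen.add(dep)
                stack.append(dep)
    return seen
```

With a graph such as {"a.h": ["a.o"], "a.o": ["prog"]}, a changed digest for a.h marks a.o and prog for rebuilding, and no stat() call is needed to find that out.
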
> > > > One more reason: dependency order introduces artificial
> > > > serialization.
> > >
> > > Are you saying the dependencies of static analysis are lesser than
> > > for compilation?  I would think that's ultimately untrue, given that
> > > eventually static analysis would have to be applied to the whole
> > > program, even if that's not the case today.
> >
> > But that doesn't mean you can't apply it in parallel. We've had great
> > success using a MapReduce over ~100MLOC code to do very fast parallel
> > static analysis.
>
> Yes, but part of that "win" is due to the fact that you're not
> linking.  Static analysis today is comparable to compilation, the
> evaluation of a single translation unit.
>

Yes. But if you want to statically analyze a whole code base for global
information, you need to analyze every TU (as you would need to compile
every TU). Imposing the dependency graph's ordering here would lead to
sequential bottlenecks (as make -j shows). On the other hand, global
analysis of course still relies on the build, as generated sources
need to be available. But given that we often want to run multiple
iterations of static analysis at a single point of the code, this pays off
big time.
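
As a rough illustration of the map/reduce shape of such an analysis, the following Python sketch treats each TU independently and merges the partial results afterwards. The function analyze_tu is a hypothetical stand-in that merely counts identifier occurrences; it is not a Clang API, and a thread pool is used only to keep the example self-contained (a real setup would use separate processes or machines).

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor
import re

def analyze_tu(source: str) -> Counter:
    """Map step: analyze one translation unit in isolation.

    Hypothetical placeholder analysis: count identifier-like tokens.
    """
    return Counter(re.findall(r"[A-Za-z_]\w*", source))

def analyze_all(tus):
    """Run the map step over every TU with no ordering constraint,
    then reduce the per-TU results into one global summary."""
    total = Counter()
    with ThreadPoolExecutor() as pool:
        for partial in pool.map(analyze_tu, tus.values()):
            total.update(partial)  # reduce step: merge the counts
    return total
```

Because no TU waits on another, the only sequential part is the final merge, which is the contrast with a dependency-ordered build.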

> > > Nothing against you trying; I'm hoping at least one of us will learn
> > > something.  :-)
>
> If I'm on target above, then our experiment yielded unexpected but
> useful results!

More details about what I'm talking about can be found here:
http://google-engtools.blogspot.de/
(use the left side navigation bar to find everything that has "Build" in the
title)

Cheers,
/Manuel