[cfe-dev] zapcc compiler
chisophugis at gmail.com
Wed May 27 15:57:05 PDT 2015
On Wed, May 27, 2015 at 1:57 PM, James Widman <james.widman at gmail.com>
> On Wed, May 27, 2015 at 4:11 AM, James Widman <james.widman at gmail.com>
> > On Tue, May 26, 2015 at 1:38 PM, David Blaikie <dblaikie at gmail.com>
> >> On Mon, May 25, 2015 at 12:37 PM, Yaron Keren <yaron.keren at gmail.com>
> >>> zapcc maintains as much as possible from previous compilations: AST,
> >>> MC and DebugInfo. I'm not sure that module support goes that far.
> >> ASTs are preserved in modules, that's all they're for (parsing time tends to
> >> dominate, at least in our world/experiments/data as I understand it, so
> >> that's the first thing to fix). Duplicate IR/MC/DebugInfo is still
> >> though it'd be the next thing to solve - we're talking about
> >> some of the debug info and Adrian Prantl is working on that at the moment -
> >> putting debug info for types into the module files themselves and
> >> referencing it directly as a split DWARF file.
> >> Duplicate IR/MC comes from comdat/linkonce_odr functions - and at some
> >> it'd be nice to put those in a module too, if there's a clear single
> >> ownership (oh, you have an inline function in your modular header - OK,
> >> we'll IRGen it, make an available_externally copy of it in the module to be
> >> linked into any users of the module, and a standard external definition
> >> be codegen'd down to object code and put in the module to be passed to
> >> linker). This wouldn't solve the problems with templates that have no
> >> to put their definition.
> > I guess it depends on the build setup: if you spread the build across
> > multiple machines then... never mind.
> > But if the whole build is on one machine and it has enough memory, and
> > as long as something like zapcc is retaining the whole program's AST
> > anyway, it could be a win for it to complete that whole-program AST
> > before any IR is generated. Presumably, the compiler could then
> > invent the 'home' and do each instantiation exactly once in the entire
> > build.
> > Or... it might still help the multi-machine setup. In the worst case,
> > an instantiated function would get instantiated once per machine.
> > But in that case it might be nice to get a fix-it hint from the linker
> > to automatically extern-templateize all such instantiations. (:
> That reminds me: is there any public data that shows the percentage of
> build time spent doing IRGen/opt/CodeGen for duplicates that end up
> getting discarded?
I have information on a couple large (1-10MLOC) codebases indicating that
time spent outside of parsing is typically ~20% of total CPU time at
-O2/-O3. IIRC, with lower optimization levels, I saw 10-15%.
So that ~20% number is a rough upper bound for the time spent in the LLVM
optimizers and code generation, and hence an upper bound on the time spent
optimizing and generating code for duplicates.
The fact that clang does IRGen as it parses (hence it fell under "parsing
time" in my measurements) makes it somewhat difficult to pinpoint how much
time is spent on duplicates during IRGen. If you want to measure this, you
could do it similarly to how I describe measuring per-file time in
http://permalink.gmane.org/gmane.comp.compilers.clang.devel/42127 but with
extra probes tracking calls into IRGen. You would also add probes inside the
middle end and back end to track per-function time.
By combining this information with information from the linker about which
functions end up becoming "duplicates", you should have a decent empirical
estimate for the data that you want. You might do this by placing probes in
the linker so that you can easily measure any project by just building it
with the instrumented toolchain and using DTrace to funnel out all the
data, which can then be fed into a script.
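A minimal sketch of what such a script might look like, assuming the probes
have already been reduced to (object file, symbol, phase, seconds) records
and the linker probes to a map from each symbol to the object file whose copy
was kept. The record layout, the phase names and the function name
duplicate_time_share are all made up for illustration; none of them come from
an existing tool.

```python
from collections import defaultdict


def duplicate_time_share(samples, kept):
    """Estimate the share of measured time spent on discarded duplicates.

    samples: iterable of (object_file, symbol, phase, seconds) records
             (hypothetical layout, e.g. post-processed probe output).
    kept:    dict mapping symbol -> object file whose copy the linker kept
             (hypothetical layout, e.g. post-processed linker probe output).
    Returns (share_of_total_time, seconds_wasted_per_phase).
    """
    total = 0.0
    wasted = 0.0
    by_phase = defaultdict(float)
    for obj, symbol, phase, seconds in samples:
        total += seconds
        # A copy counts as a discarded duplicate only when the linker kept
        # the same symbol from a different object file. Symbols the linker
        # never reported are treated as non-duplicates.
        if symbol in kept and kept[symbol] != obj:
            wasted += seconds
            by_phase[phase] += seconds
    share = wasted / total if total else 0.0
    return share, dict(by_phase)


if __name__ == "__main__":
    # Made-up numbers: foo<int>(int) is compiled in both a.o and b.o, and
    # the linker keeps the copy from a.o.
    samples = [
        ("a.o", "_Z3fooIiEvT_", "irgen", 0.25),
        ("a.o", "_Z3fooIiEvT_", "opt", 0.5),
        ("b.o", "_Z3fooIiEvT_", "irgen", 0.25),
        ("b.o", "_Z3fooIiEvT_", "opt", 0.5),
        ("b.o", "main", "opt", 0.5),
    ]
    kept = {"_Z3fooIiEvT_": "a.o", "main": "b.o"}
    share, by_phase = duplicate_time_share(samples, kept)
    print(f"{share:.1%} of measured time went to discarded duplicates")
    print(by_phase)
```

Keeping the per-phase breakdown separate makes it possible to see whether the
waste is mostly in IRGen, the optimizers or code generation.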
-- Sean Silva