[cfe-dev] JumboSupport: making unity builds easier in Clang

Daniel Bratell via cfe-dev cfe-dev at lists.llvm.org
Wed Apr 11 11:56:44 PDT 2018


The rates differ between initially adding support and maintaining it.

When first preparing the code for jumbo, several groups of changes are
necessary. Some fix code that was already doing something wrong, which
jumbo builds suddenly detect; others are needed because the same
constant or function name is used in many files: kBufferSize,
kIconSize, kSecondsPerMinute, GetThingWithNullCheck(), that kind of
thing.
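
A minimal sketch of the name problem (the file names, namespace names
and values are made up for illustration, not taken from Chromium): two
files that each define their own kBufferSize in an anonymous namespace
compile fine separately, but once both end up in one translation unit
the second definition is a redefinition error. One manual workaround,
shown here with both hypothetical files inlined in a single unit, is to
give each file its own named namespace:

```cpp
#include <cassert>
#include <cstddef>

// Built separately, both files could simply say
//   namespace { constexpr std::size_t kBufferSize = ...; }
// In a unity build the two anonymous namespaces are the same namespace,
// so the second kBufferSize would be a redefinition.

// --- contents of a hypothetical reader.cc ---
namespace reader_cc {
constexpr std::size_t kBufferSize = 4096;
std::size_t BufferBytes() { return kBufferSize; }
}  // namespace reader_cc

// --- contents of a hypothetical writer.cc ---
namespace writer_cc {
constexpr std::size_t kBufferSize = 512;
std::size_t BufferBytes() { return kBufferSize; }
}  // namespace writer_cc

// Each file still sees its own constant after being combined.
void CheckBuffersStayIndependent() {
  assert(reader_cc::BufferBytes() == 4096);
  assert(writer_cc::BufferBytes() == 512);
}
```

This is the kind of rename-only patch that makes up most of the work;
the code itself was never wrong.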

In the initial cleanup I think name clashes, the kind of problem the
suggested clang support would help with, make up about 60-80% of the
changes, and the experiment with /net in Chromium supports that estimate.

After the initial cleanup, the new problems that appear seem to be of the
"duplicate symbol name" kind to a much higher degree, maybe 90%.

So if those rough estimates are correct, clang support would make it
roughly 4 times as easy to implement something like jumbo and 10 times
as easy to maintain, and developers could keep using the short common
names they have become accustomed to.
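
The "4 times" and "10 times" figures follow if every problem is assumed
to cost about the same effort (a simplification): removing a fraction f
of the problems leaves 1 - f of the work, so the job gets 1 / (1 - f)
times easier. A small sketch, where EaseFactor is a made-up helper name:

```cpp
#include <cassert>

// Returns how many times less work remains if `avoided_fraction` of the
// problems disappear, assuming all problems cost the same effort.
double EaseFactor(double avoided_fraction) {
  assert(avoided_fraction >= 0.0 && avoided_fraction < 1.0);
  return 1.0 / (1.0 - avoided_fraction);
}
```

Under that assumption, 75% avoided gives a factor of 4 and 90% gives a
factor of 10; the 60-80% range corresponds to factors of 2.5 to 5.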

It would also hide some code problems that jumbo currently exposes, such
as copy/pasted code, but since we can live with those today, we can
probably survive with them a while longer and leave it to other tools to
find such problems.

/Daniel

(My notes from adding jumbo to a part of the code with 1000+ files.
Items marked with a * would probably have been unnecessary if clang had
had this support:
----
* 20.5 patches to rename something
* 11.5 patches to remove duplicate code
2 fixes to bad forward declarations
1 removal of "using namespace" (not allowed by the coding standard)
1 fix to ambiguity between ::prefs and ::metric::prefs
1 fix to clash with X11 headers
3 fixes to clashes with Windows headers
* 3 changes to inline trivial code/constants
1 case of bind.h finding Bind being called the wrong way thanks to access  
to more type information
1 removal of dead code
1 patch to add include guards
)
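
As a rough cross-check of the 60-80% estimate, the notes above can be
tallied, assuming (a simplification) that every patch or fix counts
equally. The struct and function names are only for illustration:

```cpp
#include <cassert>
#include <vector>

struct NoteEntry {
  double count;    // number of patches/fixes taken from the notes
  bool avoidable;  // true for the entries marked with a *
};

// The entries from the notes, in the order listed.
std::vector<NoteEntry> JumboNotes() {
  return {
      {20.5, true},   // renames
      {11.5, true},   // duplicate code removed
      {2.0, false},   // bad forward declarations
      {1.0, false},   // "using namespace"
      {1.0, false},   // ::prefs vs ::metric::prefs
      {1.0, false},   // clash with X11 headers
      {3.0, false},   // clashes with Windows headers
      {3.0, true},    // inlined trivial code/constants
      {1.0, false},   // bind.h catching a wrong Bind call
      {1.0, false},   // dead code
      {1.0, false},   // include guards
  };
}

// Fraction of all changes that carry a * in the notes.
double AvoidableShare(const std::vector<NoteEntry>& notes) {
  assert(!notes.empty());
  double total = 0.0;
  double avoidable = 0.0;
  for (const NoteEntry& note : notes) {
    total += note.count;
    if (note.avoidable) avoidable += note.count;
  }
  return avoidable / total;
}
```

AvoidableShare(JumboNotes()) comes to 35 of 46 changes, about 76%, which
sits inside the 60-80% range.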

On Wed, 11 Apr 2018 19:53:58 +0200, Kim Gräsman <kim.grasman at gmail.com>  
wrote:

> See also: https://www.llvm.org/devmtg/2014-04/PDFs/Talks/Tenseconds.pdf
>
> I started experimenting with a unity build of an LLVM/Clang-sized
> proprietary project at my previous employer, and I found the basics
> easy to get going. The hard part was massaging the code base to avoid
> collisions, as indicated by the work by Mostyn & co.
>
> I left the job before I had a chance to fully evaluate it, but
> assuming I'd had something like `#pragma jumbo` to reduce the
> friction, it might have been easier to get more data for less effort.
>
> Mostyn/Daniel, do you have any gut feel/data on how much of the
> problem a #pragma would solve? I suppose there are still constructs
> that `#pragma jumbo` can't help with, that require manual
> intervention?
>
> Also, Chromium is hardly a typical codebase, the little I've looked at
> it, it's *extremely* clean and consistent, so it might be interesting
> to try this on something else. Maybe LLVM itself would be an
> interesting candidate.
>
> - Kim
>
> On Wed, Apr 11, 2018 at 7:08 PM, via cfe-dev <cfe-dev at lists.llvm.org>  
> wrote:
>> If you want to share ASTs (an ephemeral structure), clang would need to do
>> the distributing.  If you want to share IR of instantiated templates, you
>> can use a shared database where clang is much less involved in managing the
>> distribution.  Maybe the database key could be a hash of the token stream
>> of the template definition, plus the template parameters.  Then
>> you can pull precompiled IR out of the database (if you want to do
>> optimizations) or make a reference to it (if you're doing LTO).
>>
>> --paulr
>>
>>
>>
>> From: cfe-dev [mailto:cfe-dev-bounces at lists.llvm.org] On Behalf Of David
>> Blaikie via cfe-dev
>> Sent: Wednesday, April 11, 2018 11:09 AM
>> To: David Chisnall
>> Cc: Bruce Dawson; Daniel Cheng; richard at metafoo.co.uk;
>> cfe-dev at lists.llvm.org; Daniel Bratell; Jens Widell
>> Subject: Re: [cfe-dev] JumboSupport: making unity builds easier in Clang
>>
>>
>>
>> This would have issues with distributed builds, though, right? Unless clang
>> then took on the burden of doing the distribution too, which might be a bit
>> much.
>>
>> On Wed, Apr 11, 2018 at 12:43 AM David Chisnall via cfe-dev
>> <cfe-dev at lists.llvm.org> wrote:
>>
>> On 10 Apr 2018, at 21:28, Daniel Bratell via cfe-dev
>> <cfe-dev at lists.llvm.org> wrote:
>>>
>>> I've heard (hearsay, I admit) from profiling that it seems the single
>>> largest time consumer in clang is template instantiation, something I assume
>>> can't easily be prepared in advance.
>>>
>>> One example is chromium's chrome/browser/browser target which is 732 files
>>> that normally need 6220 CPU seconds to compile, average 8.5 seconds per
>>> file. All combined together gives a single translation unit that takes 400
>>> seconds to compile, a mere 0.54 seconds on average per file. That indicates
>>> that about 8 seconds per compiled file is related to the processing of
>>> headers.
>>
>> It sounds as if there are two things here:
>>
>> 1. The time taken to parse the headers
>> 2. The time taken to repeatedly instantiate templates that the linker will
>> then discard
>>
>> Assuming a command line where all of the relevant source files are provided
>> to the compiler invocation:
>>
>> Solving the first one is relatively easy if the files have a common prefix
>> (which can be determined by simple string comparison).  Find the common
>> prefix in the source files, build the clang AST, and then do a clone for
>> each compilation unit.  Hopefully, the clone is a lot cheaper than
>> re-parsing (and can ideally share source locations).
>>
>> The second is slightly more difficult, because it relies on sharing parts of
>> the AST across notional compilation units.
>>
>> To make this work well with incremental builds, ideally you’d spit out all
>> of the common template instantiations into a separate IR file, which could
>> then be used with ThinLTO.
>>
>> Personally, I would prefer to have an interface where a build system can
>> invoke clang with all of the files that need building and the degree of
>> parallelism to use and let it share as much state as it wants across builds.
>> In an ideal world, clang would record which templates have been instantiated
>> in a prior build (or a previous build step in the current build) and avoid
>> any IRGen for them, at the very least.
>>
>> Old C++ compilers, predating linker support for COMDATs, emitted templates
>> lazily, simply emitting references to them, then parsing the linker errors
>> and generating missing implementations until the linker errors went away.
>> Modern C++ compilers generate many instantiations of the same templates and
>> then discard most of them.  It would be nice to find an intermediate point,
>> which worked well with ThinLTO, where templates could be emitted once and be
>> available for inlining everywhere.
>>
>> David
>>
>> _______________________________________________
>> cfe-dev mailing list
>> cfe-dev at lists.llvm.org
>> http://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-dev
>>
>>
>>


-- 
/* Opera Software, Linköping, Sweden: CEST (UTC+2) */

