[cfe-dev] [PATCH] Wrap clang modules inside Mach-O/ELF/COFF containers

Argyrios Kyrtzidis kyrtzidis at apple.com
Fri Jan 9 15:07:27 PST 2015


> On Jan 7, 2015, at 4:45 PM, Adrian Prantl <aprantl at apple.com> wrote:
> 
>> 
>> On Jan 7, 2015, at 11:32 AM, Adrian Prantl <aprantl at apple.com> wrote:
>> 
>> 
>>> On Jan 6, 2015, at 2:02 PM, Argyrios Kyrtzidis <kyrtzidis at apple.com> wrote:
>>> 
>>> 
>>>> On Jan 6, 2015, at 10:07 AM, Adrian Prantl <aprantl at apple.com> wrote:
>>>> 
>>>>> 
>>>>> On Dec 12, 2014, at 8:47 PM, Adrian Prantl <aprantl at apple.com> wrote:
>>>>> 
>>>>> 
>>>>>> On Dec 12, 2014, at 5:37 PM, Argyrios Kyrtzidis <kyrtzidis at apple.com> wrote:
>>>>>> 
>>>>>> 
>>>>>>> On Dec 12, 2014, at 4:33 PM, Eric Christopher <echristo at gmail.com> wrote:
>>>>>>> 
>>>>>>> Debug info for types isn't inherently a code generation concept. If you think about it, debug info for types is a stable (if lossy) serialization method for a module file. The line number etc for when there's code generated is a separate issue.
>>>>>> 
>>>>>> I see what you mean, but it is traditionally a codegen product with a particular use case, and it’s not reasonable to force it on every clang client that only wants to parse code (libclang, static analyzers, migrators, refactoring tools, etc.) or on builds that didn’t ask for it.
>>>>> 
>>>>> Good point, I tend to forget about non-compiler users of clang modules.
>>>>> 
>>>>> If we do decide that having clang modules without debug info is desirable, and we want debug info to be generated lazily (only when needed), then putting it into a separate file is preferable, because it can then be captured as a dependency by build systems.
>>>>> 
>>>>> It looks like at this point everyone’s argument really depends on an assumption that emitting debug info is either expensive or really cheap, so my suggestion is to revisit this thread once I actually have some numbers on how long it takes to emit debug info and how much space it takes up. I’ll try to get that done soon.
>>>> 
>>>> Hi Argyrios,
>>>> 
>>>> Back from the break, here are the promised numbers to make our decision easier:
>>>> 
>>>> I did an experiment where I patched clang to emit debug type info for each type (patch attached for the curious), and compiled an empty program that imports the Cocoa.h header. To compare the sizes I emitted the DWARF to a separate file:
>>>> 
>>>> -rw-r--r--  1 adrian  staff  2151068 Dec 19 16:30 Foundation-3QM1BFEPXW18W.pcm
>>>> -rw-r--r--  1 adrian  staff   110772 Dec 19 16:30 Foundation-3QM1BFEPXW18W.pcm.o
>>>> 
>>>> Here’s AppKit:
>>>> 
>>>> -rw-r--r--  1 adrian  staff  3302744 Dec 19 16:40 AppKit-5HXLHEH4UB4M.pcm
>>>> -rw-r--r--  1 adrian  staff   279080 Dec 19 16:40 AppKit-5HXLHEH4UB4M.pcm.o
>>>> 
>>>> The median of the size of the DWARF compared to the size of the pcm over all the modules pulled in by Cocoa.h is 5%; i.e., the DWARF would take up roughly 5% of the size of the individual modules.
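
As a quick check of the two listings above, the ratio can be recomputed with a short script. This is only a sketch: the dictionary layout and the `dwarf_ratio` helper are illustrative names, and the data covers just the two modules shown in this thread, not the full set of modules behind the 5% median.

```python
from statistics import median

# Sizes in bytes, copied from the two `ls -l` listings above.
# Only these two modules appear in the thread; the 5% median quoted
# there covers all modules pulled in by Cocoa.h, which are not listed.
sizes = {
    "Foundation": {"pcm": 2151068, "dwarf": 110772},
    "AppKit": {"pcm": 3302744, "dwarf": 279080},
}


def dwarf_ratio(entry):
    """Return the .pcm.o (DWARF) size as a fraction of the .pcm size."""
    return entry["dwarf"] / entry["pcm"]


ratios = {name: dwarf_ratio(entry) for name, entry in sizes.items()}

for name, ratio in ratios.items():
    print(f"{name}: {ratio:.1%}")

# Median over the two listed modules only (hypothetical summary line).
print(f"median of the listed modules: {median(ratios.values()):.1%}")
```

Foundation lands close to the quoted median, while AppKit is somewhat above it, which is consistent with a median being reported rather than a maximum.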
>>>> 
>>>> From these numbers I would argue that DWARF emission is comparatively cheap. To keep the implementation simple, I’d prefer to have everything in one file; this way we won’t have to introduce another layer of locking for creating the pcm.o files lazily, but if someone wants to point out that this is a lame excuse, be my guest ;-)
>>>> [Another reason to argue for separate .pcm.o files is if we ever want to put something target-specific in there, such as code. Currently this is not the case, and even if we did this, we would still benefit from having the DWARF type information shared between the several .pcm.o files]
>>>> 
>>>> tl;dr: either way is fine for me, having a single file is easier to implement.
>>> 
>>> I noticed that you are passing CodeGenOptions to the debug info generator. Will some of these end up affecting the module hash, or can CodeGenOptions be derived purely from LangOptions or the rest of the options that are used for the module hash?
>> 
>> The only CodeGenOptions that are actually needed are the ones controlling the debug info output, which I need to override anyway. I think it’s a good idea to just create them from scratch.
>> 
>>> What are the timing results?
>> 
>> After adding a couple of timers to a ReleaseAsserts build with the above example:
>> 
>> rm -rf cache && time $R/clang -fmodules test.m -c -fmodules-cache-path=./cache
>> ===-------------------------------------------------------------------------===
>>                        Miscellaneous Ungrouped Timers
>> ===-------------------------------------------------------------------------===
>> 
>>  ---User Time---   --System Time--   --User+System--   ---Wall Time---  --- Name ---
>>  6.8447 ( 80.2%)   0.9394 ( 84.3%)   7.7841 ( 80.6%)   7.9102 ( 80.9%)  compileModuleImpl()
>>  1.1064 ( 13.0%)   0.1018 (  9.1%)   1.2082 ( 12.5%)   1.2082 ( 12.4%)  PCHGenerator::HandleTranslationUnit()
>>  0.5886 (  6.9%)   0.0730 (  6.6%)   0.6617 (  6.9%)   0.6616 (  6.8%)  DWARF module output
>>  8.5397 (100.0%)   1.1142 (100.0%)   9.6540 (100.0%)   9.7800 (100.0%)  Total
>> 
>> Visiting the AST, compiling the types into DWARF and flushing the .pcm.o takes about 7% of the total time.
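
The 7% figure can be reproduced from the User+System column of the timer report. The following is a minimal sketch; the `timers` dictionary and the `share` helper are illustrative names, and the values are copied by hand from the report above.

```python
# User+System seconds, copied from the timer report above.
timers = {
    "compileModuleImpl()": 7.7841,
    "PCHGenerator::HandleTranslationUnit()": 1.2082,
    "DWARF module output": 0.6617,
}
total = 9.6540  # the "Total" row of the same column


def share(seconds, total_seconds):
    """Return one timer's fraction of the total time."""
    return seconds / total_seconds


dwarf_share = share(timers["DWARF module output"], total)

for name, seconds in timers.items():
    print(f"{name}: {share(seconds, total):.1%}")
```

The three timers sum to the reported total, so the DWARF share is a fraction of everything measured in that run.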
> 
> Here's another data point. Greg was curious whether we could include the function signatures (subprograms, subroutine_types, formal_parameters). If we do, the time for the DWARF module output goes up to 8% of the total, and the .pcm.o file size roughly doubles.

Thanks for looking into this! The numbers seem reasonable to me; I’m fine with the single file.

> 
> -- adrian
