[cfe-dev] [PATCH] Wrap clang modules inside Mach-O/ELF/COFF containers

Adrian Prantl aprantl at apple.com
Thu Feb 19 14:20:30 PST 2015


> On Feb 19, 2015, at 1:09 PM, Sean Silva <chisophugis at gmail.com> wrote:
> 
> 
> 
> On Wed, Feb 18, 2015 at 7:08 PM, Richard Smith <richard at metafoo.co.uk> wrote:
> Sorry for the delay. Generally, I'm happy with the direction here; if you want to go ahead with this as-is and deal with the below as incremental changes on top of the current patch, I'm fine with that.
> 
> 
> +  std::unique_ptr<SmallVectorImpl<char>> SerializedASTBuffer;
> 
> This seems like a strange data type to be dealing in. There's no point in putting a SmallVector into a unique_ptr, since that implies it will always be heap-allocated and thus never "small".
> 
> IIRC SmallVector<PodT> is slightly nicer with large buffers since it can use realloc vm tricks to resize for POD, whereas the semantics of std::vector don't permit that (see the post-commit on r175906). Not sure if that was actually the motivation for SmallVector here, but perhaps.

The real motivstion was that llvm::BitstreamWriter expects a SmallVectorImpl<char>. I think I should just be honest and actually make this a SmallVector<char, 0> instead, because in reality the module buffer will never be small enough to reasonably fit onto the stack.

-- adrian

> 
> -- Sean Silva
>  
> 
> I also feel somewhat uncomfortable about giving the ModuleContainerGenerator an owning pointer to this SmallVectorImpl while another class also has a reference to it and will be updating it. You're implicitly relying on PCHGenerator::HandleTranslationUnit to finalize the AST file contents before ModuleContainerGenerator::HandleTranslationUnit writes out the data.
> 
> I think I'd prefer if the PCHGenerator owned its vector of data, and was given some kind of callback that it used to provide the AST contents when it was finished; that could then be hooked up to the ModuleContainerGenerator, which could at least assert if it's given the data too late (after its HandleTranslationUnit is called).
> 
> 
> +  CGOpts.OptimizationLevel = 0;
> 
> A comment explaining what this is for would be useful.
> 
> On Mon, Feb 16, 2015 at 4:35 PM, Adrian Prantl <aprantl at apple.com> wrote:
> 
>> On Jan 30, 2015, at 6:04 PM, Adrian Prantl <aprantl at apple.com> wrote:
>> 
>> 
>>> On Jan 27, 2015, at 5:12 PM, Richard Smith <richard at metafoo.co.uk> wrote:
>>> 
>>> On Tue, Jan 27, 2015 at 9:29 AM, Adrian Prantl <aprantl at apple.com> wrote:
>>> 
>>> > On Jan 26, 2015, at 5:34 PM, Richard Smith <richard at metafoo.co.uk> wrote:
>>> >
>>> > On Mon, Jan 26, 2015 at 4:05 PM, Adrian Prantl <aprantl at apple.com> wrote:
>>> >> Here is the final version of the patch that should address all the points
>>> >> raised so far in this thread. Any final comments/objections/suggestions
>>> >> before this can go into trunk?
>> 
>> Hi all,
>> 
>> Here’s the upgraded version of the patch, which addresses all of your suggestions. The second patch enables debug info output, so you can see the first one in context and know where this is heading.
>> 
>>> >
>>> > Using a fixed string literal as a key for the PCH data seems a bit
>>> > weird to me. I'm also not sure a module flag is the best way to model
>>> > this: can we instead emit this as a normal global, and keep the
>>> > knowledge of its special section name within Clang?
>>> That sounds like a good idea — I wasn’t aware that globals could be annotated with a section name.
>>> I’ll give it a shot.
>> 
>> This works really great for ELF and Mach-O; using a global symbol with a custom section allowed me to get rid of the LLVM patch entirely! The big drawback is that I can no longer influence the order in which the sections are emitted, which turns out to be a problem for COFF.
>> Although COFF sections are marked with the correct alignment characteristics, they are not actually emitted aligned on disk. This is hopefully jus a bug in our COFF backend. Since the DWARF sections are of arbitrary size and come before the serialized AST, this messes up the alignment for OnDiskHashTable. Until I figure out a solution for this problem, I’ve disabled module debug info ouput for COFF.
>> 
>>> >
>>> > I would imagine that you'll want to generate debug information via
>>> > Clang's normal ASTConsumer -> CodeGen pipeline, which suggests that
>>> > building a custom LLVM code generator and running it from within
>>> > PCHGenerator::HandleTranslationUnit is the wrong choice. This should
>>> > probably be handled by a FrontendAction that passes the bitcode
>>> > generated by the PCHGenerator to CodeGen. That'd also avoid you
>>> > needing to plumb a CompilerInstance into Serialization, which seems
>>> > like a layering violation (Frontend depends on Serialization, not vice
>>> > versa).
>>> 
>>> Thanks for the pointer. In this case both the code generator and the output buffer for the serialized AST is owned by a FrontendAction which then wraps it in a global and passes it on to CodeGen.
>>> In order to wire up the debug information, since I can only have one ASTConsumer per FrontendAction, I will probably need to have a second FrontendAction for the debug info generator?
>>> 
>>> If you need it, it shouldn't be hard to write an ASTConsumer implementation that forwards calls to multiple other consumers. You'll probably need to ensure that PCHGenerator is first in the sequence to ensure that it can emit the PCH data before we finalize the IR.
>> 
>> I found ASTMultiplexConsumer, which does exactly what I needed (after I added all the missing forwarding methods).
>> 
>>> 
>>> I’ll check out how clang works and how to communicate between FrontendActions.
>>> I might be back with more questions :-)
>>> 
>>> >
>>> > For the section name (repeating a comment from before): I don't like
>>> > "cfe"; we should indicate which compiler is involved here. I don't
>>> > like "pch"; this is used for all kinds of AST files, not just PCH
>>> > files. Can you pick a better name for the section?
>>> 
>>> That particular name was a suggestion by David Majnemer  who pointed out that COFF section names are restricted to 8 characters. I chose PCH because the magic number in .pcm files is CPCH.
>>> The full list of constraints is:
>>> - 8 characters (COFF)
>>> - starts with __ (MachO)
>>> 
>>> How about clangast on COFF and ELF, and __clangast on MachO? Or do we need to keep the name the same in all cases?
>> 
>> Since the name is only used in two locations, I think having a platform-specific one isn’t too bad. I’ve implemented your suggestion.
>> 
>> thanks for all the feedback so far!
>> -- adrian
>> <0001-CFE-Wrap-clang-module-files-in-a-Mach-O-ELF-COFF-con.patch>
>> <0002-Clang-emits-Debug-Info-for-Modules.patch>
>> 
>>>  
>>> - should indicate clang (COFF. On ELF and MachO the section is inside a __CLANG segment)
>>> - should indicate that this is an ast
>>> 
>>> If we want to use the same section name on all platforms, this leaves any combination of
>>> __{cfe|clg|cln}{ast|pch|pcm|mod}
>>> or maybe
>>> __ast
>>> __cpch
>>> __llvmod
>>> 
>>> other suggestions?
>>> -- adrian
>>> 
> 
> Pinging Richard.
> Thanks to David Majnemers alignment fix for the COFF object writer in r228879, this now works with all supported object containers.
> 
> -- adrian
> 
> 
> 
> 
> 
> _______________________________________________
> cfe-dev mailing list
> cfe-dev at cs.uiuc.edu
> http://lists.cs.uiuc.edu/mailman/listinfo/cfe-dev
> 
> 





More information about the cfe-dev mailing list