[LLVMdev] How to reduce the footprint of MDNodes? (About the comment you made at BOF LTO)

Tue Nov 12 18:00:07 PST 2013

On Nov 12, 2013, at 4:19 PM, Chandler Carruth <chandlerc at google.com> wrote:
> On Tue, Nov 12, 2013 at 4:14 PM, Chris Lattner <clattner at apple.com> wrote:
>> I'm moderately opposed to just encoding these in a string format. I think we can do something substantially better for space, time, and readability. Fundamentally, there is no reason for the original metadata node you describe to not *encode* its operands into a dense bit-packed blob of memory. We can still expose APIs that manipulate them as separate entities, and have the AsmPrinter and AsmParser map back and forth with nice human-readable forms. But even a simple varint encoding will be both smaller and faster than ASCII.
> 
> I guess you could make it work, but would that actually be simpler than what is proposed?  If it is denser, how much denser would it have to be to justify the complexity?
> 
> I don't think it would be more complex than a string encoding. At least, I'm not imagining we want to be super clever here.
> 
> I could even imagine doing a versioned giant bitfield and using the version to handle auto-upgrade…
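The dense-blob idea above (operands packed as varints, with a leading version that could drive auto-upgrade) can be sketched roughly as follows. This is a minimal illustration, not LLVM code: `appendVarint`, `readVarint`, `packFields`, and `asciiFields` are hypothetical helpers, the `:`-separated decimal string is only an assumed stand-in for a string encoding, and any field values fed to it are made up. Each value is stored as an unsigned LEB128 varint (7 payload bits per byte, high bit marking continuation).

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>
#include <vector>

// Append V to Out as an unsigned LEB128 varint: 7 payload bits per byte,
// with the high bit set on every byte except the last.
void appendVarint(uint64_t V, std::vector<uint8_t> &Out) {
  do {
    uint8_t Byte = static_cast<uint8_t>(V & 0x7f);
    V >>= 7;
    if (V != 0)
      Byte |= 0x80; // more bytes follow
    Out.push_back(Byte);
  } while (V != 0);
}

// Read one varint starting at Pos and advance Pos past it.
uint64_t readVarint(const std::vector<uint8_t> &In, std::size_t &Pos) {
  uint64_t V = 0;
  unsigned Shift = 0;
  uint8_t Byte;
  do {
    assert(Pos < In.size() && "truncated varint");
    Byte = In[Pos++];
    V |= uint64_t(Byte & 0x7f) << Shift;
    Shift += 7;
  } while (Byte & 0x80);
  return V;
}

// Hypothetical packed node: a schema-version varint, then one varint per field.
std::vector<uint8_t> packFields(uint64_t Version,
                                const std::vector<uint64_t> &Fields) {
  std::vector<uint8_t> Blob;
  appendVarint(Version, Blob);
  for (uint64_t F : Fields)
    appendVarint(F, Blob);
  return Blob;
}

// Assumed stand-in for a string encoding: decimal fields joined by ':'.
std::string asciiFields(const std::vector<uint64_t> &Fields) {
  std::string S;
  for (std::size_t I = 0; I < Fields.size(); ++I) {
    if (I != 0)
      S += ':';
    S += std::to_string(Fields[I]);
  }
  return S;
}
```

For the made-up fields {786478, 42, 7, 0, 1} with version 1, the packed blob is 8 bytes, while the string "786478:42:7:0:1" is 15 bytes. The actual ratio depends entirely on the real field values.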

You must mean something other than what I’m imagining :-). From your description, it sounds like we’d have some new kind of “compressed MDNode” that bit-packs data in some way, and that the existing .ll and .bc syntax would be magically compacted without a syntax change. Is that what you mean?

If so, that sounds really complicated: all of the readers would have to recognize the new format, and we would lose the correspondence between the in-memory IR and its printed form (a correspondence that makes debugging the compiler simpler).

If you’re talking about exposing this as syntax in .ll files (and encoding details in .bc files), then I’m not sure how this would work. The schema for each node would have to be described somewhere. Where would it live?

More generally, can you explain more of what you’re thinking here?
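To make the schema question concrete, one hypothetical place for it is a per-version table of field descriptors that a printer, a parser, and an auto-upgrader could all consult. The sketch below is an assumption, not anything proposed in the thread: it uses a made-up "subprogram" node with invented field names (`line`, `flags`, `scopeLine`), and it assumes newer schema versions only ever append fields, so upgrading just pads with defaults.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical field descriptor: a name for the textual form and a default
// value used when upgrading an older node.
struct FieldDesc {
  const char *Name;
  uint64_t Default;
};

// Hypothetical schemas for a hypothetical "subprogram" node kind.
// Version 2 appends one field that version 1 lacked.
const std::vector<FieldDesc> SubprogramV1 = {{"line", 0}, {"flags", 0}};
const std::vector<FieldDesc> SubprogramV2 = {
    {"line", 0}, {"flags", 0}, {"scopeLine", 0}};
const unsigned LatestVersion = 2;

const std::vector<FieldDesc> &schemaFor(unsigned Version) {
  switch (Version) {
  case 1:
    return SubprogramV1;
  case 2:
    return SubprogramV2;
  default:
    throw std::runtime_error("unknown schema version");
  }
}

// Auto-upgrade under the append-only assumption: pad with the latest
// schema's defaults for any fields the old version did not have.
std::vector<uint64_t> upgradeToLatest(unsigned Version,
                                      std::vector<uint64_t> Fields) {
  if (Fields.size() != schemaFor(Version).size())
    throw std::invalid_argument("field count does not match schema");
  const std::vector<FieldDesc> &Latest = schemaFor(LatestVersion);
  for (std::size_t I = Fields.size(); I < Latest.size(); ++I)
    Fields.push_back(Latest[I].Default);
  return Fields;
}

// Render "name: value" pairs, as a human-readable textual form might.
std::string printFields(unsigned Version, const std::vector<uint64_t> &Fields) {
  const std::vector<FieldDesc> &S = schemaFor(Version);
  assert(Fields.size() == S.size() && "field count must match the schema");
  std::string Out;
  for (std::size_t I = 0; I < S.size(); ++I) {
    if (I != 0)
      Out += ", ";
    Out += S[I].Name;
    Out += ": ";
    Out += std::to_string(Fields[I]);
  }
  return Out;
}
```

Upgrading a version-1 node {10, 3} yields {10, 3, 0}, which prints as "line: 10, flags: 3, scopeLine: 0". The table has to live somewhere that every reader can see, which is exactly the cost being questioned here.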

>  
>> Just to be clear, I still want the nice format (much like your proposed format, but maybe with the numbers outside of the quotes) in the textual IR, I just think we should use a more direct and efficient in-memory encoding (and in-bitcode encoding if that isn't already suitably dense).
> 
> Where would the encoding schema be specified?
> 
> Same question applies to a string encoding. We have to define the schema somewhere clearly. I'm just lobbying for the textual IR and the APIs to both operate directly on N fields, and just make the memory representation dense.

The advantage of strings is that they move the schema complexity into the debug-info-related machinery, such as DIBuilder. Core IR features like MDNode, the AsmParser/AsmWriter, bitcode support, etc. don’t need to know anything specific to debug information.
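A minimal sketch of that split, under assumed names: the core-IR side (`OpaqueStringNode`) only stores a string and knows nothing about its contents, while a debug-info-side helper (`buildSubprogram`/`parseSubprogram`, loosely in the spirit of DIBuilder but not its API) is the only code that knows the field order. The `:` separator and the two numeric fields are assumptions for illustration.

```cpp
#include <cstdint>
#include <sstream>
#include <string>

// Core IR side (hypothetical): the node owns an opaque string.
// Nothing here knows what the fields mean.
struct OpaqueStringNode {
  std::string Payload;
};

// Debug-info side (hypothetical): the only layer that knows the schema.
struct SubprogramFields {
  uint64_t Line = 0;
  uint64_t Flags = 0;
};

OpaqueStringNode buildSubprogram(const SubprogramFields &F) {
  return {std::to_string(F.Line) + ":" + std::to_string(F.Flags)};
}

// Parse "<line>:<flags>"; returns false on malformed input.
bool parseSubprogram(const OpaqueStringNode &N, SubprogramFields &F) {
  std::istringstream In(N.Payload);
  char Sep = 0;
  if (!(In >> F.Line >> Sep) || Sep != ':' || !(In >> F.Flags))
    return false;
  // Reject trailing characters after the last field.
  return In.peek() == std::char_traits<char>::eof();
}
```

The trade-off under discussion is visible here: adding a field touches only the debug-info helpers, never `OpaqueStringNode`, but every read pays for text parsing.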

-Chris