[LLVMdev] Proposal: extended MDString syntax

Thu Jun 27 10:49:16 PDT 2013

On Jun 27, 2013, at 10:12 AM, Chandler Carruth <chandlerc at google.com> wrote:

> 
> On Thu, Jun 27, 2013 at 9:50 AM, Jakob Stoklund Olesen <stoklund at 2pi.dk> wrote:
> On Jun 26, 2013, at 4:18 PM, Eric Christopher <echristo at gmail.com> wrote:
> 
> > So inverting it so that MI contains LLVM IR instead of the other way
> > around? Then we'd need a serialization format for MI that happened to
> > include a way of serializing LLVM IR within. From a quick "hey, this
> > seems reasonable" the idea of embedding the MI into the IR rather than
> > the other way around seems to make sense since we already have
> > code to serialize the IR.
> 
> I’d suggest something based on YAML which would allow you to include IR verbatim just by indenting it.
> 
> We can also use YAML embedded inside IR, potentially using the string syntax Dan proposed or any number of other embedding mechanisms.
> 
> I like using YAML to represent the somewhat arbitrary data structures of MI so that we don't spend a lot of time inventing clever syntax for something that has much more limited uses than the actual IR. I haven't heard anyone really object to it.
> 
> However, I do think it's an open question as to whether to embed IR in an MI container, or MI in an IR container. A few observations:
> 
> - No one has pointed out any really fundamental *problems* with any of the approaches. I think both can be made to work with a reasonable amount of effort.
> 
> - Different use cases will be more or less easy to write in different forms. For example, Jakob's point:
> "The IR module should be optional when serializing MI. The back-pointers from MI to IR are not required, and I can imagine many useful test cases that won’t need them."
> 
> I've heard Dan and others say exactly the opposite -- that MI should be optional. I suspect that some test cases are more MI focused, and some are less. But I don't see either being optional as a hard prerequisite.

Back-pointers from MI to LLVM IR are a hack that gets the job done, but they are not good IR design. We are already seeing the usefulness of memory operands crumble because of the stack coloring pass. Throw in something like modulo scheduling, and they will be completely wrong for alias analysis.

MI should be allowed to evolve into a proper self-contained IR that doesn’t depend on LLVM IR.

I don’t want to canonicalize this hack by encoding it in the file format we use for our tests. A container format that holds LLVM IR and MI as sibling top-level entities is much easier to gradually change towards a standalone MI IR.
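
As a rough illustration of the sibling layout (a sketch only, not a proposed format): the Python below models a test file with two independently optional top-level sections and dumps each one as a YAML literal block, so the text is carried verbatim just by indenting it. The names `TestContainer`, `dump`, and the keys `ir` and `mi` are hypothetical, and the MI text in the usage lines is a placeholder rather than real MI syntax.

```python
from dataclasses import dataclass
from typing import Optional
import textwrap


@dataclass
class TestContainer:
    """Hypothetical test file: IR and MI are siblings, each optional."""
    ir: Optional[str] = None  # LLVM IR text, kept verbatim
    mi: Optional[str] = None  # machine-level text; its syntax is left open


def dump(container: TestContainer) -> str:
    """Emit each present section as a YAML literal block scalar.

    A literal block is "key: |" followed by the body indented under it,
    so the embedded text needs no escaping.
    """
    parts = []
    for key in ("ir", "mi"):
        body = getattr(container, key)
        if body is not None:
            indented = textwrap.indent(body.rstrip("\n"), "  ")
            parts.append(f"{key}: |\n{indented}\n")
    return "".join(parts)


# An MI-only test needs no IR section at all (placeholder MI text).
print(dump(TestContainer(mi="<machine code goes here>")))
```

Because neither section is nested inside the other, dropping the IR section or changing how MI is written does not disturb the rest of the file.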

Thanks,
/jakob