[LLVMdev] RFC: Machine Level IR text-based serialization format

Matthias Braun matze at braunis.de
Tue Apr 28 16:26:35 PDT 2015


To get this out first: I'd love to have a way to serialize machine-IR! I often spend a lot of time crafting .ll files so that the machine-IR still looks a certain way when it finally reaches the relevant passes in codegen. It would be so much easier to just specify the machine IR immediately before the pass I'm interested in.

For that use case it is worth keeping the following things in mind:
- Please try to keep the output of the various dump functions, esp. MachineInstr::dump(), MachineOperand::dump(), MachineBasicBlock::dump(), as close as possible to the format you use for serializing. It would be unnecessarily confusing if the dump() output I see while debugging differed from what I can read in a text file. Having said that, you don't necessarily have to change your serialization format to match the dump() functions; you may just as well adjust the dump() functions. Just avoid them being different without reason. I can also imagine the serialization showing a bit less information in cases where that information is obvious in a serialization context but not when dump()ing a piece in isolation.
- Design the format in a way that makes it easy for humans to write. If the only way to produce these files reliably is by dumping existing machine-IR, I will have a hard time designing minimal, easy-to-understand testcases. Mostly I mean the ability to leave out information that can be inferred or guessed, so the resulting test is compact and shows what it is about. Just looking at your example below, there is a lot of information that is redundant or could be filled in with sensible defaults: the function "number", the basic block numbers, the predecessors and successors of a basic block, and maybe even the LLVM IR (though CodeGen probably does not allow leaving that out at the moment).
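
As a rough illustration of the "sensible defaults" point, here is a minimal C++ sketch of how a parser could infer successor and predecessor lists from branch targets plus layout fallthrough, so a hand-written test would not have to spell them out. The Block and CFG types and the inferCFG function are hypothetical stand-ins, not LLVM's MachineBasicBlock API; the sketch assumes block names are unique and that the parser already knows which blocks each terminator names and whether a block falls through.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <map>
#include <string>
#include <vector>

// Hypothetical, simplified view of a basic block in a hand-written test.
// This is NOT LLVM's MachineBasicBlock.
struct Block {
  std::string name;
  std::vector<std::string> branchTargets; // blocks named by the terminators
  bool fallsThrough = false;              // control can reach the next block in layout
};

struct CFG {
  std::map<std::string, std::vector<std::string>> succs;
  std::map<std::string, std::vector<std::string>> preds;
};

// Append v unless already present, so a conditional branch to the layout
// successor does not produce a duplicate edge.
static void addUnique(std::vector<std::string>& list, const std::string& v) {
  if (std::find(list.begin(), list.end(), v) == list.end())
    list.push_back(v);
}

// Infer successors from explicit branch targets plus fallthrough, then
// derive predecessors by inverting the successor edges (in layout order).
CFG inferCFG(const std::vector<Block>& blocks) {
  CFG cfg;
  for (const Block& b : blocks) {
    assert(!b.name.empty() && "every block needs a name");
    cfg.succs[b.name]; // ensure every block has a (possibly empty) entry
    cfg.preds[b.name];
  }
  for (std::size_t i = 0; i < blocks.size(); ++i) {
    const Block& b = blocks[i];
    for (const std::string& t : b.branchTargets)
      addUnique(cfg.succs[b.name], t);
    if (b.fallsThrough && i + 1 < blocks.size())
      addUnique(cfg.succs[b.name], blocks[i + 1].name);
  }
  for (const Block& b : blocks)
    for (const std::string& s : cfg.succs[b.name])
      addUnique(cfg.preds[s], b.name);
  return cfg;
}
```

For a block bb.0 that ends in a conditional branch to bb.2 and otherwise falls into bb.1, this yields the successors bb.2 and bb.1 without the test author listing them.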

- Matthias

> On Apr 28, 2015, at 2:08 PM, Bevin Hansson <bevinh at sics.se> wrote:
> 
> On 2015-04-28 20:18, Alex L wrote:
>> 2015-04-28 10:15 GMT-07:00 Hal Finkel <hfinkel at anl.gov>:
>> Hi Alex,
>> I think this looks promising. What are the 1 and 4 above? How are you
>> proposing to serialize operand flags (dead, etc.)?
>> -Hal
>> Hi Hal,
>> The 1 and 4 above are constants that are specific to x86 memory addressing,
>> I believe they basically compute the address RSP + 1 * 0 + 4.
>> I haven't settled on a final version of the operand flags (for registers)
>> syntax, but at the moment I'm thinking of something like this:
>> - The IsDef flag is implied by the use of the register before the '=',
>> unless it's implicit.
>> - TiedTo and IsEarlyClobber aren't serialized, as they are defined by
>> the instruction description. (I believe that's true in all cases, but I'm
>> not 100% sure).
>> - IsUndef, IsImp, IsKill, IsDead, IsInternalRead, IsDebug - keywords like
>> 'implicit', 'undef', 'kill', 'dead' are used before the register e.g.
>> 'undef %rax', 'implicit-def kill %eflags'.
>> I don't have a syntax for the SubReg_TargetFlags at the moment.
> 
> Since the instruction format is partially based on the machine dump format,
> you could use something similar to that, like '%reg:subreg'.
> 
> On a tangential note, IIRC the machine dumps store the virtual register
> information (register class, mainly) in-band at the end of the instruction.
> Based on the format you described, I'm assuming this is what would be stored
> out-of-band in 'regInfo'.
> 
> / Bevin
> _______________________________________________
> LLVM Developers mailing list
> LLVMdev at cs.uiuc.edu         http://llvm.cs.uiuc.edu
> http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev
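
To make the operand-flag proposal quoted above concrete, here is a minimal C++ sketch of a printer that emits the keyword-prefix syntax Alex describes ('undef %rax', 'implicit-def ... %eflags') together with Bevin's '%reg:subreg' suggestion. The RegOperand struct and printRegOperand function are hypothetical illustrations, not LLVM's MachineOperand API, and the keyword order and the sub-register index name used in the usage example are assumptions.

```cpp
#include <cassert>
#include <string>

// Hypothetical flag bundle mirroring the flags named in the thread
// (IsDef, IsImp, IsUndef, IsKill, IsDead). NOT LLVM's MachineOperand.
struct RegOperand {
  std::string reg;    // register name without '%', e.g. "rax" or "0"
  std::string subReg; // sub-register index name; empty if none
  bool isDef = false;
  bool isImplicit = false;
  bool isUndef = false;
  bool isKill = false;
  bool isDead = false;
};

// Print one register operand. Explicit defs get no keyword because, per the
// proposal, their position before '=' already marks them as defs; only
// implicit operands need 'implicit' / 'implicit-def'.
std::string printRegOperand(const RegOperand& op) {
  assert(!op.reg.empty() && "operand needs a register");
  std::string out;
  if (op.isImplicit)
    out += op.isDef ? "implicit-def " : "implicit ";
  if (op.isUndef) out += "undef ";
  if (op.isKill) out += "kill ";
  if (op.isDead) out += "dead ";
  out += "%" + op.reg;
  if (!op.subReg.empty())
    out += ":" + op.subReg; // Bevin's '%reg:subreg' form
  return out;
}
```

With this sketch, an implicit dead def of EFLAGS prints as 'implicit-def dead %eflags', and a virtual register 0 with the x86 sub-register index sub_8bit prints as '%0:sub_8bit'.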




