[LLVMdev] RFC: Machine Level IR text-based serialization format

Thu Apr 30 12:54:36 PDT 2015

> 
> On Apr 28, 2015, at 9:56 AM, Alex L <arphaman at gmail.com> wrote:
> 
> Hi all,
> 
> I would like to propose a text-based, human-readable format for serializing
> the machine level IR. The main goal of this format is to allow LLVM to save the
> machine level IR after any code generation pass, then load it again and continue
> running passes on it. The primary use case is to make testing the code generation
> passes easier: developers can write tests that load the IR, invoke just one
> specific code gen pass, and then inspect the output of that pass by checking the
> printed IR.
> 
> 
> The proposed format has a number of key features:
> - It stores the machine level IR and the optional LLVM IR in one text file.
> - The connections between the machine level IR and the LLVM IR are preserved.
> - The format uses a YAML-based container for most of the data structures. The LLVM
>   IR is embedded in the YAML container.
> - The format also uses a new, text-based syntax to serialize the machine instructions.
>   The instructions are embedded in YAML.
> 
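To make the instruction syntax bullet concrete, here is a rough Python sketch of how one instruction string from the example in this proposal (such as '%edi = mov32rm %rsp, 1, %noreg, 4, %noreg') might be split into defined registers, opcode and operands. The MachineInstr class and parse_instruction function are hypothetical names for illustration only, and the grammar is only what can be guessed from the two example strings; the real syntax is not settled in this thread.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class MachineInstr:
    """Hypothetical in-memory form of one serialized instruction string."""
    opcode: str
    defs: List[str] = field(default_factory=list)      # registers written
    operands: List[str] = field(default_factory=list)  # everything after the opcode


def parse_instruction(text: str) -> MachineInstr:
    """Split '<defs> = <opcode> <operands>' into its parts.

    Assumption: operands are comma separated and never contain '=' or ','.
    Flags such as 'undef' are kept as part of the operand text.
    """
    defs: List[str] = []
    rest = text.strip()
    if "=" in rest:
        lhs, rest = rest.split("=", 1)
        defs = [d.strip() for d in lhs.split(",") if d.strip()]
    parts = rest.strip().split(None, 1)
    if not parts:
        raise ValueError(f"no opcode in {text!r}")
    operands: List[str] = []
    if len(parts) == 2:
        operands = [op.strip() for op in parts[1].split(",") if op.strip()]
    return MachineInstr(parts[0], defs, operands)
```

For the push64r string in the example this yields no defs and keeps 'undef %rax' as a single operand, which leaves room for a real parser to interpret operand flags later.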
> This is an incomplete example of a YAML file containing the LLVM IR, the machine level IR
> and the instructions:
> 
> ---
> ir: |
>   define i32 @fact(i32 %n) {
>     %1 = alloca i32, align 4
>     store i32 %n, i32* %1, align 4
>     %2 = load i32, i32* %1, align 4
>     %3 = icmp eq i32 %2, 0
>     br i1 %3, label %10, label %4
> 
>   ; <label>:4                                       ; preds = %0
>     %5 = load i32, i32* %1, align 4
>     %6 = sub nsw i32 %5, 1
>     %7 = call i32 @fact(i32 %6)
>     %8 = load i32, i32* %1, align 4
>     %9 = mul nsw i32 %7, %8
>     br label %10
> 
>   ; <label>:10                                      ; preds = %0, %4
>     %11 = phi i32 [ %9, %4 ], [ 1, %0 ]
>     ret i32 %11
>   }
> 
> ...
> ---
> number:          0
> name:            fact
> alignment:       4
> regInfo:
>   ....
> frameInfo:
>   ....
> body:
>   - bb:              0
>     llbb:            '%0'
>     successors:      [ 'bb#2', 'bb#1' ]
>     liveIns:         [ '%edi' ]
>     instructions:
>       - 'push64r undef %rax, %rsp, %rsp'
>       - 'mov32mr %rsp, 1, %noreg, 4, %noreg, %edi'
>       - ....
>         ....
>   - bb:              1
>     llbb:            '%4'
>     successors:      [ 'bb#2' ]
>     instructions:
>       - '%edi = mov32rm %rsp, 1, %noreg, 4, %noreg'
>       - ....
>         ....
>   - ....
>     ....
> ...
> 
> The example above shows a YAML file with two YAML documents (delimited by `---`
> and `...`) containing the LLVM IR and the machine function information for the function `fact`.
> 
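As an illustration of the two-document layout described above, the following Python sketch splits such a file into its documents using only the `---` and `...` markers. It is not a YAML parser: split_documents is a hypothetical helper, and it assumes the markers appear alone on a line at column 0, as they do in the example. Indented placeholder lines like '  ....' are treated as ordinary content.

```python
from typing import List, Optional


def split_documents(stream: str) -> List[str]:
    """Return the text of each document in a multi-document stream."""
    docs: List[str] = []
    current: Optional[List[str]] = None  # None means "between documents"
    for line in stream.splitlines():
        marker = line.rstrip()
        if marker == "---":
            if current is not None:      # '---' also ends a preceding document
                docs.append("\n".join(current))
            current = []
        elif marker == "...":
            if current is not None:
                docs.append("\n".join(current))
            current = None
        elif current is not None:
            current.append(line)
    if current is not None:              # stream ended without a closing '...'
        docs.append("\n".join(current))
    return docs
```

Applied to the example, the first returned string would hold the 'ir: |' block with the embedded LLVM IR and the second the machine function for `fact`; each could then be handed to a proper YAML parser.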
> 
> When a specific format is chosen, I'll start with patches that serialize the
> embedded LLVM IR. Then I'll add support for things like machine functions and
> machine basic blocks, and I think that an intrusive implementation will work best
> for data structures like these. After that I will continue adding support for
> serialization of the remaining data structures.
> 
> 
> Thanks for reading through the proposal. What are your thoughts about this format?

I’m really looking forward to this; it will be extremely useful for testing the debug info backend.
For debug nodes referenced via DBG_VALUE instructions, it looks like they could just point to the corresponding nodes in the optional IR.
Are there any plans to represent metadata such as the DebugLoc(ations) attached to the machine instructions?

-- adrian