[LLVMdev] RFC: Machine Level IR text-based serialization format

Alex L arphaman at gmail.com
Tue Apr 28 11:00:44 PDT 2015


2015-04-28 10:14 GMT-07:00 Quentin Colombet <qcolombet at apple.com>:

> Hi Alex,
>
> Thanks for working on this.
>
> Personally I would rather not have to write YAML inputs and would instead rely
> on what the machine dumps look like. That being said, I can live with
> YAML :).
>
> More importantly, how do you plan to report syntax errors to the users?
> Things like invalid instruction, invalid registers, etc.?
> What about unallocated code, i.e., virtual registers, invalid SSA form,
> etc.?
>
> Cheers,
> Q.
>

Thanks,

Unfortunately, the machine dumps are quite incomplete (and tricky to parse
too!), so some new syntax has to be developed.
I think that a YAML-based container is a good candidate for this purpose:
its structured format represents things like machine functions,
frame information, register information, and target-specific machine function
details in a clear and readable way.

I haven't thought about error reporting that much yet, as I've mostly been
working on developing the syntax and making sure that it can represent all
the data structures. But I believe that errors in the machine instruction
syntax, like invalid basic block references or invalid instructions, can be
reported quite well, and I can rely on LLVM's existing error reporting
facilities to help. The more structural errors, like missing attributes,
will be handled by the YAML parser automatically, and I might extend it to
provide better, more specific error messages. I also think that it's possible
to use the machine verifier to catch the other errors that you've mentioned.
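
A minimal sketch of what instruction-level diagnostics could look like, written
in Python purely for illustration. This is not LLVM's parser: the opcode and
register tables, the InstrSyntaxError class, and check_instruction are all
hypothetical stand-ins, and a real implementation would query the target's
instruction and register information instead.

```python
import re

# Hypothetical tables standing in for target information (assumption).
KNOWN_OPCODES = {"push64r", "mov32mr", "mov32rm"}
KNOWN_REGISTERS = {"rax", "rsp", "edi", "noreg"}

# Identifiers (optionally %-prefixed), integer immediates, '=' and ','.
TOKEN = re.compile(r"%?[A-Za-z_][A-Za-z0-9_]*|\d+|[=,]")


class InstrSyntaxError(Exception):
    """An error tied to a column inside one instruction string."""

    def __init__(self, message, text, column):
        super().__init__(message)
        self.message = message
        self.text = text
        self.column = column

    def render(self):
        # Show the offending line with a caret under the error column.
        return f"error: {self.message}\n  {self.text}\n  {' ' * self.column}^"


def tokenize(text):
    """Return (token, column) pairs, rejecting unexpected characters."""
    tokens = []
    pos = 0
    while pos < len(text):
        if text[pos].isspace():
            pos += 1
            continue
        match = TOKEN.match(text, pos)
        if match is None:
            raise InstrSyntaxError(
                f"unexpected character '{text[pos]}'", text, pos)
        tokens.append((match.group(), pos))
        pos = match.end()
    return tokens


def check_instruction(text):
    """Check the opcode and register names; return the opcode on success."""
    tokens = tokenize(text)
    # An instruction may start with a "%def = " prefix.
    opcode_index = 2 if len(tokens) >= 2 and tokens[1][0] == "=" else 0
    if opcode_index >= len(tokens):
        raise InstrSyntaxError("expected an instruction name", text, len(text))
    opcode, column = tokens[opcode_index]
    if opcode not in KNOWN_OPCODES:
        raise InstrSyntaxError(f"unknown instruction '{opcode}'", text, column)
    for token, token_column in tokens:
        if token.startswith("%") and token[1:] not in KNOWN_REGISTERS:
            raise InstrSyntaxError(
                f"unknown register '{token}'", text, token_column)
    return opcode
```

Calling check_instruction on a string with a misspelled register raises an
InstrSyntaxError whose render() method points a caret at the bad operand.
Errors that depend on whole-function state, such as invalid SSA form, are out
of scope for a per-instruction check like this one.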

Alex



> On Apr 28, 2015, at 9:56 AM, Alex L <arphaman at gmail.com> wrote:
>
> Hi all,
>
> I would like to propose a text-based, human-readable format that will be used to
> serialize the machine level IR. The major goal of this format is to allow LLVM
> to save the machine level IR after any code generation pass and then to load
> it again and continue running passes on the machine level IR. The primary use case
> of this format is to make the code generation passes easier to test,
> by allowing developers to write tests that load the IR, invoke just a
> specific code generation pass, and then inspect the output of that pass by checking
> the printed IR.
>
> The proposed format has a number of key features:
> - It stores the machine level IR and the optional LLVM IR in one text file.
> - The connections between the machine level IR and the LLVM IR are preserved.
> - The format uses a YAML based container for most of the data structures. The LLVM
>   IR is embedded in the YAML container.
> - The format also uses a new, text-based syntax to serialize the machine instructions.
>   The instructions are embedded in YAML.
>
> This is an incomplete example of a YAML file containing the LLVM IR, the machine level IR
> and the instructions:
>
> ---
> ir: |
>   define i32 @fact(i32 %n) {
>     %1 = alloca i32, align 4
>     store i32 %n, i32* %1, align 4
>     %2 = load i32, i32* %1, align 4
>     %3 = icmp eq i32 %2, 0
>     br i1 %3, label %10, label %4
>
>   ; <label>:4                                       ; preds = %0
>     %5 = load i32, i32* %1, align 4
>     %6 = sub nsw i32 %5, 1
>     %7 = call i32 @fact(i32 %6)
>     %8 = load i32, i32* %1, align 4
>     %9 = mul nsw i32 %7, %8
>     br label %10
>
>   ; <label>:10                                      ; preds = %0, %4
>     %11 = phi i32 [ %9, %4 ], [ 1, %0 ]
>     ret i32 %11
>   }
>
> ...
> ---
> number:          0
> name:            fact
> alignment:       4
> regInfo:
>   ....
> frameInfo:
>   ....
> body:
>   - bb:              0
>     llbb:            '%0'
>     successors:      [ 'bb#2', 'bb#1' ]
>     liveIns:         [ '%edi' ]
>     instructions:
>       - 'push64r undef %rax, %rsp, %rsp'
>       - 'mov32mr %rsp, 1, %noreg, 4, %noreg, %edi'
>       - ....
>         ....
>   - bb:              1
>     llbb:            '%4'
>     successors:      [ 'bb#2' ]
>     instructions:
>       - '%edi = mov32rm %rsp, 1, %noreg, 4, %noreg'
>       - ....
>         ....
>   - ....
>     ....
> ...
>
> The example above shows a YAML file with two YAML documents (delimited by `---`
> and `...`) containing the LLVM IR and the machine function information for the function `fact`.
>
> When a specific format is chosen, I'll start with patches that serialize the
> embedded LLVM IR. Then I'll add support for things like machine functions and
> machine basic blocks, and I think that an intrusive implementation will work best
> for data structures like these. After that I will continue adding support for
> serialization of the remaining data structures.
>
> Thanks for reading through the proposal. What are your thoughts about this format?
>
>
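
A minimal sketch of how the two-document layout in the quoted proposal could be
taken apart, again in Python purely for illustration. The helper names
split_documents and extract_ir are hypothetical, and the sketch only handles
bare `---` and `...` marker lines and a leading `ir: |` key; a real
implementation would use a full YAML parser rather than this line-based
approximation.

```python
import textwrap


def split_documents(text):
    """Split a YAML stream into document bodies at '---' and '...' lines."""
    documents = []
    current = None  # None while we are outside any document
    for line in text.splitlines():
        marker = line.rstrip()
        if marker == "---":
            # A new document starts; close the previous one if still open.
            if current is not None:
                documents.append("\n".join(current))
            current = []
        elif marker == "...":
            # Explicit end of the current document.
            if current is not None:
                documents.append("\n".join(current))
            current = None
        elif current is not None:
            current.append(line)
    if current is not None:
        documents.append("\n".join(current))
    return documents


def extract_ir(document):
    """Return the dedented body of a leading 'ir: |' block, or None."""
    lines = document.splitlines()
    if not lines or lines[0].strip() != "ir: |":
        return None
    return textwrap.dedent("\n".join(lines[1:]))
```

With the example from the proposal, split_documents would yield two bodies:
the first is recognized by extract_ir as the embedded LLVM IR, and the second
holds the machine function fields (number, name, alignment, and so on).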