<p dir="ltr">There's no reason to rewrite the IR parser. </p>

<p dir="ltr">-eric</p>

<br><div class="gmail_quote">On Tue, Apr 28, 2015, 10:39 PM Hayden Livingston <<a href="mailto:halivingston@gmail.com">halivingston@gmail.com</a>> wrote:<br><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">As an aside, you haven't mentioned this, but will the IR parser be rewritten<br>

at all? Is the YAML a container on top of the IR?<br>

<br>

If you are rewriting the IR parser, would it be possible to maintain<br>

some sort of grammar?<br>

<br>

On Tue, Apr 28, 2015 at 5:59 PM, David Majnemer<br>

<<a href="mailto:david.majnemer@gmail.com" target="_blank">david.majnemer@gmail.com</a>> wrote:<br>

><br>

><br>

> On Tuesday, April 28, 2015, Sean Silva <<a href="mailto:chisophugis@gmail.com" target="_blank">chisophugis@gmail.com</a>> wrote:<br>

>><br>

>><br>

>><br>

>> On Tue, Apr 28, 2015 at 3:51 PM, David Majnemer <<a href="mailto:david.majnemer@gmail.com" target="_blank">david.majnemer@gmail.com</a>><br>

>> wrote:<br>

>>><br>

>>> I love the idea of having some sort of textual representation.  My only<br>

>>> concern is that our YAML parser is not very actively maintained (is there<br>

>>> someone expert with its implementation *and* active in the project?) and<br>

>>> (IMHO) over-engineered when compared to the simplicity of our custom IR<br>

>>> parser.<br>

>>><br>

>>> Without TLC, I'm afraid it would make for a poor piece of LLVM<br>

>>> infrastructure to rely on.  The reliability of the serialization mechanism<br>

>>> is very important if we are to have any chance of applying fuzz testing to<br>

>>> the backend pieces; after all, testability is a huge motivation for this<br>

>>> work.<br>

>>><br>

>>> As a concrete example, a file solely containing '%' crashes the yaml<br>

>>> parser:<br>

>>> $ ~/llvm/Debug+Asserts/bin/yaml2obj -format=coff t.yaml<br>

>>> yaml2obj: ~/llvm/src/lib/Support/YAMLTraits.cpp:78: bool<br>

>>> llvm::yaml::Input::setCurrentDocument(): Assertion `Strm->failed() && "Root<br>

>>> is NULL iff parsing failed"' failed.<br>

>>> 0  yaml2obj        0x000000000048682e<br>

>>> 1  yaml2obj        0x0000000000486b43<br>

>>> 2  yaml2obj        0x000000000048570e<br>

>>> 3  libpthread.so.0 0x00007f5e79643340<br>

>>> 4  libc.so.6       0x00007f5e78c9acc9 gsignal + 57<br>

>>> 5  libc.so.6       0x00007f5e78c9e0d8 abort + 328<br>

>>> 6  libc.so.6       0x00007f5e78c93b86<br>

>>> 7  libc.so.6       0x00007f5e78c93c32<br>

>>> 8  yaml2obj        0x000000000045f378<br>

>>> 9  yaml2obj        0x000000000040d4b3<br>

>>> 10 yaml2obj        0x000000000040b0fa<br>

>>> 11 yaml2obj        0x0000000000404a79<br>

>>> 12 yaml2obj        0x0000000000404dd8<br>

>>> 13 libc.so.6       0x00007f5e78c85ec5 __libc_start_main + 245<br>

>>> 14 yaml2obj        0x0000000000404879<br>

>>> Stack dump:<br>

>>> 0.      Program arguments: ~/llvm/Debug+Asserts/bin/yaml2obj -format=coff<br>

>>> t.yaml<br>

>>><br>

>><br>

>><br>

>> Hopefully a fuzzer that is fuzzing a yaml input would not waste its time<br>

>> with syntactically invalid or unusual YAML.<br>

><br>

><br>

> Maybe.  I don't see why we would want to lock ourselves out of using<br>

> afl-fuzz though.<br>

><br>

>><br>

>><br>

>> Also, you're thinking of YAMLIO which is a layer on top of the YAML parser<br>

>> (YAMLParser.{h,cpp}). It might make sense to not use YAMLIO (it is good for<br>

>> some types of data, not for all) but still use the YAML parser.<br>

>><br>

>> -- Sean Silva<br>

>><br>

>>><br>

>>> On Tue, Apr 28, 2015 at 2:00 PM, Alex L <<a href="mailto:arphaman@gmail.com" target="_blank">arphaman@gmail.com</a>> wrote:<br>

>>>><br>

>>>><br>

>>>><br>

>>>> 2015-04-28 10:14 GMT-07:00 Quentin Colombet <<a href="mailto:qcolombet@apple.com" target="_blank">qcolombet@apple.com</a>>:<br>

>>>>><br>

>>>>> Hi Alex,<br>

>>>>><br>

>>>>> Thanks for working on this.<br>

>>>>><br>

>>>>> Personally I would rather not have to write YAML inputs but instead<br>

>>>>> rely on what the machine dumps look like. That being said, I can live<br>

>>>>> with YAML :).<br>

>>>>><br>

>>>>> More importantly, how do you plan to report syntax errors to the users?<br>

>>>>> Things like invalid instruction, invalid registers, etc.?<br>

>>>>> What about unallocated code, i.e., virtual registers, invalid SSA form,<br>

>>>>> etc.?<br>

>>>>><br>

>>>>> Cheers,<br>

>>>>> Q.<br>

>>>><br>

>>>><br>

>>>> Thanks,<br>

>>>><br>

>>>> Unfortunately, the machine dumps are quite incomplete (and tricky to<br>

>>>> parse too!), and thus some sort of new syntax has to be developed.<br>

>>>> I think that a YAML based container is a good candidate for this<br>

>>>> purpose, as it has a structured format that represents things like machine<br>

>>>> functions,<br>

>>>> frame information, register information, target specific machine<br>

>>>> function details, etc. in a clear and readable way.<br>

>>>><br>

>>>> I haven't thought about error reporting that much, as I've been mostly<br>

>>>> working on developing the syntax and making sure that all the data<br>

>>>> structures<br>

>>>> can be represented by it. But I believe that the errors that crop up in<br>

>>>> an invalid machine instruction syntax, like invalid basic block references,<br>

>>>> invalid instructions,<br>

>>>> etc. can be reported quite well and I can rely on already existing error<br>

>>>> reporting facilities in LLVM to help me. The more structural errors, like<br>

>>>> missing attributes<br>

>>>> will be handled by the YAML parser automatically, and I might extend it<br>

>>>> to provide better/more specific error messages. And I think that it's<br>

>>>> possible<br>

>>>> to use the machine verifier to catch the other errors that you've<br>

>>>> mentioned.<br>

>>>><br>

>>>> Alex<br>

>>>><br>

>>>><br>

>>>>><br>

>>>>> On Apr 28, 2015, at 9:56 AM, Alex L <<a href="mailto:arphaman@gmail.com" target="_blank">arphaman@gmail.com</a>> wrote:<br>

>>>>><br>

>>>>> Hi all,<br>

>>>>><br>

>>>>><br>

>>>>> I would like to propose a text-based, human-readable format that will<br>

>>>>> be used to<br>

>>>>><br>

>>>>> serialize the machine level IR. The major goal of this format is to<br>

>>>>> allow LLVM<br>

>>>>><br>

>>>>> to save the machine level IR after any code generation pass and then to<br>

>>>>> load<br>

>>>>><br>

>>>>> it again and continue running passes on the machine level IR. The<br>

>>>>> primary use case<br>

>>>>><br>

>>>>> of this format is to enable an easier testing process for the code<br>

>>>>> generation passes,<br>

>>>>><br>

>>>>> by allowing the developers to write tests that load the IR, then invoke<br>

>>>>> just a<br>

>>>>><br>

>>>>> specific code gen pass and then inspect the output of that pass by<br>

>>>>> checking the<br>

>>>>><br>

>>>>> printed out IR.<br>

>>>>><br>

>>>>><br>

>>>>><br>

>>>>> The proposed format has a number of key features:<br>

>>>>><br>

>>>>> - It stores the machine level IR and the optional LLVM IR in one text<br>

>>>>> file.<br>

>>>>><br>

>>>>> - The connections between the machine level IR and the LLVM IR are<br>

>>>>> preserved.<br>

>>>>><br>

>>>>> - The format uses a YAML based container for most of the data<br>

>>>>> structures. The LLVM<br>

>>>>><br>

>>>>>   IR is embedded in the YAML container.<br>

>>>>><br>

>>>>> - The format also uses a new, text-based syntax to serialize the<br>

>>>>> machine instructions.<br>

>>>>><br>

>>>>>   The instructions are embedded in YAML.<br>

>>>>><br>

>>>>><br>

>>>>> This is an incomplete example of a YAML file containing the LLVM IR,<br>

>>>>> the machine level IR<br>

>>>>><br>

>>>>> and the instructions:<br>

>>>>><br>

>>>>><br>

>>>>> ---<br>

>>>>><br>

>>>>> ir: |<br>

>>>>><br>

>>>>>   define i32 @fact(i32 %n) {<br>

>>>>><br>

>>>>>     %1 = alloca i32, align 4<br>

>>>>><br>

>>>>>     store i32 %n, i32* %1, align 4<br>

>>>>><br>

>>>>>     %2 = load i32, i32* %1, align 4<br>

>>>>><br>

>>>>>     %3 = icmp eq i32 %2, 0<br>

>>>>><br>

>>>>>     br i1 %3, label %10, label %4<br>

>>>>><br>

>>>>><br>

>>>>>   ; <label>:4                                       ; preds = %0<br>

>>>>><br>

>>>>>     %5 = load i32, i32* %1, align 4<br>

>>>>><br>

>>>>>     %6 = sub nsw i32 %5, 1<br>

>>>>><br>

>>>>>     %7 = call i32 @fact(i32 %6)<br>

>>>>><br>

>>>>>     %8 = load i32, i32* %1, align 4<br>

>>>>><br>

>>>>>     %9 = mul nsw i32 %7, %8<br>

>>>>><br>

>>>>>     br label %10<br>

>>>>><br>

>>>>><br>

>>>>>   ; <label>:10                                      ; preds = %0, %4<br>

>>>>><br>

>>>>>     %11 = phi i32 [ %9, %4 ], [ 1, %0 ]<br>

>>>>><br>

>>>>>     ret i32 %11<br>

>>>>><br>

>>>>>   }<br>

>>>>><br>

>>>>><br>

>>>>> ...<br>

>>>>><br>

>>>>> ---<br>

>>>>><br>

>>>>> number:          0<br>

>>>>><br>

>>>>> name:            fact<br>

>>>>><br>

>>>>> alignment:       4<br>

>>>>><br>

>>>>> regInfo:<br>

>>>>><br>

>>>>>   ....<br>

>>>>><br>

>>>>> frameInfo:<br>

>>>>><br>

>>>>>   ....<br>

>>>>><br>

>>>>> body:<br>

>>>>><br>

>>>>>   - bb:              0<br>

>>>>><br>

>>>>>     llbb:            '%0'<br>

>>>>><br>

>>>>>     successors:      [ 'bb#2', 'bb#1' ]<br>

>>>>><br>

>>>>>     liveIns:         [ '%edi' ]<br>

>>>>><br>

>>>>>     instructions:<br>

>>>>><br>

>>>>>       - 'push64r undef %rax, %rsp, %rsp'<br>

>>>>><br>

>>>>>       - 'mov32mr %rsp, 1, %noreg, 4, %noreg, %edi'<br>

>>>>><br>

>>>>>       - ....<br>

>>>>><br>

>>>>>         ....<br>

>>>>><br>

>>>>>   - bb:              1<br>

>>>>><br>

>>>>>     llbb:            '%4'<br>

>>>>><br>

>>>>>     successors:      [ 'bb#2' ]<br>

>>>>><br>

>>>>>     instructions:<br>

>>>>><br>

>>>>>       - '%edi = mov32rm %rsp, 1, %noreg, 4, %noreg'<br>

>>>>><br>

>>>>>       - ....<br>

>>>>><br>

>>>>>         ....<br>

>>>>><br>

>>>>>   - ....<br>

>>>>><br>

>>>>>     ....<br>

>>>>><br>

>>>>> ...<br>

>>>>><br>

>>>>><br>

>>>>> The example above shows a YAML file with two YAML documents (delimited<br>

>>>>> by `---`<br>

>>>>><br>

>>>>> and `...`) containing the LLVM IR and the machine function information<br>

>>>>> for the function `fact`.<br>

>>>>><br>

>>>>><br>

>>>>><br>

>>>>> When a specific format is chosen, I'll start with patches that<br>

>>>>> serialize the<br>

>>>>><br>

>>>>> embedded LLVM IR. Then I'll add support for things like machine<br>

>>>>> functions and<br>

>>>>><br>

>>>>> machine basic blocks, and I think that an intrusive implementation will<br>

>>>>> work best<br>

>>>>><br>

>>>>> for data structures like these. After that I will continue adding<br>

>>>>> support for<br>

>>>>><br>

>>>>> serialization of the remaining data structures.<br>

>>>>><br>

>>>>><br>

>>>>><br>

>>>>> Thanks for reading through the proposal. What are your thoughts about<br>

>>>>> this format?<br>

>>>>><br>

>>>>><br>

>>>>> _______________________________________________<br>

>>>>> LLVM Developers mailing list<br>

>>>>> <a href="mailto:LLVMdev@cs.uiuc.edu" target="_blank">LLVMdev@cs.uiuc.edu</a>         <a href="http://llvm.cs.uiuc.edu" target="_blank">http://llvm.cs.uiuc.edu</a><br>

>>>>> <a href="http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev" target="_blank">http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev</a><br>

>>>>><br>

>>>>><br>

>>>><br>

>>>><br>


>>>><br>

>>><br>

>>><br>


>>><br>

>><br>

><br>


><br>


</blockquote></div>
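<p dir="ltr">The proposal quoted above describes a file made of two YAML documents delimited by <code>---</code> and <code>...</code>. As a rough illustration only, here is a minimal Python sketch of splitting such a stream into its documents. The function name <code>split_yaml_documents</code> and the shortened sample input are assumptions made for this example. This is not LLVM's YAML parser, and it only recognizes marker lines that start in column 0 and carry no trailing content.</p>

<pre>
```python
def split_yaml_documents(text):
    """Split a YAML stream into documents using '---' and '...' marker lines.

    Hypothetical helper for illustration; it is not part of LLVM.
    """
    documents = []
    current = None  # None means we are outside any document
    for line in text.splitlines():
        # Only strip the right side: an indented '...' inside a block
        # scalar (such as the embedded IR) must not count as a marker.
        marker = line.rstrip()
        if marker == "---":
            # A start marker also closes a document that was left open.
            if current is not None:
                documents.append("\n".join(current))
            current = []
        elif marker == "...":
            if current is not None:
                documents.append("\n".join(current))
            current = None
        elif current is not None:
            current.append(line)
    # Accept a final document that has no closing '...' marker.
    if current is not None:
        documents.append("\n".join(current))
    return documents


# Shortened stand-in for the example in the proposal; the IR body here is
# a placeholder, not the original 'fact' function.
SAMPLE = """\
---
ir: |
  define i32 @fact(i32 %n) {
    ret i32 %n
  }
...
---
number:          0
name:            fact
alignment:       4
...
"""

if __name__ == "__main__":
    docs = split_yaml_documents(SAMPLE)
    print(len(docs))
    for doc in docs:
        # Show the first line of each document.
        print(doc.splitlines()[0])
```
</pre>

<p dir="ltr">A real implementation would presumably hand each document to a proper YAML parser. The sketch only shows how the two markers separate the embedded LLVM IR from the machine function information.</p>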