<p dir="ltr">There's no reason to rewrite the IR parser. </p>
<p dir="ltr">-eric</p>
<br><div class="gmail_quote">On Tue, Apr 28, 2015, 10:39 PM Hayden Livingston <<a href="mailto:halivingston@gmail.com">halivingston@gmail.com</a>> wrote:<br><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">As an aside, you haven't mentioned but will the IR parser be rewritten<br>
at all? Is the YAML a container on top of the IR?<br>
<br>
If you are rewriting the IR parser, would it be possible to maintain<br>
some sort of grammar?<br>
<br>
On Tue, Apr 28, 2015 at 5:59 PM, David Majnemer<br>
<<a href="mailto:david.majnemer@gmail.com" target="_blank">david.majnemer@gmail.com</a>> wrote:<br>
><br>
><br>
> On Tuesday, April 28, 2015, Sean Silva <<a href="mailto:chisophugis@gmail.com" target="_blank">chisophugis@gmail.com</a>> wrote:<br>
>><br>
>><br>
>><br>
>> On Tue, Apr 28, 2015 at 3:51 PM, David Majnemer <<a href="mailto:david.majnemer@gmail.com" target="_blank">david.majnemer@gmail.com</a>><br>
>> wrote:<br>
>>><br>
>>> I love the idea of having some sort of textual representation. My only<br>
>>> concern is that our YAML parser is not very actively maintained (is there<br>
>>> someone expert with its implementation *and* active in the project?) and<br>
>>> (IMHO) over-engineered when compared to the simplicity of our custom IR<br>
>>> parser.<br>
>>><br>
>>> Without TLC, I'm afraid it would make for a poor piece of LLVM<br>
>>> infrastructure to rely on. The reliability of the serialization mechanism<br>
>>> is very important if we are to have any chance of applying fuzz testing to<br>
>>> the backend pieces; after all, testability is a huge motivation for this<br>
>>> work.<br>
>>><br>
>>> As a concrete example, a file solely containing '%' crashes the yaml<br>
>>> parser:<br>
>>> $ ~/llvm/Debug+Asserts/bin/yaml2obj -format=coff t.yaml<br>
>>> yaml2obj: ~/llvm/src/lib/Support/YAMLTraits.cpp:78: bool<br>
>>> llvm::yaml::Input::setCurrentDocument(): Assertion `Strm->failed() && "Root<br>
>>> is NULL iff parsing failed"' failed.<br>
>>> 0 yaml2obj 0x000000000048682e<br>
>>> 1 yaml2obj 0x0000000000486b43<br>
>>> 2 yaml2obj 0x000000000048570e<br>
>>> 3 libpthread.so.0 0x00007f5e79643340<br>
>>> 4 libc.so.6 0x00007f5e78c9acc9 gsignal + 57<br>
>>> 5 libc.so.6 0x00007f5e78c9e0d8 abort + 328<br>
>>> 6 libc.so.6 0x00007f5e78c93b86<br>
>>> 7 libc.so.6 0x00007f5e78c93c32<br>
>>> 8 yaml2obj 0x000000000045f378<br>
>>> 9 yaml2obj 0x000000000040d4b3<br>
>>> 10 yaml2obj 0x000000000040b0fa<br>
>>> 11 yaml2obj 0x0000000000404a79<br>
>>> 12 yaml2obj 0x0000000000404dd8<br>
>>> 13 libc.so.6 0x00007f5e78c85ec5 __libc_start_main + 245<br>
>>> 14 yaml2obj 0x0000000000404879<br>
>>> Stack dump:<br>
>>> 0. Program arguments: ~/llvm/Debug+Asserts/bin/yaml2obj -format=coff<br>
>>> t.yaml<br>
>>><br>
>><br>
>><br>
>> Hopefully a fuzzer that is fuzzing a yaml input would not waste its time<br>
>> with syntactically invalid or unusual YAML.<br>
><br>
><br>
> Maybe. I don't see why we would want to lock ourselves out of using<br>
> afl-fuzz though.<br>
><br>
>><br>
>><br>
>> Also, you're thinking of YAMLIO which is a layer on top of the YAML parser<br>
>> (YAMLParser.{h,cpp}). It might make sense to not use YAMLIO (it is good for<br>
>> some types of data, not for all) but still use the YAML parser.<br>
>><br>
>> -- Sean Silva<br>
>><br>
>>><br>
>>> On Tue, Apr 28, 2015 at 2:00 PM, Alex L <<a href="mailto:arphaman@gmail.com" target="_blank">arphaman@gmail.com</a>> wrote:<br>
>>>><br>
>>>><br>
>>>><br>
>>>> 2015-04-28 10:14 GMT-07:00 Quentin Colombet <<a href="mailto:qcolombet@apple.com" target="_blank">qcolombet@apple.com</a>>:<br>
>>>>><br>
>>>>> Hi Alex,<br>
>>>>><br>
>>>>> Thanks for working on this.<br>
>>>>><br>
>>>>> Personally I would rather not have to write YAML inputs and would instead<br>
>>>>> rely on what the machine dumps look like. That being said, I can live<br>
>>>>> with YAML :).<br>
>>>>><br>
>>>>> More importantly, how do you plan to report syntax errors to the users?<br>
>>>>> Things like invalid instruction, invalid registers, etc.?<br>
>>>>> What about unallocated code, i.e., virtual registers, invalid SSA form,<br>
>>>>> etc.?<br>
>>>>><br>
>>>>> Cheers,<br>
>>>>> Q.<br>
>>>><br>
>>>><br>
>>>> Thanks,<br>
>>>><br>
>>>> Unfortunately, the machine dumps are quite incomplete (and tricky to<br>
>>>> parse too!), and thus some sort of new syntax has to be developed.<br>
>>>> I think that a YAML based container is a good candidate for this<br>
>>>> purpose, as it has a structured format that represents things like<br>
>>>> machine functions, frame information, register information, target<br>
>>>> specific machine function details, etc. in a clear and readable way.<br>
>>>><br>
>>>> I haven't thought about error reporting that much, as I've been mostly<br>
>>>> working on developing the syntax and making sure that all the data<br>
>>>> structures can be represented by it. But I believe that the errors that<br>
>>>> crop up in an invalid machine instruction syntax, like invalid basic<br>
>>>> block references, invalid instructions, etc., can be reported quite well,<br>
>>>> and I can rely on already existing error reporting facilities in LLVM to<br>
>>>> help me. The more structural errors, like missing attributes, will be<br>
>>>> handled by the YAML parser automatically, and I might extend it to<br>
>>>> provide better/more specific error messages. And I think that it's<br>
>>>> possible to use the machine verifier to catch the other errors that<br>
>>>> you've mentioned.<br>
>>>><br>
>>>> Alex<br>
>>>><br>
>>>><br>
>>>>><br>
>>>>> On Apr 28, 2015, at 9:56 AM, Alex L <<a href="mailto:arphaman@gmail.com" target="_blank">arphaman@gmail.com</a>> wrote:<br>
>>>>><br>
>>>>> Hi all,<br>
>>>>><br>
>>>>><br>
>>>>> I would like to propose a text-based, human-readable format that will be<br>
>>>>> used to serialize the machine level IR. The major goal of this format is<br>
>>>>> to allow LLVM to save the machine level IR after any code generation pass<br>
>>>>> and then to load it again and continue running passes on the machine<br>
>>>>> level IR. The primary use case of this format is to make testing the code<br>
>>>>> generation passes easier, by allowing developers to write tests that load<br>
>>>>> the IR, invoke just a specific code gen pass, and then inspect the output<br>
>>>>> of that pass by checking the printed IR.<br>
>>>>><br>
>>>>><br>
>>>>><br>
>>>>> The proposed format has a number of key features:<br>
>>>>><br>
>>>>> - It stores the machine level IR and the optional LLVM IR in one text file.<br>
>>>>> - The connections between the machine level IR and the LLVM IR are preserved.<br>
>>>>> - The format uses a YAML based container for most of the data structures.<br>
>>>>>   The LLVM IR is embedded in the YAML container.<br>
>>>>> - The format also uses a new, text-based syntax to serialize the machine<br>
>>>>>   instructions. The instructions are embedded in YAML.<br>
>>>>><br>
>>>>><br>
>>>>> This is an incomplete example of a YAML file containing the LLVM IR, the<br>
>>>>> machine level IR and the instructions:<br>
>>>>><br>
>>>>><br>
>>>>> ---<br>
>>>>> ir: |<br>
>>>>>   define i32 @fact(i32 %n) {<br>
>>>>>     %1 = alloca i32, align 4<br>
>>>>>     store i32 %n, i32* %1, align 4<br>
>>>>>     %2 = load i32, i32* %1, align 4<br>
>>>>>     %3 = icmp eq i32 %2, 0<br>
>>>>>     br i1 %3, label %10, label %4<br>
>>>>><br>
>>>>>   ; <label>:4 ; preds = %0<br>
>>>>>     %5 = load i32, i32* %1, align 4<br>
>>>>>     %6 = sub nsw i32 %5, 1<br>
>>>>>     %7 = call i32 @fact(i32 %6)<br>
>>>>>     %8 = load i32, i32* %1, align 4<br>
>>>>>     %9 = mul nsw i32 %7, %8<br>
>>>>>     br label %10<br>
>>>>><br>
>>>>>   ; <label>:10 ; preds = %0, %4<br>
>>>>>     %11 = phi i32 [ %9, %4 ], [ 1, %0 ]<br>
>>>>>     ret i32 %11<br>
>>>>>   }<br>
>>>>> ...<br>
>>>>> ---<br>
>>>>> number: 0<br>
>>>>> name: fact<br>
>>>>> alignment: 4<br>
>>>>> regInfo:<br>
>>>>>   ....<br>
>>>>> frameInfo:<br>
>>>>>   ....<br>
>>>>> body:<br>
>>>>>   - bb: 0<br>
>>>>>     llbb: '%0'<br>
>>>>>     successors: [ 'bb#2', 'bb#1' ]<br>
>>>>>     liveIns: [ '%edi' ]<br>
>>>>>     instructions:<br>
>>>>>       - 'push64r undef %rax, %rsp, %rsp'<br>
>>>>>       - 'mov32mr %rsp, 1, %noreg, 4, %noreg, %edi'<br>
>>>>>       - ....<br>
>>>>>     ....<br>
>>>>>   - bb: 1<br>
>>>>>     llbb: '%4'<br>
>>>>>     successors: [ 'bb#2' ]<br>
>>>>>     instructions:<br>
>>>>>       - '%edi = mov32rm %rsp, 1, %noreg, 4, %noreg'<br>
>>>>>       - ....<br>
>>>>>     ....<br>
>>>>>   - ....<br>
>>>>>   ....<br>
>>>>> ...<br>
>>>>><br>
>>>>><br>
>>>>> The example above shows a YAML file with two YAML documents (delimited by<br>
>>>>> `---` and `...`) containing the LLVM IR and the machine function<br>
>>>>> information for the function `fact`.<br>
>>>>><br>
>>>>><br>
>>>>><br>
>>>>> When a specific format is chosen, I'll start with patches that serialize<br>
>>>>> the embedded LLVM IR. Then I'll add support for things like machine<br>
>>>>> functions and machine basic blocks, and I think that an intrusive<br>
>>>>> implementation will work best for data structures like these. After that<br>
>>>>> I will continue adding support for serialization of the remaining data<br>
>>>>> structures.<br>
>>>>><br>
>>>>><br>
>>>>><br>
>>>>> Thanks for reading through the proposal. What are your thoughts about<br>
>>>>> this format?<br>
>>>>><br>
>>>>><br>
>>>>> _______________________________________________<br>
>>>>> LLVM Developers mailing list<br>
>>>>> <a href="mailto:LLVMdev@cs.uiuc.edu" target="_blank">LLVMdev@cs.uiuc.edu</a> <a href="http://llvm.cs.uiuc.edu" target="_blank">http://llvm.cs.uiuc.edu</a><br>
>>>>> <a href="http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev" target="_blank">http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev</a><br>
>>>>><br>
>>>>><br>
>>>><br>
>>>><br>
>>>><br>
>>><br>
>>><br>
>>><br>
>><br>
><br>
><br>
</blockquote></div>
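<p dir="ltr">For readers following the quoted proposal: it describes one file holding two YAML documents, delimited by `---` and `...`, one for the embedded LLVM IR and one for the machine function. Below is a minimal Python sketch of how such a stream could be split into its documents. It is not LLVM's YAML parser and not part of the proposal. The function name <code>split_yaml_documents</code> and the abbreviated sample input are made up for illustration, and only marker lines that consist of exactly `---` or `...` are recognized.</p>
<pre>
```python
def split_yaml_documents(text):
    """Split a YAML stream into the raw text of each document.

    Simplifying assumption: a line that is exactly '---' starts a document
    and a line that is exactly '...' ends one. Indented lines (such as the
    IR inside a block scalar) are never treated as markers.
    """
    documents = []
    current = None  # None means we are currently outside any document
    for line in text.splitlines():
        marker = line.rstrip()
        if marker == "---":
            # A new start marker also closes a document that is still open.
            if current is not None:
                documents.append("\n".join(current))
            current = []
        elif marker == "...":
            if current is not None:
                documents.append("\n".join(current))
            current = None
        elif current is not None:
            current.append(line)
    # Accept a final document that has no explicit end marker.
    if current is not None:
        documents.append("\n".join(current))
    return documents


# Abbreviated, hypothetical input in the shape of the quoted example.
SAMPLE = """\
---
ir: |
  define i32 @fact(i32 %n) {
    ret i32 %n
  }
...
---
number: 0
name: fact
alignment: 4
...
"""

docs = split_yaml_documents(SAMPLE)
print(len(docs))                   # number of documents found
print(docs[1].splitlines()[1])     # second line of the machine function document
```
</pre>
<p dir="ltr">A real implementation would presumably hand each document to a proper YAML parser. The sketch only shows why the `ir:` block and the machine function can share one file without ambiguity.</p>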