[llvm-dev] [RFC] Implementing LLVM MC Protobuf Fuzzer for Assembly and Encoding for RISC-V target

Daniel Sanders via llvm-dev llvm-dev at lists.llvm.org
Wed Oct 24 18:24:12 PDT 2018


Hi Ana,

> On Oct 24, 2018, at 13:05, apazos at codeaurora.org wrote:
> 
> Hi Daniel,
> 
> Thanks for the feedback.
> 
> That is correct, you can invoke the fuzzers without a golden reference implementation. The driver program to compare behaviors is just a convenient tool for those who have a reference implementation.
> 
> I am not sure I understood your suggestion about de-serializing Protobuf messages as MCInst objects. Can you clarify?

I'm thinking that a single protobuf message could be used to generate the Assembly, MCInst, and Encoding for a given instruction. From that you can:
  - Parse the generated assembly and check that the resulting MCInst matches the generated MCInst
  - Disassemble the generated encoding and check that the resulting MCInst matches the generated MCInst
  - Assemble the generated MCInst and check that the encoding matches the generated encoding
  - Print the generated MCInst and check that the assembly matches the generated assembly
in addition to testing the Assembly->Encoding and Encoding->Assembly round trips. With these additional tests, the fuzzer can identify whether it's the parsing, assembling, disassembling, or instruction printing that is incorrect.

For example, given 'addiu $1, $2, 3', 'ADD reg(1), reg(2), imm(3)', and 0x0123 the fuzzer would be able to report something like:
	0x0126 didn't match golden reference (0x0126 != 0x0123)
	Parsing 'addiu $1, $2, 3' produces 'ADD reg(1), reg(2), imm(3)': Parser is ok
	Assembling 'ADD reg(1), reg(2), imm(3)' produces '0x0126': Assembler is wrong
	Disassembling '0x0123' produces 'ADD reg(1), reg(2), imm(3)': Disassembler is ok
	Printing 'ADD reg(1), reg(2), imm(3)' produces 'addiu $1, $2, 3': Instruction printer is ok
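
To make the attribution idea a bit more concrete, here is a rough, self-contained C++ sketch. ToyInst, GeneratedCase, Components, and findFaulty are made-up names for illustration only; they are stand-ins rather than LLVM APIs, and a real version would presumably sit on top of MCInst and the target's asm parser, code emitter, disassembler, and instruction printer.

```cpp
#include <cassert>
#include <cstdint>
#include <functional>
#include <string>
#include <vector>

// Hypothetical stand-in for an MCInst: just a canonical printable form.
using ToyInst = std::string;

// One fuzzer-generated case: three views of the same instruction, all
// derived from a single protobuf message.
struct GeneratedCase {
  std::string Asm;
  ToyInst Inst;
  uint32_t Encoding;
};

// The four MC components under test, modelled as plain callables.
struct Components {
  std::function<ToyInst(const std::string &)> Parse;  // Assembly -> MCInst
  std::function<uint32_t(const ToyInst &)> Assemble;  // MCInst -> Encoding
  std::function<ToyInst(uint32_t)> Disassemble;       // Encoding -> MCInst
  std::function<std::string(const ToyInst &)> Print;  // MCInst -> Assembly
};

// Runs each component on its generated input and returns the names of the
// components whose output disagrees with the generated expectation.
std::vector<std::string> findFaulty(const GeneratedCase &C,
                                    const Components &MC) {
  assert(MC.Parse && MC.Assemble && MC.Disassemble && MC.Print &&
         "all four components must be provided");
  std::vector<std::string> Faulty;
  if (MC.Parse(C.Asm) != C.Inst)
    Faulty.push_back("parser");
  if (MC.Assemble(C.Inst) != C.Encoding)
    Faulty.push_back("assembler");
  if (MC.Disassemble(C.Encoding) != C.Inst)
    Faulty.push_back("disassembler");
  if (MC.Print(C.Inst) != C.Asm)
    Faulty.push_back("instruction printer");
  return Faulty;
}
```

Because each component is checked against its own generated input rather than against the output of another component, one faulty stage does not mask or implicate the others.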

I'm also thinking that fuzzing the MCInst->Assembly and MCInst->Encoding paths in isolation would allow us to be sure that the MCInsts emitted by CodeGen behave correctly when given to the MC layer. At the moment, we assume that Assembly->MCInst->Encoding and Encoding->MCInst->Assembly being correct implies that CodeGen->MCInst->Assembly and CodeGen->MCInst->Encoding are correct too. However, this doesn't quite hold as there are multiple ways of expressing the same operand in an MCInst. For example:
addExpr(MCConstantExpr::create(2, Ctx))
addImm(2)
addExpr(/* some REL relocation evaluating to Symbol+2 */)
and Assembly->MCInst->Encoding/Encoding->MCInst->Assembly will only exercise one of them.
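
As a rough illustration of why that matters, here is a small self-contained C++ sketch. OperandKind, ToyOperand, encodeImmOnly, equivalentFormsOf, and countRejected are hypothetical names, not LLVM classes; the toy encoder is deliberately incomplete so that feeding it every form of the operand exposes the gap that a single Assembly->MCInst path would miss.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical operand kinds mirroring the three forms listed above.
enum class OperandKind { Imm, ConstExpr, SymbolPlusOffset };

struct ToyOperand {
  OperandKind Kind;
  int64_t Value; // immediate, constant, or addend, depending on Kind
};

// A deliberately incomplete toy encoder: it only accepts plain immediates.
// Returns true and sets Out on success, false if the form is unsupported.
bool encodeImmOnly(const ToyOperand &Op, int64_t &Out) {
  if (Op.Kind != OperandKind::Imm)
    return false;
  Out = Op.Value;
  return true;
}

// All the ways this toy model can express "the operand is V".
std::vector<ToyOperand> equivalentFormsOf(int64_t V) {
  return {{OperandKind::Imm, V},
          {OperandKind::ConstExpr, V},
          {OperandKind::SymbolPlusOffset, V}};
}

// Feeds every form to the encoder and counts how many it rejects.
template <typename Encoder>
unsigned countRejected(const std::vector<ToyOperand> &Forms, Encoder Encode) {
  assert(!Forms.empty() && "need at least one operand form");
  unsigned Rejected = 0;
  for (const ToyOperand &Op : Forms) {
    int64_t Out = 0;
    if (!Encode(Op, Out))
      ++Rejected;
  }
  return Rejected;
}
```

In this toy model, a test that only ever builds the Imm form would report the encoder as fine, while countRejected(equivalentFormsOf(2), encodeImmOnly) flags the two expression forms it cannot handle.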

> Thanks,
> Ana.
> 
> On 2018-10-16 10:34, Daniel Sanders wrote:
>>> On 16 Oct 2018, at 10:09, Daniel Sanders via llvm-dev <llvm-dev at lists.llvm.org> wrote:
>>> Hi Ana,
>>> I think this looks interesting although unfortunately I'm not sure I'm going to be able to make use of it for my current target as I don't have a golden reference tool available.
>> Thinking about it a bit more, the lack of a golden reference only
>> really affects my ability to use the driver script. With a
>> different/modified driver I should be able to use the underlying
>> fuzzer without a reference tool available.
>>> One of the key weaknesses of llvm-mc-disassembler-fuzzer for most targets is that it only finds a corpus of tests that improve coverage but doesn't provide any assessment on what the correct behaviour is. A human is required to make proper test cases out of the corpus and feed it back in so the fuzzer can drop the corresponding generated tests. Having a fuzzer that can verify the behaviour as well would be very useful for targets with access to a golden reference tool.
>>> One thing that occurred to me while skimming through D51144 was that something similar to proto_to_asm_main.cpp could be used to generate MCInst objects directly from the same protobuf. This would allow you to attribute bugs to the parser, instruction printer, or object emitter since you'd be able to tell, for example, that the parser emitted an MCInst that matched the one expected by the protobuf.
>>>> On 15 Oct 2018, at 12:29, via llvm-dev <llvm-dev at lists.llvm.org> wrote:
>>>> Hello,
>>>> We have implemented LLVM Machine Code Protobuf fuzzers for the RISC-V target as part of a Summer internship project with our intern Jocelyn Wei.
>>>> The fuzzers for the assembler and disassembler proved to be useful. We uncovered bugs and detected compatibility issues with other tools, e.g., by running a driver program that implements a round trip with a golden (i.e., more tested) tool such as GNU AS.
>>>> We built different fuzzer versions to experiment with the level of fuzzing for the instruction operands.
>>>> The versions are labeled sample, semi-constrained, and unconstrained. We fix the opcodes and, depending on the fuzzer version, allow the number of operands, operand value ranges, and operand types to vary.
>>>> The code is available for review:
>>>> https://reviews.llvm.org/D51710 Implemented Protobuf fuzzer for LLVM RISC-V MC Disassembler
>>>> https://reviews.llvm.org/D51144 Implemented Protobuf fuzzer for LLVM RISC-V MC Assembler
>>>> We would like to assess people's interest in adding this type of tool to the LLVM code base.
>>>> It can be further improved for RISC-V target and also expanded to other targets.
>>>> We have a Poster about the fuzzers at the LLVM Dev Conf this week.
>>>> Please visit our poster and come by with your comments and suggestions. We appreciate your feedback.
>>>> Thank you,
>>>> Ana.
>>>> --
>>>> Ana Pazos
>>>> Qualcomm Innovation Center, Inc.
>>>> Qualcomm Innovation Center, Inc. is a member of Code Aurora Forum,
>>>> a Linux Foundation Collaborative Project.
>>>> _______________________________________________
>>>> LLVM Developers mailing list
>>>> llvm-dev at lists.llvm.org
>>>> http://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev
