[llvm-dev] Applying for GSoC 2021(Fuzzing LLVM-IR Passes)

Tue Mar 9 01:12:51 PST 2021

Hi Johannes,
       Glad to hear from you! I understand that the title listed on the LLVM GSoC 2021 webpage serves as a general guideline, but a project proposal might need to limit its scope and focus on the deliverables. The ideas proposed all seem quite appealing and relevant to me. I’ve been browsing through llvm.org/docs/FuzzingLLVM.html and llvm-project/*/tools/*-fuzzer recently, as well as the YouTube video that you mentioned on the GSoC site. The following are some questions I’ve accumulated (forgive me if they are too naïve…).

1.      Truth be told, I’ve used OpenMP before for a course project, but I haven’t looked into its inner workings, e.g. how it actually instruments programs decorated with #pragma, and how it interacts with the OS’s threading. If LLVM’s OpenMP implementation hasn’t been fuzzed before, then it surely is a valuable fuzz target. Could you give me some clues on how we could fuzz OpenMP? For example, by writing a parser for the fuzzer input and calling OpenMP library functions in the LLVMFuzzerTestOneInput function? Or would we fuzz it through clang? I’ll look into llvm-project/openmp some more.

2.      For the custom mutator idea: my understanding is that there are currently two kinds of mutators, the generic one that ships with libFuzzer (bit flipping, splicing, etc.) and a structural mutator. Is the structural mutator related to IRMutator.cpp in the FuzzMutate folder?

3.      Most of the bugs found by fuzzers are crashes or hangs. Correctness testing is interesting but, from my limited knowledge, hard to achieve. I wonder if this is related to the ‘Alive’ tool mentioned by Florian? The fuzzer provides input to some LLVM pass, and ‘Alive’ verifies that the transformation is valid. Please correct me if my understanding is wrong…

To be honest, the LLVM passes I have written so far are out-of-tree passes. I’ve just set up my machine, built LLVM configured with fuzzer support, and started fiddling around. I have a rough picture of what each idea is about, but it will take some preparation work for me to split them into incremental steps and deliverables. Since it’s still early in the application process, I wonder if you could spare me some time to research the ideas that you proposed and make inquiries before finally deciding on my project proposal? 😊
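
For concreteness, the harness shape behind questions 1 and 2 might look roughly like the sketch below. The two extern "C" functions are the standard libFuzzer hooks; runTargetOnInput is a hypothetical stand-in for whatever code is being fuzzed (it is not an LLVM API), and the case-flipping mutation is only a toy placeholder for a structure-aware one.

```cpp
#include <cstddef>
#include <cstdint>
#include <string>

// Hypothetical stand-in for the code under test (e.g. parse IR, run a pass).
// It is NOT an LLVM API; it only pretends that inputs starting with 'd'
// (as in "define ...") are accepted.
static bool runTargetOnInput(const std::string &Text) {
  return !Text.empty() && Text.front() == 'd';
}

// libFuzzer entry point: called once for every generated input.
extern "C" int LLVMFuzzerTestOneInput(const uint8_t *Data, size_t Size) {
  std::string Text(Data ? reinterpret_cast<const char *>(Data) : "", Size);
  runTargetOnInput(Text); // crashes or sanitizer reports in here are the findings
  return 0;               // the conventional return value
}

// Optional libFuzzer hook: when defined, it is used instead of the generic
// byte-level mutations. This toy version flips the case of a single ASCII
// letter whose position is derived from Seed.
extern "C" size_t LLVMFuzzerCustomMutator(uint8_t *Data, size_t Size,
                                          size_t MaxSize, unsigned int Seed) {
  (void)MaxSize; // the input never grows here, so MaxSize is not needed
  if (Size == 0)
    return 0;
  size_t Pos = Seed % Size;
  bool IsLower = Data[Pos] >= 'a' && Data[Pos] <= 'z';
  bool IsUpper = Data[Pos] >= 'A' && Data[Pos] <= 'Z';
  if (IsLower || IsUpper)
    Data[Pos] ^= 0x20; // toggle the ASCII case bit
  return Size;         // size of the mutated input, must not exceed MaxSize
}
```

When built with -fsanitize=fuzzer, libFuzzer supplies main and drives both hooks; a structural mutator would decode Data into IR inside the mutator hook, change it, and serialize it back.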

I live in Shanghai, in the GMT+8 time zone. How about 15:00 tomorrow (March 10), or 13:30 on Friday afternoon (March 12)? I am not sure which time zone you are in, so feel free to propose another time slot if these two are not convenient for you (later that day or on the weekend are both fine). Hope to have a chat with you soon.

Cheers,
Chibin Zhang
2021.3.9

From: Johannes Doerfert<mailto:johannesdoerfert at gmail.com>
Sent: March 9, 2021 7:17
To: Florian Hahn<mailto:florian_hahn at apple.com>; llvm-dev<mailto:llvm-dev at lists.llvm.org>
Cc: 张驰斌<mailto:zhangchb1 at shanghaitech.edu.cn>; John Regehr<mailto:regehr at cs.utah.edu>
Subject: Re: [llvm-dev] Applying for GSoC 2021(Fuzzing LLVM-IR Passes)

Having Alive2 as oracle would certainly be great.

Some rough ideas that can be worked on in parallel if we have multiple
GSoC students:
  - mutation rules we know are sound, e.g., remove guarantees, add 1
iteration loops, etc.
  - input generation, equivalence checking (alive, partial evaluation, ...)
  - fragment extraction from larger codes + input tracking ->
reproducer splitting, faster equivalence checking, ...
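
One possible reading of "remove guarantees" is dropping poison-generating flags such as nsw, nuw and exact, which can only make an instruction more defined. The sketch below does this on a single line of textual IR; the function name dropWrapFlags is made up for illustration, and a real mutator would presumably operate on the in-memory IR rather than on strings.

```cpp
#include <sstream>
#include <string>

// Text-level sketch: drop the flags nsw, nuw and exact from one line of
// textual LLVM IR. Tokens are split on whitespace, so runs of spaces and
// leading indentation are collapsed in the result.
std::string dropWrapFlags(const std::string &Line) {
  std::istringstream In(Line);
  std::string Token, Out;
  while (In >> Token) {
    if (Token == "nsw" || Token == "nuw" || Token == "exact")
      continue; // removing the flag only makes the result more defined
    if (!Out.empty())
      Out += ' ';
    Out += Token;
  }
  return Out;
}
```

For example, dropWrapFlags("%r = add nsw nuw i32 %a, %b") yields "%r = add i32 %a, %b".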

We certainly can come up with more things.

Would either or both of you (or anyone else) be interested in
co-mentoring students?
We have multiple interested ones already, even though my project
description is lacking any detail.

~ Johannes

On 3/8/21 3:34 PM, Florian Hahn wrote:
>
>> On Mar 8, 2021, at 20:26, John Regehr via llvm-dev <llvm-dev at lists.llvm.org> wrote:
>>
>> Hi folks, an angle related to IR fuzzing that I would be happy to help out with is using Alive2 as a test oracle.
>>
>> Using Alive2 incurs a set of problems (not all IR features supported, can be very slow) but has corresponding advantages (considers all inputs at once, handles UB gracefully).
>>
> If anyone’s interested in combining LLVM’s libFuzzer & Alive2, I’ve put up https://reviews.llvm.org/D96654 which uses Alive2 to verify candidates generated by fuzzing. It works out quite well, but I think there’s lots of potential to improve the ‘interestingness’ of the IR generated by libFuzzer.
>
> Cheers,
> Florian
>
