[llvm-dev] Applying for GSoC 2021(Fuzzing LLVM-IR Passes)

Johannes Doerfert via llvm-dev llvm-dev at lists.llvm.org
Wed Mar 10 08:53:42 PST 2021


Hi Chibin,

On 3/9/21 3:12 AM, 张驰斌 wrote:
> Hi Johannes,
>         Glad to hear from you! I understand that the title listed on the LLVM GSoC 2021 webpage serves as a general guideline, but a project proposal might need to limit its scope and focus on the deliverables.

Yes, students will write the actual proposal which should contain more 
details and scope discussion.


>   The ideas proposed all seem quite appealing and relevant to me. I’ve been browsing through llvm.org/docs/FuzzingLLVM.html and llvm-project/*/tools/*-fuzzer recently, as well as the YouTube video that you mentioned on the GSoC site. The following are some questions I’ve accumulated (forgive me if they are too naïve…).

Questions are always good.


> 1.      Truth be told, I’ve used OpenMP before for my course project, but I haven’t looked into its inner workings, e.g. how it actually instruments programs decorated with #pragma, and how it interacts with the OS’s threading. If LLVM’s OpenMP implementation hasn’t been fuzzed before, then it surely is a valuable fuzz target.  Could you give some clues on how we could fuzz OpenMP?  Like writing a parser for fuzzer input and calling OpenMP library functions in the LLVMFuzzerTestOneInput function? Or do we fuzz it through clang? I’ll look into llvm-project/openmp some more.

So the OpenMP runtime has an "internal" and an external part. The
internal part is full of undocumented dependences, so I doubt we can fuzz
it without breaking at least one for each test. The external one is
fuzzable, however. That said, generating OpenMP programs to be fed to
clang seems like a good thing to do. OpenMP has its own set of
"documented" dependences, e.g., nesting restrictions, but that is not
necessarily a problem.
If we generate an invalid OpenMP program, we should fail gracefully in
most cases. If we don't, we have good test cases for an OpenMP sanitizer
later on. We could also embed knowledge about nesting and other OpenMP
restrictions into the fuzzer/mutation tester/test generator. Long story
short, generating a large corpus of OpenMP inputs is certainly something
I'm interested in; we can start with "random" programs and evolve
towards more targeted approaches.
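
To make the "random first, targeted later" idea concrete, here is a
minimal sketch of such a generator. Everything in it is an assumption
made for illustration: the names (gen_program, ALLOWED_INSIDE) are
hypothetical, and the nesting table is a tiny, simplified subset, not
the rules of the OpenMP specification. The generated programs are
meant as compiler inputs only; the unsynchronized updates of x can
race, so they should not be run for their results.

```python
import random

# Hypothetical, heavily simplified nesting table: which directives this
# sketch allows directly inside which. It is NOT the full OpenMP rule
# set, only a placeholder for where such knowledge would live.
ALLOWED_INSIDE = {
    None: ["parallel", "parallel for"],   # top level of main()
    "parallel": ["for", "single", "critical"],
    "parallel for": ["critical"],
    "for": ["critical"],
    "single": [],
    "critical": [],
}

LOOP_DIRECTIVES = {"for", "parallel for"}


def gen_region(rng, parent, depth, respect_nesting, indent="  "):
    """Return C source lines for one (possibly nested) OpenMP region."""
    if respect_nesting:
        choices = ALLOWED_INSIDE[parent]
    else:
        # "Random" mode: any directive anywhere, valid or not.
        choices = [d for d in ALLOWED_INSIDE if d is not None]
    if depth == 0 or not choices:
        return [indent + "x += 1;"]
    directive = rng.choice(choices)
    lines = [indent + "#pragma omp " + directive]
    if directive in LOOP_DIRECTIVES:
        lines.append(indent + "for (int i = 0; i < 4; ++i) {")
    else:
        lines.append(indent + "{")
    lines += gen_region(rng, directive, depth - 1, respect_nesting,
                        indent + "  ")
    lines.append(indent + "}")
    return lines


def gen_program(seed, depth=3, respect_nesting=True):
    """Return one C translation unit as a string, reproducible per seed."""
    rng = random.Random(seed)
    body = gen_region(rng, None, depth, respect_nesting)
    lines = ["int main(void) {", "  int x = 0;"] + body
    lines += ["  return x;", "}"]
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    print(gen_program(seed=1))
    print(gen_program(seed=1, respect_nesting=False))
```

Each output could be written to a file and compiled with something
like "clang -fopenmp -c". With respect_nesting=False the generator
also produces programs that violate the table, which is where the
"fail gracefully" question above comes in.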


> 2.      For the custom mutator idea, my understanding is that there are currently 2 kinds of mutators: the generic one that is shipped with libFuzzer (bit flipping, splicing, etc.), and a structural mutator. Is the structural mutator related to IRMutator.cpp in the FuzzMutate folder?

I'm not sure myself. I think "structural" here means it fuzzes a
well-defined structure, here protobuf. I might be wrong.

What I was looking for, among other things, is a way to do CFG 
transformations and less obvious IR transformations, maybe:
   - Add a "while-loop" with one iteration around a (set of) block(s) 
(various ways to "hide" the one iteration part)
   - Add a "do-loop" with zero iterations around a (set of) block(s) 
(various ways to "hide" the zero iterations part)
   - Add a call to a function SCC which does effectively nothing but
write to new buffers passed to it or allocated within.
   - Add branches that will not be taken with various targets, 
unreachable, some arbitrary block in the function, etc.
   - Add arguments to functions that are effectively useless.
   - ...

We would do those and record if and how each change impacted individual
passes or the entire O3 pipeline. That way we learn about our heuristics
and cutoffs and such, build a database, etc.
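
As a rough illustration of two of the CFG mutations listed above, here
is a sketch on a toy CFG. It does not use LLVM's API: the CFG is just a
dict from block name to successor list, and all function names are
hypothetical. Edge conditions are not modeled, so the "one iteration"
and "never taken" properties are only stated in comments; a real
mutator would have to materialize (and hide) them in the IR.

```python
import copy


def reachable(cfg, entry):
    """Return the set of blocks reachable from `entry`."""
    seen, work = set(), [entry]
    while work:
        block = work.pop()
        if block not in seen:
            seen.add(block)
            work.extend(cfg[block])
    return seen


def wrap_in_one_iteration_loop(cfg, block):
    """Wrap `block` in a while-loop shape; returns a new CFG.

    Assumes `block` is not the function entry. The header is meant to
    branch to the body on the first visit and to the exit on the
    second, so the body still runs exactly once.
    """
    header, exit_ = block + ".header", block + ".exit"
    new = copy.deepcopy(cfg)
    # Every edge that used to enter `block` now enters the loop header.
    for b in cfg:
        new[b] = [header if s == block else s for s in cfg[b]]
    new[exit_] = new[block]          # old successors hang off the exit
    new[header] = [block, exit_]     # loop test: body or leave
    new[block] = [header]            # back edge
    return new


def add_untaken_branch(cfg, block, target):
    """Add an edge block -> target that is meant to never execute."""
    new = copy.deepcopy(cfg)
    new[block].append(target)
    return new


if __name__ == "__main__":
    cfg = {"entry": ["a"], "a": ["b"], "b": []}
    print(wrap_in_one_iteration_loop(cfg, "a"))
    print(add_untaken_branch(cfg, "entry", "b"))
```

Both functions return a fresh CFG and leave the input untouched, which
makes it easy to keep the before/after pair around when recording how
a pass reacted to the change.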


>
> 3.      Most of the bugs found by fuzzers are usually crashes or hangs. Correctness testing is interesting but hard to achieve from my limited knowledge. I wonder if this is related to the ‘Alive’ tool mentioned by Florian? The fuzzer provides input to some llvm pass, and ‘Alive’ will verify that the transformation is valid. Please correct me if my understanding is wrong…

Yes, that is the idea. If we fuzz blindly, as opposed to guided test 
mutation or synthesis, we will generate a lot of garbage inputs which 
can only be used to detect crashes and hangs. However, given Alive we 
can verify if the output of the compiler is an implementation of the 
input, for some cases.
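
To show what "the output is an implementation of the input" means as a
check, here is a toy oracle. It is only a stand-in: it is not Alive's
interface or algorithm, it enumerates all 8-bit inputs instead of
reasoning symbolically, and it models undefined behavior simply as
None. All function names are made up for this sketch.

```python
MASK = 0xFF  # model 8-bit integers


def refines(src, tgt, bits=8):
    """Return a counterexample input, or None if tgt implements src.

    Where src is undefined (returns None) the target may do anything;
    everywhere else the target must produce the same value.
    """
    for x in range(1 << bits):
        expected = src(x)
        if expected is None:
            continue
        if tgt(x) != expected:
            return x
    return None


def src_mul2(x):
    return (x * 2) & MASK


def tgt_shl1(x):
    # Valid rewrite: x * 2 == x << 1 in wrapping arithmetic.
    return (x << 1) & MASK


def tgt_buggy(x):
    # Wrong only when the top bit is set, so it is easy to miss with
    # a handful of random inputs.
    return (x + x + (x >> 7)) & MASK


def src_div(x):
    # Division by zero is undefined in the source.
    return None if x == 0 else 100 // x


def tgt_div(x):
    # The target picks a concrete value for the undefined case.
    return 0 if x == 0 else 100 // x


if __name__ == "__main__":
    print(refines(src_mul2, tgt_shl1))   # valid rewrite
    print(refines(src_mul2, tgt_buggy))  # invalid rewrite
    print(refines(src_div, tgt_div))     # allowed: fills in undefined case
    print(refines(tgt_div, src_div))     # not allowed in this direction
```

The last two calls show that the check is directional: the target may
define behavior the source left undefined, but not the other way
around.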


> To be honest, the LLVM passes I wrote previously were out-of-tree passes. I’ve just set up my machine, built LLVM configured with fuzzer support, and started fiddling around lately. I have a rough picture of what each idea is about, but it would take some preparation work for me to split them into incremental steps and deliverables. Since it’s still early in the application process, I wonder if you could give me some time to research the ideas that you proposed and make inquiries before finally deciding on my project proposal? 😊

As mentioned, students write the proposal. You should determine which of
the "areas" you like best and then do some research towards that. We can
stay in contact while you start writing up what you want to do.


>
> I am living in Shanghai, in the GMT+8 time zone. How about 15:00 tomorrow (March 10), or 13:30 on Friday afternoon (March 12)? I am not sure which time zone you are located in, so feel free to propose another time slot if the prior two are not convenient for you (later that day or on the weekend are both fine). Hope to have a chat with you soon.

This week is full, I'll get back to you.

~ Johannes


>
> Cheers,
> Chibin Zhang
> 2021.3.9
>
> From: Johannes Doerfert<mailto:johannesdoerfert at gmail.com>
> Sent: March 9, 2021 7:17
> To: Florian Hahn<mailto:florian_hahn at apple.com>; llvm-dev<mailto:llvm-dev at lists.llvm.org>
> Cc: 张驰斌<mailto:zhangchb1 at shanghaitech.edu.cn>; John Regehr<mailto:regehr at cs.utah.edu>
> Subject: Re: [llvm-dev] Applying for GSoC 2021(Fuzzing LLVM-IR Passes)
>
> Having Alive2 as oracle would certainly be great.
>
> Some rough ideas that can be worked on in parallel if we have multiple
> GSoC students:
>    - mutation rules we know are sound, e.g., remove guarantees, add 1
> iteration loops, etc.
>    - input generation, equivalence checking (alive, partial evaluation, ...)
>    - fragment extraction from larger codes + input tracking ->
> reproducer splitting, faster equivalence checking, ...
>
> We certainly can come up with more things.
>
> Would either or both of you (or anyone else) be interested in
> co-mentoring students?
> We have multiple interested ones already, even though my project
> description is lacking any detail.
>
> ~ Johannes
>
>
> On 3/8/21 3:34 PM, Florian Hahn wrote:
>>> On Mar 8, 2021, at 20:26, John Regehr via llvm-dev <llvm-dev at lists.llvm.org> wrote:
>>>
>>> Hi folks, an angle related to IR fuzzing that I would be happy to help out with is using Alive2 as a test oracle.
>>>
>>> Using Alive2 incurs a set of problems (not all IR features supported, can be very slow) but has corresponding advantages (considers all inputs at once, handles UB gracefully).
>>>
>> If anyone’s interested in combining LLVM’s libFuzzer & Alive2, I’ve put up https://reviews.llvm.org/D96654 which uses Alive2 to verify candidates generated by fuzzing. It works out quite well, but I think there’s lots of potential to improve the ‘interestingness’ of the IR generated by libFuzzer.
>>
>> Cheers,
>> Florian
>>
>

