[llvm-dev] Regarding fuzzing llvm-ir passes

Mon Jul 19 13:31:21 PDT 2021

A bit of prior work to be aware of:

There's something running under OSSFuzz already.  I'm not super clear on 
what it is or how it works operationally, but it's definitely something 
to be aware of.

llvm-stress is an in-tree tool for generating random IR.  I'm not sure 
it has been actively maintained at all though.

If you're going to use a coverage guided fuzzer, you want to give some 
thought to your corpus choice.  Will your corpus be IR? Bitcode?  A 
random seed for llvm-stress?  A random buffer replacing llvm-stress' 
RNG?  Each has tradeoffs and will exercise different parts of the 
infrastructure.
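
To make the seed-vs-buffer distinction concrete, here is a toy sketch (not llvm-stress code; BufferRandom, pick_opcodes and the OPCODES table are made-up names for illustration).  With a seed, the fuzzer only controls one integer and every generator decision is derived from it.  With a buffer, each decision is read directly from the input bytes, so a coverage guided fuzzer can mutate individual decisions.

```python
import random

# Toy stand-in for the choices a random IR generator makes.
OPCODES = ["add", "sub", "mul", "xor"]


class BufferRandom:
    """Random source that reads its decisions from a fuzzer-supplied buffer."""

    def __init__(self, data: bytes):
        self._data = data
        self._pos = 0

    def below(self, bound: int) -> int:
        """Return a value in [0, bound); yields 0 once the buffer is exhausted."""
        if bound <= 1:
            return 0
        # Consume just enough bytes to cover the requested range.
        nbytes = ((bound - 1).bit_length() + 7) // 8
        chunk = self._data[self._pos:self._pos + nbytes]
        self._pos += len(chunk)
        return int.from_bytes(chunk, "little") % bound


def pick_opcodes(below, count):
    """Make `count` generator decisions using any `below(bound)` callable."""
    return [OPCODES[below(len(OPCODES))] for _ in range(count)]


# Seed corpus entry: one integer drives every decision.
from_seed = pick_opcodes(random.Random(42).randrange, 4)

# Buffer corpus entry: each byte is one decision the fuzzer can mutate.
from_buffer = pick_opcodes(BufferRandom(bytes([1, 2, 7])).below, 4)
```

Returning 0 on exhaustion (rather than failing) is one possible design choice; it keeps short inputs valid so the fuzzer can grow them gradually.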

It's also worth commenting that bugpoint's reduction strategy tends to 
be a very effective mutation fuzzer in practice.

Personally, I'd approach it with something like the following:

  * Start with a corpus of random seeds to llvm-stress + a pass
    identifier.  Should be easy to stand up and run with any fuzz
    driver, make sure it works and fix the obvious problems to get a
    reasonable fuzz rate.
  * Then extend your llvm-stress seed corpus into a random buffer
    corpus.  Extract llvm-stress into a library which consumes a string
    of random bytes.  Have the first byte of the buffer map to the pass
    under test, and the rest serve as the llvm-stress input.
  * Once that is running successfully, extend it.  There's lots of
    room to improve llvm-stress' generator.
  * Another extension would be to add in mutation transforms after
    generation but before the pass of interest.  (Extracting out
    bugpoint/llvm-bisect transforms to use for the mutation would work
    pretty well.)  Basically, you extend your input buffer to allow a
    set of transform identifiers following the buffer passed to llvm-stress.
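
A rough sketch of how the steps above might fit together.  Everything here is an assumption for illustration: the pass table, the transform names (which are invented, not real bugpoint transforms), the file name input.ll, and the exact buffer layout.  In particular, using byte 1 as a transform count and taking the identifiers from the tail of the buffer is just one way to separate them from the generator bytes.  The first function only builds command lines for the seed-based stage and does not run anything.

```python
from dataclasses import dataclass
from typing import List

# Assumed pass table; a real harness would choose its own list.
PASSES = ["instcombine", "gvn", "sroa", "simplifycfg"]
# Hypothetical mutation transform names, for illustration only.
TRANSFORMS = ["drop-instruction", "drop-block", "drop-function"]


def stage_one_commands(seed: int, pass_id: int) -> List[List[str]]:
    """Step 1: a (seed, pass identifier) pair becomes two tool invocations."""
    pass_name = PASSES[pass_id % len(PASSES)]
    return [
        ["llvm-stress", f"-seed={seed}", "-o", "input.ll"],
        ["opt", f"-passes={pass_name}", "-disable-output", "input.ll"],
    ]


@dataclass
class FuzzInput:
    pass_name: str           # pass under test, chosen by byte 0
    generator_bytes: bytes   # bytes fed to the (library-ified) generator
    transforms: List[str]    # mutation transforms to run before the pass


def decode(data: bytes) -> FuzzInput:
    """Steps 2 and 4: split one fuzzer buffer into its three parts.

    Assumed layout: [pass id][transform count n][generator bytes...][n ids]
    """
    if len(data) < 2:
        return FuzzInput(PASSES[0], b"", [])
    pass_name = PASSES[data[0] % len(PASSES)]
    body = data[2:]
    count = min(data[1], len(body))  # never read past the buffer
    split = len(body) - count
    transforms = [TRANSFORMS[b % len(TRANSFORMS)] for b in body[split:]]
    return FuzzInput(pass_name, body[:split], transforms)


example = decode(bytes([1, 2, 10, 20, 30, 0, 4]))
```

Reducing every identifier modulo the table size means any byte string decodes to a valid input, which tends to matter for fuzz rate since no mutation is wasted on a rejected buffer.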

The preceding is not super well thought out, just what occurred to me in 
the moment.

Philip

On 7/19/21 12:12 PM, David Blaikie via llvm-dev wrote:
> Seems viable (+Kostya, maybe he can +anyone else on his team/he's 
> worked with who might be interested in collaborating on this use of 
> fuzzing, or provide other general pointers, etc)
>
> On Mon, Jul 19, 2021 at 12:06 PM Saurabh Jha via llvm-dev 
> <llvm-dev at lists.llvm.org> wrote:
>
>     Hi llvm people,
>
>     I have been contributing to clang for a while. I am now looking
>     for something to work on in llvm-core.
>
>     In the list of open projects, I found llvm IR fuzzing
>     <https://llvm.org/OpenProjects.html#llvm_ir_fuzzing> to be
>     interesting. I saw the gsoc page
>     <https://summerofcode.withgoogle.com/organizations/5767011616948224/?sp-page=2>
>     for llvm and browsed through the mailing list and it seems to me
>     that no one else is actively working on it at the moment.
>
>     Is anyone else working on it right now? I am planning to start on
>     the prerequisite readings once I get a better view on what's going
>     on in this area or whether I should pursue something else.
>
>     Many thanks,
>     Saurabh
>     _______________________________________________
>     LLVM Developers mailing list
>     llvm-dev at lists.llvm.org
>     https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev