[PATCH] D30156: llvm-mc-fuzzer: add support for assembly

Tue Feb 21 13:21:07 PST 2017

dsanders added inline comments.

================
Comment at: tools/llvm-mc-fuzzer/llvm-mc-fuzzer.cpp:244
   if (Action == AC_Assemble)
-    errs() << "error: -assemble is not implemented\n";
+    return AssembleOneInput(Data, Size);
   else if (Action == AC_Disassemble)
----------------
bcain wrote:
> kcc wrote:
> > dsanders wrote:
> > > kcc wrote:
> > > > bcain wrote:
> > > > > dsanders wrote:
> > > > > > kcc wrote:
> > > > > > > I strongly suggest to make this a separate fuzz target instead of using flags. 
> > > > > > > Otherwise it'll be harder to automate running this target. 
> > > > > > I'm not sure what you mean here. What difficulties are you thinking of?
> > > > > > 
> > > > > > FWIW, this is in line with my original intent which was to mimic llvm-mc's interface.
> > > > > > I strongly suggest to make this a separate fuzz target instead of using flags. 
> > > > > 
> > > > > I've preserved the original design for llvm-mc-fuzzer, apparently to imitate llvm-mc.
> > > > > 
> > > > > Pros/cons of the current design:
> > > > > - pro: matches llvm-mc
> > > > > - pro: changing focus to probe different paths only requires different command line args
> > > > > - con: reproducing fuzzer configuration more difficult because it depends on those args
> > > > > - con: libFuzzer might see the uncovered feature set as a goal for coverage (that we already know statically it cannot cover).
> > > > > 
> > > > > For that last one, it's speculation on my part.
> > > > > 
> > > > > Kostya, would you be satisfied with this as-is or should I decompose it into two fuzzers?  "Harder to automate" consists of "I must make sure that I can deliver the right command line args to the automation feature"?  Or "won't fit well in oss-fuzz" or something else?
> > > > > I'm not sure what you mean here. What difficulties are you thinking of?
> > > > 
> > > > Imagine an automated system that runs continuous fuzzing (e.g. https://github.com/google/oss-fuzz).
> > > > How are you going to tell it to run the same binary with two different flags and to treat those
> > > > as two independent entities?
> > > > Of course, it's possible to implement support for something like this, but OSS-Fuzz does not and will not support it. 
> > > > (because of KISS: https://en.wikipedia.org/wiki/KISS_principle)
> > > > 
> > > > When analyzing the code coverage (manually, or automatically) there will be a huge lump of code that is never reached in one mode, i.e. this 2-in-1 bundle will confuse the analysis. 
> > > > 
> > > > Finally, at least in libFuzzer, part of the algorithm is linear by the size of the binary (more precisely: number of instrumented blocks) and so this bundled fuzzer will just be burning CPUs with no reason. 
> > > > 
> > > > 
> > > > > FWIW, this is in line with my original intent which was to mimic llvm-mc's interface.
> > > > Yes, and I objected back then :) 
> > > > > I'm not sure what you mean here. What difficulties are you thinking of?
> > > > Imagine an automated system that runs continuous fuzzing (e.g.
> > > > https://github.com/google/oss-fuzz).
> > > > How are you going to tell it to run the same binary with two different flags and to treat those
> > > > as two independent entities?
> > > 
> > > I'm not familiar with oss-fuzz but based on an initial glance through I'm not sure how this is different from oss-fuzz/projects/curl/. That project is using pre-processor macros to select between different fuzzers.
> > > 
> > > To answer the question though, if I wanted to fuzz everything (assembler/disassembler, all arches, subarches, and feature combinations) in this kind of system and the curl/llvm-mc-fuzzer way had been ruled out. I'd probably use the first few bytes of the data as the configuration and do a full setup/teardown in LLVMFuzzerTestOneInput().
> > > 
> > > That said, I think that's a different kind of fuzzer to llvm-mc-fuzzer. It would aim to improve the quality of the LLVM project as a whole whereas llvm-mc-fuzzer was meant to help backend developers improve the quality of their particular targets and subtargets.
> > > 
> > > > Of course, it's possible to implement support for something like this, but OSS-Fuzz does not and
> > > > will not support it. 
> > > > (because of KISS: https://en.wikipedia.org/wiki/KISS_principle)
> > > 
> > > This principle is the reason this tool uses command line arguments for the action/triple/arch/subarch/features. Command line arguments were the simplest way to configure a particular target without having to re-compile for each combination. I included support for other archs/subarches/features because it made the original goal easier and also made the tool more useful to others.
> > > 
> > > > When analyzing the code coverage (manually, or automatically) there will be a huge lump of code
> > > > that is never reached in one mode, i.e. this 2-in-1 bundle will confuse the analysis.
> > > 
> > > FWIW, this is also the case between arches/subarches/features. For example, on an X86 host using default options, the AArch64/ARM/Mips/etc. disassemblers are not tested.
> > > 
> > > > Finally, at least in libFuzzer, part of the algorithm is linear by the size of the binary (more precisely:
> > > > number of instrumented blocks) and so this bundled fuzzer will just be burning CPUs with no
> > > > reason.
> > > 
> > > That's a fair point.
> > > 
> > > > > FWIW, this is in line with my original intent which was to mimic llvm-mc's interface.
> > > > Yes, and I objected back then :)
> > > 
> > > I remember you objected to having a custom main function that mangled the arguments before passing them on to libFuzzer and I fixed that. I didn't think there was an objection to command line arguments in general though.
> > > 
> > > If the objection was to command line arguments in general, Is there a way to test an architecture in isolation from the others that's more in keeping with libFuzzer's style?
> > > using pre-processor macros to select between different fuzzers.
> > 
> > That's fine, since it creates different binaries, where unused code has a good chance to not even be linked in. 
> > 
> > >  I'd probably use the first few bytes of the data as the configuration
> > That's an option but it has 2 problems: 
> > 1. now the inputs are from some new artificial data format
> > 2. fuzzing is less efficient due to larger binary
> > 
> > > FWIW, this is also the case between arches/subarches/features.
> > Yes. 
> > 
> > > you objected to having a custom main function that mangled the arguments
> > 
> > Yes, and that is currently not supported at all (was a mistake). 
> > 
> > > Is there a way to test an architecture in isolation from the others that's more in keeping with libFuzzer's style?
> > Have separate binary (build target) for every distinct configuration of code we have in mind. 
> > How many are we talking about here? 
> > tens or hundreds? 
> > Having 20-30 binaries like this is totally fine imho, and works great on oss-fuzz (see e.g. the way we've done it for ffmpeg, where we have ~40 binaries). 
> > https://github.com/google/oss-fuzz/blob/master/projects/ffmpeg/build.sh
> > 
> > 
> > Have separate binary (build target) for every distinct configuration of code we have in mind. 
> 
> I'd be willing to modify my submission to at least decompose llvm-mc-fuzzer into llvm-mc-assemble-fuzzer and llvm-mc-disassemble-fuzzer.  That would solve the problem of having primarily disjoint content under a single binary.  Is that a reasonable compromise, or is it necessary to have zero command line args?
> Have separate binary (build target) for every distinct configuration of code we have in mind. 
> How many are we talking about here? tens or hundreds? 
> Having 20-30 binaries like this is totally fine imho, and works great on oss-fuzz (see e.g. the way we've done it for ffmpeg, where we have ~40 binaries). 
> https://github.com/google/oss-fuzz/blob/master/projects/ffmpeg/build.sh

It depends how far you go with splitting it up. If you split binaries at the arch level then there's 15 in-tree targets (I didn't check how many have an MC layer). Including subarches it's probably hundreds (Mips has ~20 I can think of off-hand), and including features it's likely to push towards thousands.

You only save binary size by splitting at the arch level though since LLVM doesn't have a means to limit support to subarches and smaller.

> I'd be willing to modify my submission to at least decompose llvm-mc-fuzzer into
> llvm-mc-assemble-fuzzer and llvm-mc-disassemble-fuzzer. That would solve the
> problem of having primarily disjoint content under a single binary. Is that a reasonable
> compromise, or is it necessary to have zero command line args?

That sounds reasonable to me.

Repository:
  rL LLVM

https://reviews.llvm.org/D30156