[PATCH] D30156: llvm-mc-fuzzer: add support for assembly

Tue Feb 21 12:54:31 PST 2017

bcain added inline comments.

================
Comment at: tools/llvm-mc-fuzzer/llvm-mc-fuzzer.cpp:244
   if (Action == AC_Assemble)
-    errs() << "error: -assemble is not implemented\n";
+    return AssembleOneInput(Data, Size);
   else if (Action == AC_Disassemble)
----------------
kcc wrote:
> dsanders wrote:
> > kcc wrote:
> > > bcain wrote:
> > > > dsanders wrote:
> > > > > kcc wrote:
> > > > > > I strongly suggest to make this a separate fuzz target instead of using flags. 
> > > > > > Otherwise it'll be harder to automate running this target. 
> > > > > I'm not sure what you mean here. What difficulties are you thinking of?
> > > > > 
> > > > > FWIW, this is in line with my original intent which was to mimic llvm-mc's interface.
> > > > > I strongly suggest to make this a separate fuzz target instead of using flags. 
> > > > 
> > > > I've preserved the original design for llvm-mc-fuzzer, apparently to imitate llvm-mc.
> > > > 
> > > > Pros/cons of the current design:
> > > > - pro: matches llvm-mc
> > > > - pro: changing focus to probe different paths only requires different command line args
> > > > - con: reproducing fuzzer configuration more difficult because it depends on those args
> > > > - con: libFuzzer might see the uncovered feature set as a goal for coverage (that we already know statically it cannot cover).
> > > > 
> > > > For that last one, it's speculation on my part.
> > > > 
> > > > Kostya, would you be satisfied with this as-is or should I decompose it into two fuzzers?  "Harder to automate" consists of "I must make sure that I can deliver the right command line args to the automation feature"?  Or "won't fit well in oss-fuzz" or something else?
> > > > I'm not sure what you mean here. What difficulties are you thinking of?
> > > 
> > > Imagine an automated system that runs continuous fuzzing (e.g. https://github.com/google/oss-fuzz).
> > > How are you going to tell it to run the same binary with two different flags and to treat those
> > > as two independent entities?
> > > Of course, it's possible to implement support for something like this, but OSS-Fuzz does not and will not support it. 
> > > (because of KISS: https://en.wikipedia.org/wiki/KISS_principle)
> > > 
> > > When analyzing the code coverage (manually, or automatically) there will be a huge lump of code that is never reached in one mode, i.e. this 2-in-1 bundle will confuse the analysis. 
> > > 
> > > Finally, at least in libFuzzer, part of the algorithm is linear by the size of the binary (more precisely: number of instrumented blocks) and so this bundled fuzzer will just be burning CPUs with no reason. 
> > > 
> > > 
> > > > FWIW, this is in line with my original intent which was to mimic llvm-mc's interface.
> > > Yes, and I objected back then :) 
> > > > I'm not sure what you mean here. What difficulties are you thinking of?
> > > Imagine an automated system that runs continuous fuzzing (e.g.
> > > https://github.com/google/oss-fuzz).
> > > How are you going to tell it to run the same binary with two different flags and to treat those
> > > as two independent entities?
> > 
> > I'm not familiar with oss-fuzz but based on an initial glance through I'm not sure how this is different from oss-fuzz/projects/curl/. That project is using pre-processor macros to select between different fuzzers.
> > 
> > To answer the question though, if I wanted to fuzz everything (assembler/disassembler, all arches, subarches, and feature combinations) in this kind of system and the curl/llvm-mc-fuzzer way had been ruled out. I'd probably use the first few bytes of the data as the configuration and do a full setup/teardown in LLVMFuzzerTestOneInput().
> > 
> > That said, I think that's a different kind of fuzzer to llvm-mc-fuzzer. It would aim to improve the quality of the LLVM project as a whole whereas llvm-mc-fuzzer was meant to help backend developers improve the quality of their particular targets and subtargets.
> > 
> > > Of course, it's possible to implement support for something like this, but OSS-Fuzz does not and
> > > will not support it. 
> > > (because of KISS: https://en.wikipedia.org/wiki/KISS_principle)
> > 
> > This principle is the reason this tool uses command line arguments for the action/triple/arch/subarch/features. Command line arguments were the simplest way to configure a particular target without having to re-compile for each combination. I included support for other archs/subarches/features because it made the original goal easier and also made the tool more useful to others.
> > 
> > > When analyzing the code coverage (manually, or automatically) there will be a huge lump of code
> > > that is never reached in one mode, i.e. this 2-in-1 bundle will confuse the analysis.
> > 
> > FWIW, this is also the case between arches/subarches/features. For example, on an X86 host using default options, the AArch64/ARM/Mips/etc. disassemblers are not tested.
> > 
> > > Finally, at least in libFuzzer, part of the algorithm is linear by the size of the binary (more precisely:
> > > number of instrumented blocks) and so this bundled fuzzer will just be burning CPUs with no
> > > reason.
> > 
> > That's a fair point.
> > 
> > > > FWIW, this is in line with my original intent which was to mimic llvm-mc's interface.
> > > Yes, and I objected back then :)
> > 
> > I remember you objected to having a custom main function that mangled the arguments before passing them on to libFuzzer and I fixed that. I didn't think there was an objection to command line arguments in general though.
> > 
> > If the objection was to command line arguments in general, Is there a way to test an architecture in isolation from the others that's more in keeping with libFuzzer's style?
> > using pre-processor macros to select between different fuzzers.
> 
> That's fine, since it creates different binaries, where unused code has a good chance to not even be linked in. 
> 
> >  I'd probably use the first few bytes of the data as the configuration
> That's an option but it has 2 problems: 
> 1. now the inputs are from some new artificial data format
> 2. fuzzing is less efficient due to larger binary
> 
> > FWIW, this is also the case between arches/subarches/features.
> Yes. 
> 
> > you objected to having a custom main function that mangled the arguments
> 
> Yes, and that is currently not supported at all (was a mistake). 
> 
> > Is there a way to test an architecture in isolation from the others that's more in keeping with libFuzzer's style?
> Have separate binary (build target) for every distinct configuration of code we have in mind. 
> How many are we talking about here? 
> tens or hundreds? 
> Having 20-30 binaries like this is totally fine imho, and works great on oss-fuzz (see e.g. the way we've done it for ffmpeg, where we have ~40 binaries). 
> https://github.com/google/oss-fuzz/blob/master/projects/ffmpeg/build.sh
> 
> 
> Have separate binary (build target) for every distinct configuration of code we have in mind. 

I'd be willing to modify my submission to at least decompose llvm-mc-fuzzer into llvm-mc-assemble-fuzzer and llvm-mc-disassemble-fuzzer.  That would solve the problem of having primarily disjoint content under a single binary.  Is that a reasonable compromise, or is it necessary to have zero command line args?

Repository:
  rL LLVM

https://reviews.llvm.org/D30156