[PATCH] D30156: llvm-mc-fuzzer: add support for assembly

Tue Feb 21 12:26:57 PST 2017

kcc added inline comments.

================
Comment at: tools/llvm-mc-fuzzer/llvm-mc-fuzzer.cpp:244
   if (Action == AC_Assemble)
-    errs() << "error: -assemble is not implemented\n";
+    return AssembleOneInput(Data, Size);
   else if (Action == AC_Disassemble)
----------------
dsanders wrote:
> kcc wrote:
> > bcain wrote:
> > > dsanders wrote:
> > > > kcc wrote:
> > > > > I strongly suggest to make this a separate fuzz target instead of using flags. 
> > > > > Otherwise it'll be harder to automate running this target. 
> > > > I'm not sure what you mean here. What difficulties are you thinking of?
> > > > 
> > > > FWIW, this is in line with my original intent which was to mimic llvm-mc's interface.
> > > > I strongly suggest to make this a separate fuzz target instead of using flags. 
> > > 
> > > I've preserved the original design for llvm-mc-fuzzer, apparently to imitate llvm-mc.
> > > 
> > > Pros/cons of the current design:
> > > - pro: matches llvm-mc
> > > - pro: changing focus to probe different paths only requires different command line args
> > > - con: reproducing fuzzer configuration more difficult because it depends on those args
> > > - con: libFuzzer might see the uncovered feature set as a goal for coverage (that we already know statically it cannot cover).
> > > 
> > > For that last one, it's speculation on my part.
> > > 
> > > Kostya, would you be satisfied with this as-is or should I decompose it into two fuzzers?  "Harder to automate" consists of "I must make sure that I can deliver the right command line args to the automation feature"?  Or "won't fit well in oss-fuzz" or something else?
> > > I'm not sure what you mean here. What difficulties are you thinking of?
> > 
> > Imagine an automated system that runs continuous fuzzing (e.g. https://github.com/google/oss-fuzz).
> > How are you going to tell it to run the same binary with two different flags and to treat those
> > as two independent entities?
> > Of course, it's possible to implement support for something like this, but OSS-Fuzz does not and will not support it. 
> > (because of KISS: https://en.wikipedia.org/wiki/KISS_principle)
> > 
> > When analyzing the code coverage (manually, or automatically) there will be a huge lump of code that is never reached in one mode, i.e. this 2-in-1 bundle will confuse the analysis. 
> > 
> > Finally, at least in libFuzzer, part of the algorithm is linear by the size of the binary (more precisely: number of instrumented blocks) and so this bundled fuzzer will just be burning CPUs with no reason. 
> > 
> > 
> > > FWIW, this is in line with my original intent which was to mimic llvm-mc's interface.
> > Yes, and I objected back then :) 
> > > I'm not sure what you mean here. What difficulties are you thinking of?
> > Imagine an automated system that runs continuous fuzzing (e.g.
> > https://github.com/google/oss-fuzz).
> > How are you going to tell it to run the same binary with two different flags and to treat those
> > as two independent entities?
> 
> I'm not familiar with oss-fuzz but based on an initial glance through I'm not sure how this is different from oss-fuzz/projects/curl/. That project is using pre-processor macros to select between different fuzzers.
> 
> To answer the question though, if I wanted to fuzz everything (assembler/disassembler, all arches, subarches, and feature combinations) in this kind of system and the curl/llvm-mc-fuzzer way had been ruled out, I'd probably use the first few bytes of the data as the configuration and do a full setup/teardown in LLVMFuzzerTestOneInput().
> 
> That said, I think that's a different kind of fuzzer to llvm-mc-fuzzer. It would aim to improve the quality of the LLVM project as a whole whereas llvm-mc-fuzzer was meant to help backend developers improve the quality of their particular targets and subtargets.
> 
> > Of course, it's possible to implement support for something like this, but OSS-Fuzz does not and
> > will not support it. 
> > (because of KISS: https://en.wikipedia.org/wiki/KISS_principle)
> 
> This principle is the reason this tool uses command line arguments for the action/triple/arch/subarch/features. Command line arguments were the simplest way to configure a particular target without having to re-compile for each combination. I included support for other archs/subarches/features because it made the original goal easier and also made the tool more useful to others.
> 
> > When analyzing the code coverage (manually, or automatically) there will be a huge lump of code
> > that is never reached in one mode, i.e. this 2-in-1 bundle will confuse the analysis.
> 
> FWIW, this is also the case between arches/subarches/features. For example, on an X86 host using default options, the AArch64/ARM/Mips/etc. disassemblers are not tested.
> 
> > Finally, at least in libFuzzer, part of the algorithm is linear by the size of the binary (more precisely:
> > number of instrumented blocks) and so this bundled fuzzer will just be burning CPUs with no
> > reason.
> 
> That's a fair point.
> 
> > > FWIW, this is in line with my original intent which was to mimic llvm-mc's interface.
> > Yes, and I objected back then :)
> 
> I remember you objected to having a custom main function that mangled the arguments before passing them on to libFuzzer and I fixed that. I didn't think there was an objection to command line arguments in general though.
> 
> If the objection was to command line arguments in general, is there a way to test an architecture in isolation from the others that's more in keeping with libFuzzer's style?
> using pre-processor macros to select between different fuzzers.

That's fine, since it creates different binaries, where unused code has a good chance of not even being linked in.
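
A minimal sketch of that macro-selected layout, assuming a hypothetical build-time macro FUZZ_ASSEMBLE and a hypothetical stub named FuzzOneInput standing in for the real assemble and disassemble entry points (neither name comes from the patch). Each build compiles only one of the two bodies, so the other mode's code never reaches the binary:

```cpp
#include <cstddef>
#include <cstdint>

// FUZZ_ASSEMBLE is a hypothetical macro: build once with -DFUZZ_ASSEMBLE and
// once without it to get two independent fuzz target binaries.
#if defined(FUZZ_ASSEMBLE)
// Stub standing in for the assembler entry point (AssembleOneInput in the
// patch). A real target would run the MC assembler on the input here.
static int FuzzOneInput(const uint8_t *Data, size_t Size) {
  (void)Data;
  (void)Size;
  return 0;
}
#else
// Stub standing in for the disassembler entry point. A real target would run
// the MC disassembler on the input here.
static int FuzzOneInput(const uint8_t *Data, size_t Size) {
  (void)Data;
  (void)Size;
  return 0;
}
#endif

// libFuzzer entry point. No flags are inspected at run time, so the binary
// itself identifies which mode is being fuzzed.
extern "C" int LLVMFuzzerTestOneInput(const uint8_t *Data, size_t Size) {
  return FuzzOneInput(Data, Size);
}
```

With this shape an automated system only needs the list of binaries and never has to pass mode flags.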

> I'd probably use the first few bytes of the data as the configuration
That's an option but it has 2 problems: 
1. now the inputs are from some new artificial data format
2. fuzzing is less efficient due to larger binary
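
To make problem 1 concrete, here is a rough sketch of the "first bytes as configuration" idea. The one-byte header layout and the names FuzzAction, FuzzConfig, DecodeConfig, RunAssembler and RunDisassembler are all assumptions made up for illustration, and the two Run* functions are empty stubs:

```cpp
#include <cstddef>
#include <cstdint>

// Hypothetical set of modes selected by the input itself.
enum class FuzzAction : uint8_t { Assemble, Disassemble };

// Hypothetical decoded view of one fuzzer input.
struct FuzzConfig {
  FuzzAction Action;
  const uint8_t *Payload;
  size_t PayloadSize;
};

// Assumed format: the low bit of byte 0 selects the action, and the remaining
// bytes are the payload. Returns false if there is no header byte.
static bool DecodeConfig(const uint8_t *Data, size_t Size, FuzzConfig &Out) {
  if (Size < 1)
    return false;
  Out.Action = (Data[0] & 1) ? FuzzAction::Disassemble : FuzzAction::Assemble;
  Out.Payload = Data + 1;
  Out.PayloadSize = Size - 1;
  return true;
}

// Empty stubs; a real target would do the full setup/teardown here.
static int RunAssembler(const uint8_t *, size_t) { return 0; }
static int RunDisassembler(const uint8_t *, size_t) { return 0; }

extern "C" int LLVMFuzzerTestOneInput(const uint8_t *Data, size_t Size) {
  FuzzConfig Config;
  if (!DecodeConfig(Data, Size, Config))
    return 0; // Too short to carry a header; nothing to do.
  if (Config.Action == FuzzAction::Assemble)
    return RunAssembler(Config.Payload, Config.PayloadSize);
  return RunDisassembler(Config.Payload, Config.PayloadSize);
}
```

Every corpus file now has to start with that header byte, so a plain assembly file or a raw instruction stream is no longer a valid input as-is, and both code paths stay in the one binary.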

> FWIW, this is also the case between arches/subarches/features.
Yes. 

> you objected to having a custom main function that mangled the arguments

Yes, and that is currently not supported at all (it was a mistake).

> Is there a way to test an architecture in isolation from the others that's more in keeping with libFuzzer's style?
Have a separate binary (build target) for every distinct configuration of the code we have in mind.
How many are we talking about here: tens or hundreds?
Having 20-30 binaries like this is totally fine imho, and works great on oss-fuzz (see e.g. the way we've done it for ffmpeg, where we have ~40 binaries). 
https://github.com/google/oss-fuzz/blob/master/projects/ffmpeg/build.sh

Repository:
  rL LLVM

https://reviews.llvm.org/D30156