[PATCH] D39555: Introduce llvm-opt-fuzzer for fuzzing optimization passes

Thu Nov 2 09:07:04 PDT 2017

Igor Laevsky via Phabricator <reviews at reviews.llvm.org> writes:
> igor-laevsky created this revision.
> Herald added a subscriber: mgorny.
>
> Hi,
>
> Seeing recent advances in fuzzing technologies for llvm (libFuzzer,
> FuzzMutate, OSSFuzz), it has become reasonably simple to extend this
> approach to the general optimization passes.
>
> In this review I would like to propose a generic fuzzing target
> intended for the optimization passes and various combinations of
> them. This is a very initial implementation which I tried to keep
> simple. Most of its code is inherited from the llvm-isel-fuzzer.

This looks very useful! Thanks for working on it.

> This tool is intended to be run by OSSFuzz, so the interface is rather
> primitive.  The user is only required to specify the target triple and
> the optimization pipeline using the new pass manager syntax.

One thing we'll run into with OSSFuzz is that we can't really pass
arguments at all there, so we'll probably have to do something like we
do in isel-fuzzer to accept some flags via argv[0]. It's a pretty
awkward solution, I know, but it's the simplest way to get going on
OSSFuzz at this point.
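
To make the argv[0] idea concrete, here is a minimal, standard-library-only
sketch. The naming scheme <tool>--<triple>--<pipeline> and the helper names
splitOnDoubleDash and flagsFromExecName are assumptions made up for
illustration; they are not taken from the patch or from isel-fuzzer. Only the
-mtriple and -passes flag names come from the patch itself.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Splits S on every occurrence of the two-character separator "--".
// (Hypothetical helper, not part of the patch.)
static std::vector<std::string> splitOnDoubleDash(const std::string &S) {
  std::vector<std::string> Parts;
  std::size_t Begin = 0;
  for (;;) {
    std::size_t End = S.find("--", Begin);
    if (End == std::string::npos) {
      Parts.push_back(S.substr(Begin));
      return Parts;
    }
    Parts.push_back(S.substr(Begin, End - Begin));
    Begin = End + 2;
  }
}

// Turns an executable path into the flags the fuzzer would otherwise expect
// on the command line. Assumes the layout <tool>--<triple>--<pipeline>;
// returns an empty vector if the name does not follow that layout.
std::vector<std::string> flagsFromExecName(const std::string &ExecPath) {
  // Drop any leading directory components.
  std::size_t Slash = ExecPath.find_last_of('/');
  std::string Name =
      Slash == std::string::npos ? ExecPath : ExecPath.substr(Slash + 1);

  std::vector<std::string> Parts = splitOnDoubleDash(Name);
  if (Parts.size() != 3 || Parts[1].empty() || Parts[2].empty())
    return {};

  return {"-mtriple=" + Parts[1], "-passes=" + Parts[2]};
}
```

With this scheme, a binary installed as
llvm-opt-fuzzer--x86_64-unknown-linux-gnu--instcombine would behave as if it
had been started with -mtriple=x86_64-unknown-linux-gnu -passes=instcombine.
A double dash is used as the separator because triples already contain single
dashes.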

> In general our primary goal here is to continuously run OSSFuzz
> testing for some of the llvm passes which are not widely used (i.e.
> passes which are not part of the default clang pipeline: IRCE, Loop
> Predication, RS4GC and so on).
>
> However, I expect it would be simpler to start with the more popular
> passes just to have a chance to stabilize the infrastructure and figure
> out a good workflow for the discovered bugs.  So after this tool is
> integrated into the tree (if no one has objections), the next step
> would be to start an OSSFuzz project for InstCombine, as the single
> most used pass.
>
>
> https://reviews.llvm.org/D39555
>
> Files:
>   tools/llvm-opt-fuzzer/CMakeLists.txt
>   tools/llvm-opt-fuzzer/DummyOptFuzzer.cpp
>   tools/llvm-opt-fuzzer/llvm-opt-fuzzer.cpp
>
> Index: tools/llvm-opt-fuzzer/llvm-opt-fuzzer.cpp
> ===================================================================
> --- /dev/null
> +++ tools/llvm-opt-fuzzer/llvm-opt-fuzzer.cpp
> @@ -0,0 +1,258 @@
> +//===--- llvm-opt-fuzzer.cpp - Fuzzer for optimization passes ------------===//
> +//
> +//                     The LLVM Compiler Infrastructure
> +//
> +// This file is distributed under the University of Illinois Open Source
> +// License. See LICENSE.TXT for details.
> +//
> +//===----------------------------------------------------------------------===//
> +//
> +// Tool to fuzz optimization passes using libFuzzer.
> +//
> +//===----------------------------------------------------------------------===//
> +
> +#include "llvm/Bitcode/BitcodeReader.h"
> +#include "llvm/Bitcode/BitcodeWriter.h"
> +#include "llvm/CodeGen/CommandFlags.h"
> +#include "llvm/FuzzMutate/FuzzerCLI.h"
> +#include "llvm/FuzzMutate/IRMutator.h"
> +#include "llvm/FuzzMutate/Operations.h"
> +#include "llvm/FuzzMutate/Random.h"
> +#include "llvm/IR/Verifier.h"
> +#include "llvm/Passes/PassBuilder.h"
> +#include "llvm/Support/SourceMgr.h"
> +#include "llvm/Support/TargetRegistry.h"
> +#include "llvm/Support/TargetSelect.h"
> +
> +using namespace llvm;
> +
> +static cl::opt<std::string>
> +    TargetTripleStr("mtriple", cl::desc("Override target triple for module"));
> +
> +// Passes to run for this fuzzer instance. Expects new pass manager syntax.
> +static cl::opt<std::string> PassPipeline(
> +    "passes",
> +    cl::desc("A textual description of the pass pipeline for optimizing"));
> +
> +static std::unique_ptr<IRMutator> Mutator;
> +static std::unique_ptr<TargetMachine> TM;
> +
> +// This function is mostly copied from the llvm-isel-fuzzer.
> +// TODO: Move this into FuzzMutate library and reuse.
> +static std::unique_ptr<Module> parseModule(const uint8_t *Data, size_t Size,
> +                                           LLVMContext &Context) {
> +
> +  if (Size <= 1)
> +    // We get bogus data given an empty corpus - just create a new module.
> +    return llvm::make_unique<Module>("M", Context);
> +
> +  auto Buffer = MemoryBuffer::getMemBuffer(
> +      StringRef(reinterpret_cast<const char *>(Data), Size), "Fuzzer input",
> +      /*RequiresNullTerminator=*/false);
> +
> +  SMDiagnostic Err;
> +  auto M = parseBitcodeFile(Buffer->getMemBufferRef(), Context);
> +  if (Error E = M.takeError()) {
> +    errs() << toString(std::move(E)) << "\n";
> +    return nullptr;
> +  }
> +  return std::move(M.get());
> +}
> +
> +// This function is copied from the llvm-isel-fuzzer.
> +// TODO: Move this into FuzzMutate library and reuse.
> +static size_t writeModule(const Module &M, uint8_t *Dest, size_t MaxSize) {
> +  std::string Buf;
> +  {
> +    raw_string_ostream OS(Buf);
> +    WriteBitcodeToFile(&M, OS);
> +  }
> +  if (Buf.size() > MaxSize)
> +    return 0;
> +  memcpy(Dest, Buf.data(), Buf.size());
> +  return Buf.size();
> +}
> +
> +std::unique_ptr<IRMutator> createOptMutator() {
> +  std::vector<TypeGetter> Types{
> +      Type::getInt1Ty,  Type::getInt8Ty,  Type::getInt16Ty, Type::getInt32Ty,
> +      Type::getInt64Ty, Type::getFloatTy, Type::getDoubleTy};
> +
> +  std::vector<std::unique_ptr<IRMutationStrategy>> Strategies;
> +  Strategies.push_back(
> +      llvm::make_unique<InjectorIRStrategy>(
> +          InjectorIRStrategy::getDefaultOps()));
> +  Strategies.push_back(
> +      llvm::make_unique<InstDeleterIRStrategy>());
> +
> +  return llvm::make_unique<IRMutator>(std::move(Types), std::move(Strategies));
> +}
> +
> +extern "C" LLVM_ATTRIBUTE_USED size_t LLVMFuzzerCustomMutator(
> +    uint8_t *Data, size_t Size, size_t MaxSize, unsigned int Seed) {
> +
> +  assert(Mutator &&
> +         "IR mutator should have been created during fuzzer initialization");
> +
> +  LLVMContext Context;
> +  auto M = parseModule(Data, Size, Context);
> +  if (!M || verifyModule(*M, &errs())) {
> +    errs() << "error: mutator input module is broken!\n";
> +    return 0;
> +  }
> +
> +  Mutator->mutateModule(*M, Seed, Size, MaxSize);
> +
> +#ifndef NDEBUG
> +  if (verifyModule(*M, &errs())) {
> +    errs() << "mutation result doesn't pass verification\n";
> +    M->dump();
> +    abort();
> +  }
> +#endif
> +
> +  return writeModule(*M, Data, MaxSize);
> +}
> +
> +extern "C" int LLVMFuzzerTestOneInput(const uint8_t *Data, size_t Size) {
> +  assert(TM && "Should have been created during fuzzer initialization");
> +
> +  if (Size <= 1)
> +    // We get bogus data given an empty corpus - ignore it.
> +    return 0;
> +
> +  // Parse module
> +  //
> +
> +  LLVMContext Context;
> +  auto M = parseModule(Data, Size, Context);
> +  if (!M || verifyModule(*M, &errs())) {
> +    errs() << "error: input module is broken!\n";
> +    return 0;
> +  }
> +
> +  // Set up target dependent options
> +  //
> +
> +  M->setTargetTriple(TM->getTargetTriple().normalize());
> +  M->setDataLayout(TM->createDataLayout());
> +  setFunctionAttributes(TM->getTargetCPU(), TM->getTargetFeatureString(), *M);

These don't change from input to input, do they? We should probably set
this up in the initialize function.

> +
> +  // Create pass pipeline
> +  //
> +
> +  PassBuilder PB(TM.get());
> +
> +  LoopAnalysisManager LAM;
> +  FunctionAnalysisManager FAM;
> +  CGSCCAnalysisManager CGAM;
> +  ModulePassManager MPM;
> +  ModuleAnalysisManager MAM;
> +
> +  FAM.registerPass([&] { return PB.buildDefaultAAPipeline(); });
> +  PB.registerModuleAnalyses(MAM);
> +  PB.registerCGSCCAnalyses(CGAM);
> +  PB.registerFunctionAnalyses(FAM);
> +  PB.registerLoopAnalyses(LAM);
> +  PB.crossRegisterProxies(LAM, FAM, CGAM, MAM);
> +
> +  bool Ok = PB.parsePassPipeline(MPM, PassPipeline, false, false);
> +  assert(Ok && "Should have been checked during fuzzer initialization");

Similarly here - we probably don't need to reconfigure the pass pipeline
every time.

> +
> +  // Run passes which we need to test
> +  //
> +
> +  MPM.run(*M, MAM);
> +
> +  // Check that the passes produced a valid module
> +  if (verifyModule(*M, &errs())) {
> +    errs() << "Transformation resulted in an invalid module\n";
> +    abort();
> +  }
> +
> +  return 0;
> +}
> +
> +static void handleLLVMFatalError(void *, const std::string &Message, bool) {
> +  // TODO: Would it be better to call into the fuzzer internals directly?
> +  dbgs() << "LLVM ERROR: " << Message << "\n"
> +         << "Aborting to trigger fuzzer exit handling.\n";
> +  abort();
> +}
> +
> +extern "C" LLVM_ATTRIBUTE_USED int LLVMFuzzerInitialize(
> +    int *argc, char ***argv) {
> +  EnableDebugBuffering = true;
> +
> +  // Make sure we print the summary and the current unit when LLVM errors out.
> +  install_fatal_error_handler(handleLLVMFatalError, nullptr);
> +
> +  // Initialize llvm
> +  //
> +
> +  InitializeAllTargets();
> +  InitializeAllTargetMCs();
> +
> +  PassRegistry &Registry = *PassRegistry::getPassRegistry();
> +  initializeCore(Registry);
> +  initializeCoroutines(Registry);
> +  initializeScalarOpts(Registry);
> +  initializeObjCARCOpts(Registry);
> +  initializeVectorization(Registry);
> +  initializeIPO(Registry);
> +  initializeAnalysis(Registry);
> +  initializeTransformUtils(Registry);
> +  initializeInstCombine(Registry);
> +  initializeInstrumentation(Registry);
> +  initializeTarget(Registry);
> +
> +  // Parse input options
> +  //
> +
> +  parseFuzzerCLOpts(*argc, *argv);
> +
> +  // Create TargetMachine
> +  //
> +
> +  if (TargetTripleStr.empty()) {
> +    errs() << *argv[0] << ": -mtriple must be specified\n";
> +    exit(1);
> +  }
> +  Triple TargetTriple = Triple(Triple::normalize(TargetTripleStr));
> +
> +  std::string Error;
> +  const Target *TheTarget =
> +      TargetRegistry::lookupTarget(MArch, TargetTriple, Error);
> +  if (!TheTarget) {
> +    errs() << *argv[0] << ": " << Error;
> +    exit(1);
> +  }
> +
> +  TargetOptions Options = InitTargetOptionsFromCodeGenFlags();
> +  TM.reset(TheTarget->createTargetMachine(
> +      TargetTriple.getTriple(), getCPUStr(), getFeaturesStr(),
> +      Options, getRelocModel(), getCodeModel(), CodeGenOpt::Default));
> +  assert(TM && "Could not allocate target machine!");
> +
> +  // Check that pass pipeline is specified and correct
> +  //
> +
> +  if (PassPipeline.empty()) {
> +    errs() << *argv[0] << ": at least one pass should be specified\n";
> +    exit(1);
> +  }
> +
> +  PassBuilder PB(TM.get());
> +  ModulePassManager MPM;
> +  if (!PB.parsePassPipeline(MPM, PassPipeline, false, false)) {
> +    errs() << *argv[0] << ": can't parse pass pipeline\n";
> +    exit(1);
> +  }
> +
> +  // Create mutator
> +  //
> +
> +  Mutator = createOptMutator();
> +
> +  return 0;
> +}
> Index: tools/llvm-opt-fuzzer/DummyOptFuzzer.cpp
> ===================================================================
> --- /dev/null
> +++ tools/llvm-opt-fuzzer/DummyOptFuzzer.cpp
> @@ -0,0 +1,21 @@
> +//===--- DummyOptFuzzer.cpp - Entry point to sanity check the fuzzer ------===//
> +//
> +//                     The LLVM Compiler Infrastructure
> +//
> +// This file is distributed under the University of Illinois Open Source
> +// License. See LICENSE.TXT for details.
> +//
> +//===----------------------------------------------------------------------===//
> +//
> +// Implementation of main so we can build and test without linking libFuzzer.
> +//
> +//===----------------------------------------------------------------------===//
> +
> +#include "llvm/FuzzMutate/FuzzerCLI.h"
> +
> +extern "C" int LLVMFuzzerTestOneInput(const uint8_t *Data, size_t Size);
> +extern "C" int LLVMFuzzerInitialize(int *argc, char ***argv);
> +int main(int argc, char *argv[]) {
> +    return llvm::runFuzzerOnInputs(argc, argv, LLVMFuzzerTestOneInput,
> +                                   LLVMFuzzerInitialize);
> +}
> Index: tools/llvm-opt-fuzzer/CMakeLists.txt
> ===================================================================
> --- /dev/null
> +++ tools/llvm-opt-fuzzer/CMakeLists.txt
> @@ -0,0 +1,24 @@
> +set(LLVM_LINK_COMPONENTS
> +  ${LLVM_TARGETS_TO_BUILD}
> +  Analysis
> +  BitWriter
> +  CodeGen
> +  Core
> +  Coroutines
> +  IPO
> +  IRReader
> +  InstCombine
> +  Instrumentation
> +  FuzzMutate
> +  MC
> +  ObjCARCOpts
> +  ScalarOpts
> +  Support
> +  Target
> +  TransformUtils
> +  Vectorize
> +  Passes
> +)
> +
> +add_llvm_fuzzer(llvm-opt-fuzzer llvm-opt-fuzzer.cpp
> +  DUMMY_MAIN DummyOptFuzzer.cpp)