[llvm] r247786 - llvm-mc-fuzzer: A fuzzing tool for the MC layer.

Daniel Sanders via llvm-commits llvm-commits at lists.llvm.org
Sat Sep 19 04:50:47 PDT 2015


You've got the command right, but llvm-mc doesn't accept raw binary input. In -disassemble mode it expects the bytes written out as text, so you need something like:
    0x62 0xef 0xbf 0xbd 0x58 0xef 0xbf 0xbd

I'm currently using the attached totxt.py script to convert the corpus to test files. The pretty printing assumes that instructions are 4 bytes and might discard the last 1-3 bytes if the input size isn't a multiple of 4. It's used like so:
    python totxt.py corpus/* > output.txt
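
Since the archive scrubs attachments, here is a minimal sketch of the kind of conversion described above. It is not the attached totxt.py; the function name and the 4-byte default are assumptions made for illustration. It writes each group of 4 bytes as one line of 0x-prefixed hex and drops a trailing partial group, which matches the limitation mentioned above.

```python
def bytes_to_hex_lines(data, width=4):
    """Format raw bytes as lines of 0x-prefixed hex, `width` bytes per line.

    Hypothetical helper, not the attached script. Trailing bytes that do not
    fill a whole group are dropped (the "last 1-3 bytes" caveat for width=4).
    """
    usable = len(data) - (len(data) % width)
    lines = []
    for offset in range(0, usable, width):
        group = data[offset:offset + width]
        # Iterating over a bytes object yields integers in Python 3.
        lines.append(" ".join("0x%02x" % byte for byte in group))
    return lines


def convert_files(paths):
    """Return the hex lines for every corpus file in `paths`, in order."""
    lines = []
    for path in paths:
        with open(path, "rb") as handle:
            lines.extend(bytes_to_hex_lines(handle.read()))
    return lines
```

Printing the result of convert_files() on the corpus files, one line each, gives text that can be fed to llvm-mc -disassemble.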

For the sake of completeness, I've also attached my test->corpus script (tobin.py). It's used like this:
    python tobin.py tests/*.txt | split --bytes=4 - corpus/init-
tobin.py concatenates the input files like 'cat' does, so something needs to chop the output up into the initial corpus files. For a fixed-length ISA, the 'split' command does the job. I don't have a solution for variable-length ISAs yet.
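
For reference, the reverse direction could look roughly like the sketch below. Again, this is not the attached tobin.py: the function names are made up, and treating '#' as the start of a comment is an assumption about how the test files are laid out. The second helper does in Python what 'split --bytes=4' does in the pipeline above.

```python
def hex_text_to_bytes(text):
    """Parse whitespace-separated hex tokens such as '0x62 0xef' into bytes.

    Hypothetical helper, not the attached script. Assumption: anything after
    a '#' on a line is a comment and is ignored.
    """
    out = bytearray()
    for line in text.splitlines():
        line = line.split("#", 1)[0]
        for token in line.split():
            # int() with base 16 accepts an optional 0x prefix.
            out.append(int(token, 16))
    return bytes(out)


def chunk(data, size=4):
    """Split data into pieces of `size` bytes; the last piece may be shorter."""
    return [data[i:i + size] for i in range(0, len(data), size)]
```

Each piece returned by chunk() would then be written to its own file in the corpus directory. Like 'split', this only makes sense for a fixed-length ISA.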

I've been thinking it might be sensible to add raw binary support to llvm-mc, so that we don't need these scripts and can use llvm-mc's pretty printing, but I haven't had a chance to look at that yet.

I have had one case that wasn't reproducible in llvm-mc. llvm-mc-fuzzer will sometimes try to disassemble a 0-byte buffer and this triggered a buffer overflow in the microMIPS disassembler. llvm-mc won't call the disassembler without any data so I ended up reproducing it with a 1-byte input instead.
________________________________
From: Kostya Serebryany [kcc at google.com]
Sent: 19 September 2015 01:44
To: Daniel Sanders
Cc: LLVM Commits
Subject: Re: [llvm] r247786 - llvm-mc-fuzzer: A fuzzing tool for the MC layer.

Daniel,

one question related to llvm-mc-fuzzer.
When running as
   ./bin/llvm-mc-fuzzer -triple x86_64-linux-gnu  -disassemble -fuzzer-args CORPUS -max_len=8
I quickly run into this:
==24687==ERROR: AddressSanitizer: SEGV on unknown address 0xf4360000606f (pc 0x7f5ef64a3cc9 bp 0x7ffc1682a750 sp 0x7ffc1682a5c8 T0)
    #0 0x7f5ef64a3cc8 in gsignal /build/buildd/eglibc-2.19/signal/../nptl/sysdeps/unix/sysv/linux/raise.c:56
    #1 0x7f5ef64a70d7 in abort /build/buildd/eglibc-2.19/stdlib/abort.c:89
    #2 0xdd1bd8 in llvm::llvm_unreachable_internal(char const*, char const*, unsigned int) lib/Support/ErrorHandling.cpp:117:3
    #3 0xb12448 in translateImmediate lib/Target/X86/Disassembler/X86Disassembler.cpp:379:16
    #4 0xb12448 in translateOperand(llvm::MCInst&, llvm::X86Disassembler::OperandSpecifier const&, llvm::X86Disassembler::InternalInstruction&, llvm::MCDisassembler const*) lib/Target/X86/Disassembler/X86Disassembler.cpp:922
    #5 0xb0d09b in translateInstruction lib/Target/X86/Disassembler/X86Disassembler.cpp:981:11
    #6 0xb0d09b in llvm::X86Disassembler::X86GenericDisassembler::getInstruction(llvm::MCInst&, unsigned long&, llvm::ArrayRef<unsigned char>, unsigned long, llvm::raw_ostream&, llvm::raw_ostream&) const lib/Target/X86/Disassembler/X86Disassembler.cpp:160
    #7 0xd3055b in LLVMDisasmInstruction lib/MC/MCDisassembler/Disassembler.cpp:253:7
    #8 0x5162a6 in DisassembleOneInput(unsigned char const*, unsigned long) tools/llvm-mc-fuzzer/llvm-mc-fuzzer.cpp:71:16

But if I try to feed the crashy input into llvm-mc, nothing interesting happens:

% ./bin/llvm-mc -triple x86_64-linux-gnu  -disassemble <  crash-e3c8c95134622581ba71de8274406456dafef3b3
.text
<stdin>:1:1: error: invalid input token
b�,X�

So, how do I invoke llvm-mc to make it behave close to what llvm-mc-fuzzer is doing?




On Thu, Sep 17, 2015 at 5:32 PM, Kostya Serebryany <kcc at google.com> wrote:


On Thu, Sep 17, 2015 at 2:38 AM, Daniel Sanders <Daniel.Sanders at imgtec.com> wrote:
> I forgot to ask you to document the fuzzer at http://llvm.org/docs/LibFuzzer.html#fuzzing-components-of-llvm

Will do

> One problem: with the current structure of flags libFuzzer's -jobs=10 does not work...
> Thoughts?

Hmm. I see why that happens, each spawned thread is calling system() to spawn a subprocess and that system() call is given a command built from the fuzzer config. The resulting command lacks any of the non-fuzzer args and so the child llvm-mc-fuzzer is trying to parse arguments meant for the underlying fuzzer. Why does it spawn a subprocess from the worker thread instead of doing the work directly inside the worker thread? Am I right in thinking that it's to stop a crash in one job from killing everything?

I can think of four options:

1. fork() the new process instead of using system(). After the fork, the child should remove the effects of -jobs by setting it to 0 and reopen its stdout/stderr to achieve the same effect. This removes the need to reconstruct and reparse the command line since fork() will duplicate the result of the parse in the child process. Unfortunately, I don't think there's a direct Windows equivalent to this outside of Cygwin.

2. Separate fuzzer option parsing from the driver call. I'm thinking something along the lines of this quick sketch:
            FlagDescription *Config = FuzzerDriver::ParseFlags(FuzzerArgv);
            return FuzzerDriver::FuzzerDriver(argv, Config, DisassembleOneInput);
That would allow argv to differ from the options the fuzzer understands, which are in FuzzerArgv.

3. Make it possible to extend the fuzzer option parsing. The CommandLine library can do this nicely but you probably don't want the additional dependency in libFuzzer. llvm-mc-fuzzer could always change to libFuzzer's approach to command line parsing.

4. Make it possible to modify the command before the system() call. The client of libFuzzer could install a callback that allows it to modify a std::vector containing the desired Argv.

I frankly like none of these, will need to think about it more...
It's probably not urgent for this particular fuzzer -- llvm-mc has pretty small inputs and we can fuzz lots out of it in a single process.
But we will need to figure this out for future uses like this.
Maybe,
  5. Add a libFuzzer option -target_options=-option1,param,-option2
and run llvm-mc-fuzzer like "./bin/llvm-mc-fuzzer -target_options=-triple,x86_64-linux-gnu,-disassemble"
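
To make option 5 concrete, here is a rough sketch of how such a flag might be unpacked into an argument list. -target_options is only a proposal in this thread, not an existing libFuzzer flag, and the function name is made up for illustration.

```python
def split_target_options(flag):
    """Turn '-target_options=-a,b,-c' into ['-a', 'b', '-c'].

    Hypothetical: -target_options is only proposed in this thread.
    """
    prefix = "-target_options="
    if not flag.startswith(prefix):
        raise ValueError("not a -target_options flag: %r" % flag)
    value = flag[len(prefix):]
    # Empty pieces (for example from a trailing comma) are ignored.
    return [part for part in value.split(",") if part]
```

One wrinkle with this scheme: a target option whose own value contains commas, such as a -mattr list, could not be passed through unchanged without some escaping rule.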

BTW, I've found one llvm_unreachable with -triple x86_64-linux-gnu already... will file a bug.



--kcc


If all OSes had fork() then I'd favour #1, but Windows rules that out. Out of the rest, #2 seems the most flexible but #3/#4 are simpler. What's your opinion?

From: Kostya Serebryany [mailto:kcc at google.com]
Sent: 17 September 2015 05:38
To: Daniel Sanders
Cc: LLVM Commits
Subject: Re: [llvm] r247786 - llvm-mc-fuzzer: A fuzzing tool for the MC layer.

One problem: with the current structure of flags libFuzzer's -jobs=10 does not work...
Thoughts?

On Wed, Sep 16, 2015 at 9:25 PM, Kostya Serebryany <kcc at google.com> wrote:
Cool! I'll add it to the bot when time permits.
I forgot to ask you to document the fuzzer
at http://llvm.org/docs/LibFuzzer.html#fuzzing-components-of-llvm
Feel free to do it w/o prior review.

On Wed, Sep 16, 2015 at 4:49 AM, Daniel Sanders via llvm-commits <llvm-commits at lists.llvm.org> wrote:
Author: dsanders
Date: Wed Sep 16 06:49:49 2015
New Revision: 247786

URL: http://llvm.org/viewvc/llvm-project?rev=247786&view=rev
Log:
llvm-mc-fuzzer: A fuzzing tool for the MC layer.

Summary:
Only the disassembler is supported in this patch but it has already found a few
issues in the Mips disassembler (mostly invalid instructions being successfully
disassembled).

Reviewers: kcc

Subscribers: russell.gallop, silvas, kcc, llvm-commits

Differential Revision: http://reviews.llvm.org/D12723

Added:
    llvm/trunk/tools/llvm-mc-fuzzer/
    llvm/trunk/tools/llvm-mc-fuzzer/CMakeLists.txt
    llvm/trunk/tools/llvm-mc-fuzzer/llvm-mc-fuzzer.cpp
Modified:
    llvm/trunk/docs/LibFuzzer.rst

Modified: llvm/trunk/docs/LibFuzzer.rst
URL: http://llvm.org/viewvc/llvm-project/llvm/trunk/docs/LibFuzzer.rst?rev=247786&r1=247785&r2=247786&view=diff
==============================================================================
--- llvm/trunk/docs/LibFuzzer.rst (original)
+++ llvm/trunk/docs/LibFuzzer.rst Wed Sep 16 06:49:49 2015
@@ -453,7 +453,14 @@ Trophies

   * llvm-as: https://llvm.org/bugs/show_bug.cgi?id=24639

-
+  * Disassembler:
+    * Mips: Discovered a number of untested instructions for the Mips target
+      (see valid-mips*.s in http://reviews.llvm.org/rL247405,
+      http://reviews.llvm.org/rL247414, http://reviews.llvm.org/rL247416,
+      http://reviews.llvm.org/rL247417, http://reviews.llvm.org/rL247420,
+      and http://reviews.llvm.org/rL247422) as well some instructions that
+      successfully disassembled on ISA's where they were not valid (see
+      invalid-xfail.s files in the same commits).

 .. _pcre2: http://www.pcre.org/


Added: llvm/trunk/tools/llvm-mc-fuzzer/CMakeLists.txt
URL: http://llvm.org/viewvc/llvm-project/llvm/trunk/tools/llvm-mc-fuzzer/CMakeLists.txt?rev=247786&view=auto
==============================================================================
--- llvm/trunk/tools/llvm-mc-fuzzer/CMakeLists.txt (added)
+++ llvm/trunk/tools/llvm-mc-fuzzer/CMakeLists.txt Wed Sep 16 06:49:49 2015
@@ -0,0 +1,18 @@
+if( LLVM_USE_SANITIZE_COVERAGE )
+  include_directories(BEFORE
+    ${CMAKE_CURRENT_SOURCE_DIR}/../../lib/Fuzzer)
+
+  set(LLVM_LINK_COMPONENTS
+      AllTargetsDescs
+      AllTargetsDisassemblers
+      AllTargetsInfos
+      MC
+      MCDisassembler
+      Support
+      )
+  add_llvm_tool(llvm-mc-fuzzer
+                llvm-mc-fuzzer.cpp)
+  target_link_libraries(llvm-mc-fuzzer
+                        LLVMFuzzerNoMain
+                        )
+endif()

Added: llvm/trunk/tools/llvm-mc-fuzzer/llvm-mc-fuzzer.cpp
URL: http://llvm.org/viewvc/llvm-project/llvm/trunk/tools/llvm-mc-fuzzer/llvm-mc-fuzzer.cpp?rev=247786&view=auto
==============================================================================
--- llvm/trunk/tools/llvm-mc-fuzzer/llvm-mc-fuzzer.cpp (added)
+++ llvm/trunk/tools/llvm-mc-fuzzer/llvm-mc-fuzzer.cpp Wed Sep 16 06:49:49 2015
@@ -0,0 +1,129 @@
+//===--- llvm-mc-fuzzer.cpp - Fuzzer for the MC layer ---------------------===//
+//
+//                     The LLVM Compiler Infrastructure
+//
+// This file is distributed under the University of Illinois Open Source
+// License. See LICENSE.TXT for details.
+//
+//===----------------------------------------------------------------------===//
+//
+//===----------------------------------------------------------------------===//
+
+#include "llvm-c/Disassembler.h"
+#include "llvm-c/Target.h"
+#include "llvm/ADT/ArrayRef.h"
+#include "llvm/MC/SubtargetFeature.h"
+#include "llvm/Support/CommandLine.h"
+#include "llvm/Support/raw_ostream.h"
+#include "FuzzerInterface.h"
+
+using namespace llvm;
+
+const unsigned AssemblyTextBufSize = 80;
+
+enum ActionType {
+  AC_Assemble,
+  AC_Disassemble
+};
+
+static cl::opt<ActionType>
+Action(cl::desc("Action to perform:"),
+       cl::init(AC_Assemble),
+       cl::values(clEnumValN(AC_Assemble, "assemble",
+                             "Assemble a .s file (default)"),
+                  clEnumValN(AC_Disassemble, "disassemble",
+                             "Disassemble strings of hex bytes"),
+                  clEnumValEnd));
+
+static cl::opt<std::string>
+    TripleName("triple", cl::desc("Target triple to assemble for, "
+                                  "see -version for available targets"));
+
+static cl::opt<std::string>
+    MCPU("mcpu",
+         cl::desc("Target a specific cpu type (-mcpu=help for details)"),
+         cl::value_desc("cpu-name"), cl::init(""));
+
+static cl::list<std::string>
+    MAttrs("mattr", cl::CommaSeparated,
+           cl::desc("Target specific attributes (-mattr=help for details)"),
+           cl::value_desc("a1,+a2,-a3,..."));
+// The feature string derived from -mattr's values.
+std::string FeaturesStr;
+
+static cl::list<std::string>
+    FuzzerArgv("fuzzer-args", cl::Positional,
+               cl::desc("Options to pass to the fuzzer"), cl::ZeroOrMore,
+               cl::PositionalEatsArgs);
+
+void DisassembleOneInput(const uint8_t *Data, size_t Size) {
+  char AssemblyText[AssemblyTextBufSize];
+
+  std::vector<uint8_t> DataCopy(Data, Data + Size);
+
+  LLVMDisasmContextRef Ctx = LLVMCreateDisasmCPUFeatures(
+      TripleName.c_str(), MCPU.c_str(), FeaturesStr.c_str(), nullptr, 0,
+      nullptr, nullptr);
+  assert(Ctx);
+  uint8_t *p = DataCopy.data();
+  unsigned Consumed;
+  do {
+    Consumed = LLVMDisasmInstruction(Ctx, p, Size, 0, AssemblyText,
+                                     AssemblyTextBufSize);
+    Size -= Consumed;
+    p += Consumed;
+  } while (Consumed != 0);
+  LLVMDisasmDispose(Ctx);
+}
+
+int main(int argc, char **argv) {
+  // The command line is unusual compared to other fuzzers due to the need to
+  // specify the target. Options like -triple, -mcpu, and -mattr work like
+  // their counterparts in llvm-mc, while -fuzzer-args collects options for the
+  // fuzzer itself.
+  //
+  // Examples:
+  //
+  // Fuzz the big-endian MIPS32R6 disassembler using 100,000 inputs of up to
+  // 4-bytes each and use the contents of ./corpus as the test corpus:
+  //   llvm-mc-fuzzer -triple mips-linux-gnu -mcpu=mips32r6 -disassemble \
+  //       -fuzzer-args -max_len=4 -runs=100000 ./corpus
+  //
+  // Infinitely fuzz the little-endian MIPS64R2 disassembler with the MSA
+  // feature enabled using up to 64-byte inputs:
+  //   llvm-mc-fuzzer -triple mipsel-linux-gnu -mcpu=mips64r2 -mattr=msa \
+  //       -disassemble -fuzzer-args ./corpus
+  //
+  // If your aim is to find instructions that are not tested, then it is
+  // advisable to constrain the maximum input size to a single instruction
+  // using -max_len as in the first example. This results in a test corpus of
+  // individual instructions that test unique paths. Without this constraint,
+  // there will be considerable redundancy in the corpus.
+
+  LLVMInitializeAllTargetInfos();
+  LLVMInitializeAllTargetMCs();
+  LLVMInitializeAllDisassemblers();
+
+  cl::ParseCommandLineOptions(argc, argv);
+
+  // Package up features to be passed to target/subtarget
+  // We have to pass it via a global since the callback doesn't
+  // permit any user data.
+  if (MAttrs.size()) {
+    SubtargetFeatures Features;
+    for (unsigned i = 0; i != MAttrs.size(); ++i)
+      Features.AddFeature(MAttrs[i]);
+    FeaturesStr = Features.getString();
+  }
+
+  // Insert the program name into the FuzzerArgv.
+  FuzzerArgv.insert(FuzzerArgv.begin(), argv[0]);
+
+  if (Action == AC_Assemble)
+    errs() << "error: -assemble is not implemented\n";
+  else if (Action == AC_Disassemble)
+    return fuzzer::FuzzerDriver(FuzzerArgv, DisassembleOneInput);
+
+  llvm_unreachable("Unknown action");
+  return 1;
+}


_______________________________________________
llvm-commits mailing list
llvm-commits at lists.llvm.org
http://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-commits




-------------- next part --------------
A non-text attachment was scrubbed...
Name: totxt.py
Type: text/x-python
Size: 376 bytes
Desc: totxt.py
URL: <http://lists.llvm.org/pipermail/llvm-commits/attachments/20150919/aca4e962/attachment.py>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: tobin.py
Type: text/x-python
Size: 235 bytes
Desc: tobin.py
URL: <http://lists.llvm.org/pipermail/llvm-commits/attachments/20150919/aca4e962/attachment-0001.py>

