[llvm] r247786 - llvm-mc-fuzzer: A fuzzing tool for the MC layer.

Kostya Serebryany via llvm-commits llvm-commits at lists.llvm.org
Fri Sep 25 10:26:21 PDT 2015


On Sat, Sep 19, 2015 at 4:50 AM, Daniel Sanders <Daniel.Sanders at imgtec.com>
wrote:

> You've got the command right but llvm-mc doesn't accept raw binary input.
> You need something like:
>     0x62 0xef 0xbf 0xbd 0x58 0xef 0xbf 0xbd
>
Got it.
When libFuzzer finds a crash, it prints the reproducer as comma-separated
hex values (so that one can copy-paste it into C code):
0x62,0xf1,0x16,0x8,0xc2,0x21,0x22
So, to feed it back to llvm-mc, I only need s/,/ /g

Filed https://llvm.org/bugs/show_bug.cgi?id=24941 for -triple
x86_64-linux-gnu

> I'm currently using the attached totxt.py script to convert the corpus to
> test files. The pretty printing assumes that instructions are 4 bytes and
> might discard the last 1-3 bytes if the input size isn't a multiple of 4.
> It's used like so:
>     python totxt.py corpus/* > output.txt
>
> For the sake of completeness, I've also attached my test->corpus script
> (tobin.py). It's used like this:
>   python tobin.py tests/*.txt | split --bytes=4 - corpus/init-
> tobin.py concatenates the input files like 'cat' does, so something needs
> to chop the result up into the initial corpus files. For a fixed-length ISA,
> the 'split' command does the job. I don't have a solution for variable-length
> ISAs yet.
>
> I've been thinking it might be sensible to add raw binary support to
> llvm-mc so that we don't need these scripts and can use llvm-mc's pretty
> printing but I haven't had a chance to look at that yet.
>
> I have had one case that wasn't reproducible in llvm-mc. llvm-mc-fuzzer
> will sometimes try to disassemble a 0-byte buffer and this triggered a
> buffer overflow in the microMIPS disassembler. llvm-mc won't call the
> disassembler without any data so I ended up reproducing it with a 1-byte
> input instead.
> ------------------------------
> *From:* Kostya Serebryany [kcc at google.com]
> *Sent:* 19 September 2015 01:44
>
> *To:* Daniel Sanders
> *Cc:* LLVM Commits
> *Subject:* Re: [llvm] r247786 - llvm-mc-fuzzer: A fuzzing tool for the MC
> layer.
>
> Daniel,
>
> one question related to llvm-mc-fuzzer.
> When running as
>    ./bin/llvm-mc-fuzzer -triple x86_64-linux-gnu  -disassemble -fuzzer-args CORPUS -max_len=8
> I quickly run into this:
> ==24687==ERROR: AddressSanitizer: SEGV on unknown address 0xf4360000606f (pc 0x7f5ef64a3cc9 bp 0x7ffc1682a750 sp 0x7ffc1682a5c8 T0)
>     #0 0x7f5ef64a3cc8 in gsignal /build/buildd/eglibc-2.19/signal/../nptl/sysdeps/unix/sysv/linux/raise.c:56
>     #1 0x7f5ef64a70d7 in abort /build/buildd/eglibc-2.19/stdlib/abort.c:89
>     #2 0xdd1bd8 in llvm::llvm_unreachable_internal(char const*, char const*, unsigned int) lib/Support/ErrorHandling.cpp:117:3
>     #3 0xb12448 in translateImmediate lib/Target/X86/Disassembler/X86Disassembler.cpp:379:16
>     #4 0xb12448 in translateOperand(llvm::MCInst&, llvm::X86Disassembler::OperandSpecifier const&, llvm::X86Disassembler::InternalInstruction&, llvm::MCDisassembler const*) lib/Target/X86/Disassembler/X86Disassembler.cpp:922
>     #5 0xb0d09b in translateInstruction lib/Target/X86/Disassembler/X86Disassembler.cpp:981:11
>     #6 0xb0d09b in llvm::X86Disassembler::X86GenericDisassembler::getInstruction(llvm::MCInst&, unsigned long&, llvm::ArrayRef<unsigned char>, unsigned long, llvm::raw_ostream&, llvm::raw_ostream&) const lib/Target/X86/Disassembler/X86Disassembler.cpp:160
>     #7 0xd3055b in LLVMDisasmInstruction lib/MC/MCDisassembler/Disassembler.cpp:253:7
>     #8 0x5162a6 in DisassembleOneInput(unsigned char const*, unsigned long) tools/llvm-mc-fuzzer/llvm-mc-fuzzer.cpp:71:16
>
> But if I try to feed the crashy input into llvm-mc, nothing interesting
> happens:
>
> % ./bin/llvm-mc -triple x86_64-linux-gnu  -disassemble < crash-e3c8c95134622581ba71de8274406456dafef3b3
> .text
> <stdin>:1:1: error: invalid input token
> b�,X�
>
> So, how do I invoke llvm-mc to make it behave as closely as possible to what
> llvm-mc-fuzzer is doing?
>
>
>
>
> On Thu, Sep 17, 2015 at 5:32 PM, Kostya Serebryany <kcc at google.com> wrote:
>
>>
>>
>> On Thu, Sep 17, 2015 at 2:38 AM, Daniel Sanders <
>> Daniel.Sanders at imgtec.com> wrote:
>>
>>> > I forgot to ask you to document the fuzzer at
>>> http://llvm.org/docs/LibFuzzer.html#fuzzing-components-of-llvm
>>>
>>>
>>>
>>> Will do
>>>
>>>
>>>
>>> > One problem: with the current structure of flags libFuzzer's -jobs=10
>>> does not work...
>>>
>>> > Thoughts?
>>>
>>>
>>>
>>> Hmm. I see why that happens: each spawned thread calls system() to
>>> spawn a subprocess, and that system() call is given a command built from
>>> the fuzzer config. The resulting command lacks all of the non-fuzzer args,
>>> so the child llvm-mc-fuzzer tries to parse arguments meant for the
>>> underlying fuzzer. Why does it spawn a subprocess from the worker thread
>>> instead of doing the work directly inside the worker thread? Am I right in
>>> thinking that it's to stop a crash in one job from killing everything?
>>>
>>>
>>>
>>> I can think of four options:
>>>
>>> 1.      fork() the new process instead of using system(). After the
>>> fork, the child should remove the effects of -jobs by setting it to 0 and
>>> reopen its stdout/stderr to achieve the same effect. This removes the need
>>> to reconstruct and reparse the command line since fork() will duplicate the
>>> result of the parse in the child process. Unfortunately, I don't think
>>> there's a direct Windows equivalent to this outside of Cygwin.
>>>
>>> 2.       Separate fuzzer option parsing from the driver call. I'm
>>> thinking something along the lines of this quick sketch:
>>>             FlagDescription *Config = FuzzerDriver::ParseFlags(FuzzerArgv);
>>>             return FuzzerDriver::FuzzerDriver(argv, Config, DisassembleOneInput);
>>> That would allow argv to differ from the options the fuzzer understands,
>>> which are in FuzzerArgv.
>>>
>>> 3.       Make it possible to extend the fuzzer option parsing. The
>>> CommandLine library can do this nicely but you probably don't want the
>>> additional dependency in libFuzzer. Alternatively, llvm-mc-fuzzer could
>>> switch to libFuzzer's approach to command line parsing.
>>>
>>> 4.      Make it possible to modify the command before the system()
>>> call. The client of libFuzzer could install a callback that allows it to
>>> modify a std::vector containing the desired Argv.
>>>
>>
>> I frankly like none of these; I will need to think about it more...
>> It's probably not urgent for this particular fuzzer -- llvm-mc has pretty
>> small inputs and we can fuzz a lot out of it in a single process.
>> But we will need to figure it out for future uses like this.
>> Maybe,
>>   5. Add a libFuzzer option -target_options=-option1,param,-option2
>> and run llvm-mc-fuzzer like
>>   "./bin/llvm-mc-fuzzer -target_options=-triple,x86_64-linux-gnu,-disassemble"
>>
>> BTW, I've found one llvm_unreachable with -triple x86_64-linux-gnu
>> already... will file a bug.
>>
>>
>>
>> --kcc
>>
>>
>>>
>>>
>>> If all OSes had fork() then I'd favour #1, but Windows rules that out.
>>> Of the rest, #2 seems the most flexible but #3/#4 are simpler. What's
>>> your opinion?
>>>
>>>
>>>
>>> *From:* Kostya Serebryany [mailto:kcc at google.com]
>>> *Sent:* 17 September 2015 05:38
>>> *To:* Daniel Sanders
>>> *Cc:* LLVM Commits
>>> *Subject:* Re: [llvm] r247786 - llvm-mc-fuzzer: A fuzzing tool for the
>>> MC layer.
>>>
>>>
>>>
>>> One problem: with the current structure of flags libFuzzer's -jobs=10
>>> does not work...
>>>
>>> Thoughts?
>>>
>>>
>>>
>>> On Wed, Sep 16, 2015 at 9:25 PM, Kostya Serebryany <kcc at google.com>
>>> wrote:
>>>
>>> Cool! I'll add it to the bot when time permits.
>>>
>>> I forgot to ask you to document the fuzzer
>>>
>>> at http://llvm.org/docs/LibFuzzer.html#fuzzing-components-of-llvm
>>>
>>> Feel free to do it w/o prior review.
>>>
>>>
>>>
>>> On Wed, Sep 16, 2015 at 4:49 AM, Daniel Sanders via llvm-commits <
>>> llvm-commits at lists.llvm.org> wrote:
>>>
>>> Author: dsanders
>>> Date: Wed Sep 16 06:49:49 2015
>>> New Revision: 247786
>>>
>>> URL: http://llvm.org/viewvc/llvm-project?rev=247786&view=rev
>>> Log:
>>> llvm-mc-fuzzer: A fuzzing tool for the MC layer.
>>>
>>> Summary:
>>> Only the disassembler is supported in this patch but it has already found a few
>>> issues in the Mips disassembler (mostly invalid instructions being successfully
>>> disassembled).
>>>
>>> Reviewers: kcc
>>>
>>> Subscribers: russell.gallop, silvas, kcc, llvm-commits
>>>
>>> Differential Revision: http://reviews.llvm.org/D12723
>>>
>>> Added:
>>>     llvm/trunk/tools/llvm-mc-fuzzer/
>>>     llvm/trunk/tools/llvm-mc-fuzzer/CMakeLists.txt
>>>     llvm/trunk/tools/llvm-mc-fuzzer/llvm-mc-fuzzer.cpp
>>> Modified:
>>>     llvm/trunk/docs/LibFuzzer.rst
>>>
>>> Modified: llvm/trunk/docs/LibFuzzer.rst
>>> URL:
>>> http://llvm.org/viewvc/llvm-project/llvm/trunk/docs/LibFuzzer.rst?rev=247786&r1=247785&r2=247786&view=diff
>>>
>>> ==============================================================================
>>> --- llvm/trunk/docs/LibFuzzer.rst (original)
>>> +++ llvm/trunk/docs/LibFuzzer.rst Wed Sep 16 06:49:49 2015
>>> @@ -453,7 +453,14 @@ Trophies
>>>
>>>    * llvm-as: https://llvm.org/bugs/show_bug.cgi?id=24639
>>>
>>> -
>>> +  * Disassembler:
>>> +    * Mips: Discovered a number of untested instructions for the Mips target
>>> +      (see valid-mips*.s in http://reviews.llvm.org/rL247405,
>>> +      http://reviews.llvm.org/rL247414, http://reviews.llvm.org/rL247416,
>>> +      http://reviews.llvm.org/rL247417, http://reviews.llvm.org/rL247420,
>>> +      and http://reviews.llvm.org/rL247422) as well some instructions that
>>> +      successfully disassembled on ISA's where they were not valid (see
>>> +      invalid-xfail.s files in the same commits).
>>>
>>>  .. _pcre2: http://www.pcre.org/
>>>
>>>
>>> Added: llvm/trunk/tools/llvm-mc-fuzzer/CMakeLists.txt
>>> URL:
>>> http://llvm.org/viewvc/llvm-project/llvm/trunk/tools/llvm-mc-fuzzer/CMakeLists.txt?rev=247786&view=auto
>>>
>>> ==============================================================================
>>> --- llvm/trunk/tools/llvm-mc-fuzzer/CMakeLists.txt (added)
>>> +++ llvm/trunk/tools/llvm-mc-fuzzer/CMakeLists.txt Wed Sep 16 06:49:49 2015
>>> @@ -0,0 +1,18 @@
>>> +if( LLVM_USE_SANITIZE_COVERAGE )
>>> +  include_directories(BEFORE
>>> +    ${CMAKE_CURRENT_SOURCE_DIR}/../../lib/Fuzzer)
>>> +
>>> +  set(LLVM_LINK_COMPONENTS
>>> +      AllTargetsDescs
>>> +      AllTargetsDisassemblers
>>> +      AllTargetsInfos
>>> +      MC
>>> +      MCDisassembler
>>> +      Support
>>> +      )
>>> +  add_llvm_tool(llvm-mc-fuzzer
>>> +                llvm-mc-fuzzer.cpp)
>>> +  target_link_libraries(llvm-mc-fuzzer
>>> +                        LLVMFuzzerNoMain
>>> +                        )
>>> +endif()
>>>
>>> Added: llvm/trunk/tools/llvm-mc-fuzzer/llvm-mc-fuzzer.cpp
>>> URL:
>>> http://llvm.org/viewvc/llvm-project/llvm/trunk/tools/llvm-mc-fuzzer/llvm-mc-fuzzer.cpp?rev=247786&view=auto
>>>
>>> ==============================================================================
>>> --- llvm/trunk/tools/llvm-mc-fuzzer/llvm-mc-fuzzer.cpp (added)
>>> +++ llvm/trunk/tools/llvm-mc-fuzzer/llvm-mc-fuzzer.cpp Wed Sep 16 06:49:49 2015
>>> @@ -0,0 +1,129 @@
>>> +//===--- llvm-mc-fuzzer.cpp - Fuzzer for the MC layer ---------------------===//
>>> +//
>>> +//                     The LLVM Compiler Infrastructure
>>> +//
>>> +// This file is distributed under the University of Illinois Open Source
>>> +// License. See LICENSE.TXT for details.
>>> +//
>>> +//===----------------------------------------------------------------------===//
>>> +//
>>> +//===----------------------------------------------------------------------===//
>>> +
>>> +#include "llvm-c/Disassembler.h"
>>> +#include "llvm-c/Target.h"
>>> +#include "llvm/ADT/ArrayRef.h"
>>> +#include "llvm/MC/SubtargetFeature.h"
>>> +#include "llvm/Support/CommandLine.h"
>>> +#include "llvm/Support/raw_ostream.h"
>>> +#include "FuzzerInterface.h"
>>> +
>>> +using namespace llvm;
>>> +
>>> +const unsigned AssemblyTextBufSize = 80;
>>> +
>>> +enum ActionType {
>>> +  AC_Assemble,
>>> +  AC_Disassemble
>>> +};
>>> +
>>> +static cl::opt<ActionType>
>>> +Action(cl::desc("Action to perform:"),
>>> +       cl::init(AC_Assemble),
>>> +       cl::values(clEnumValN(AC_Assemble, "assemble",
>>> +                             "Assemble a .s file (default)"),
>>> +                  clEnumValN(AC_Disassemble, "disassemble",
>>> +                             "Disassemble strings of hex bytes"),
>>> +                  clEnumValEnd));
>>> +
>>> +static cl::opt<std::string>
>>> +    TripleName("triple", cl::desc("Target triple to assemble for, "
>>> +                                  "see -version for available targets"));
>>> +
>>> +static cl::opt<std::string>
>>> +    MCPU("mcpu",
>>> +         cl::desc("Target a specific cpu type (-mcpu=help for details)"),
>>> +         cl::value_desc("cpu-name"), cl::init(""));
>>> +
>>> +static cl::list<std::string>
>>> +    MAttrs("mattr", cl::CommaSeparated,
>>> +           cl::desc("Target specific attributes (-mattr=help for details)"),
>>> +           cl::value_desc("a1,+a2,-a3,..."));
>>> +// The feature string derived from -mattr's values.
>>> +std::string FeaturesStr;
>>> +
>>> +static cl::list<std::string>
>>> +    FuzzerArgv("fuzzer-args", cl::Positional,
>>> +               cl::desc("Options to pass to the fuzzer"), cl::ZeroOrMore,
>>> +               cl::PositionalEatsArgs);
>>> +
>>> +void DisassembleOneInput(const uint8_t *Data, size_t Size) {
>>> +  char AssemblyText[AssemblyTextBufSize];
>>> +
>>> +  std::vector<uint8_t> DataCopy(Data, Data + Size);
>>> +
>>> +  LLVMDisasmContextRef Ctx = LLVMCreateDisasmCPUFeatures(
>>> +      TripleName.c_str(), MCPU.c_str(), FeaturesStr.c_str(), nullptr, 0,
>>> +      nullptr, nullptr);
>>> +  assert(Ctx);
>>> +  uint8_t *p = DataCopy.data();
>>> +  unsigned Consumed;
>>> +  do {
>>> +    Consumed = LLVMDisasmInstruction(Ctx, p, Size, 0, AssemblyText,
>>> +                                     AssemblyTextBufSize);
>>> +    Size -= Consumed;
>>> +    p += Consumed;
>>> +  } while (Consumed != 0);
>>> +  LLVMDisasmDispose(Ctx);
>>> +}
>>> +
>>> +int main(int argc, char **argv) {
>>> +  // The command line is unusual compared to other fuzzers due to the need to
>>> +  // specify the target. Options like -triple, -mcpu, and -mattr work like
>>> +  // their counterparts in llvm-mc, while -fuzzer-args collects options for the
>>> +  // fuzzer itself.
>>> +  //
>>> +  // Examples:
>>> +  //
>>> +  // Fuzz the big-endian MIPS32R6 disassembler using 100,000 inputs of up to
>>> +  // 4-bytes each and use the contents of ./corpus as the test corpus:
>>> +  //   llvm-mc-fuzzer -triple mips-linux-gnu -mcpu=mips32r6 -disassemble \
>>> +  //       -fuzzer-args -max_len=4 -runs=100000 ./corpus
>>> +  //
>>> +  // Infinitely fuzz the little-endian MIPS64R2 disassembler with the MSA
>>> +  // feature enabled using up to 64-byte inputs:
>>> +  //   llvm-mc-fuzzer -triple mipsel-linux-gnu -mcpu=mips64r2 -mattr=msa \
>>> +  //       -disassemble -fuzzer-args ./corpus
>>> +  //
>>> +  // If your aim is to find instructions that are not tested, then it is
>>> +  // advisable to constrain the maximum input size to a single instruction
>>> +  // using -max_len as in the first example. This results in a test corpus of
>>> +  // individual instructions that test unique paths. Without this constraint,
>>> +  // there will be considerable redundancy in the corpus.
>>> +
>>> +  LLVMInitializeAllTargetInfos();
>>> +  LLVMInitializeAllTargetMCs();
>>> +  LLVMInitializeAllDisassemblers();
>>> +
>>> +  cl::ParseCommandLineOptions(argc, argv);
>>> +
>>> +  // Package up features to be passed to target/subtarget
>>> +  // We have to pass it via a global since the callback doesn't
>>> +  // permit any user data.
>>> +  if (MAttrs.size()) {
>>> +    SubtargetFeatures Features;
>>> +    for (unsigned i = 0; i != MAttrs.size(); ++i)
>>> +      Features.AddFeature(MAttrs[i]);
>>> +    FeaturesStr = Features.getString();
>>> +  }
>>> +
>>> +  // Insert the program name into the FuzzerArgv.
>>> +  FuzzerArgv.insert(FuzzerArgv.begin(), argv[0]);
>>> +
>>> +  if (Action == AC_Assemble)
>>> +    errs() << "error: -assemble is not implemented\n";
>>> +  else if (Action == AC_Disassemble)
>>> +    return fuzzer::FuzzerDriver(FuzzerArgv, DisassembleOneInput);
>>> +
>>> +  llvm_unreachable("Unknown action");
>>> +  return 1;
>>> +}
>>>
>>>
>>> _______________________________________________
>>> llvm-commits mailing list
>>> llvm-commits at lists.llvm.org
>>> http://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-commits
>>>
>>>
>>>
>>>
>>>
>>
>>
>