[llvm] r247786 - llvm-mc-fuzzer: A fuzzing tool for the MC layer.
Kostya Serebryany via llvm-commits
llvm-commits at lists.llvm.org
Fri Sep 25 11:06:22 PDT 2015
And I've realized what to do with the flags.
Of course, I've solved this problem before: libFuzzer's flag parser ignores
everything that starts with "--".
The LLVM flag parser can consume flags starting with "--" and
ignores everything after a bare "--" parameter.
So,
Index: llvm-mc-fuzzer/llvm-mc-fuzzer.cpp
===================================================================
--- llvm-mc-fuzzer/llvm-mc-fuzzer.cpp (revision 248580)
+++ llvm-mc-fuzzer/llvm-mc-fuzzer.cpp (working copy)
@@ -133,7 +133,7 @@
   if (Action == AC_Assemble)
     errs() << "error: -assemble is not implemented\n";
   else if (Action == AC_Disassemble)
-    return fuzzer::FuzzerDriver(FuzzerArgv, DisassembleOneInput);
+    return fuzzer::FuzzerDriver(argc, argv, DisassembleOneInput);
 
   llvm_unreachable("Unknown action");
   return 1;
and then call the fuzzer like this (note, -jobs=10 now works):
./bin/llvm-mc-fuzzer --triple=arm-linux-gnu --disassemble -- MC-ARM \
    -max_len=16 -use_counters=0 -jobs=10
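
For illustration only, here is a rough sketch of how the two parsers could
end up sharing one argv under this convention. SplitArgs and splitArgs are
made-up names for this example, not libFuzzer or LLVM APIs, and the real
parsers do more than this. The sketch models the tool's parser as acting only
on what comes before a bare "--", and the fuzzer's parser as skipping anything
that starts with "--":

```cpp
#include <string>
#include <vector>

// Hypothetical illustration, not libFuzzer or LLVM code.
struct SplitArgs {
  std::vector<std::string> ToolArgs;   // what the tool's own parser acts on
  std::vector<std::string> FuzzerArgs; // what the fuzzer's parser acts on
};

SplitArgs splitArgs(int argc, const char *const *argv) {
  SplitArgs Result;
  bool SeenSeparator = false;
  for (int i = 1; i < argc; ++i) { // skip argv[0], the program name
    const std::string Arg = argv[i];
    // Fuzzer side: anything starting with "--" is skipped, which also
    // covers the bare "--" separator itself.
    if (Arg.compare(0, 2, "--") != 0)
      Result.FuzzerArgs.push_back(Arg);
    // Tool side: stop looking once the bare "--" has been seen.
    if (Arg == "--")
      SeenSeparator = true;
    else if (!SeenSeparator)
      Result.ToolArgs.push_back(Arg);
  }
  return Result;
}
```

With the command line above, ToolArgs would hold --triple=arm-linux-gnu and
--disassemble, and FuzzerArgs would hold MC-ARM, -max_len=16, -use_counters=0
and -jobs=10.
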
Does this sound ok?
Would you like to make the change (and update the comments/docs)?
--kcc
On Fri, Sep 25, 2015 at 10:26 AM, Kostya Serebryany <kcc at google.com> wrote:
>
>
> On Sat, Sep 19, 2015 at 4:50 AM, Daniel Sanders <Daniel.Sanders at imgtec.com
> > wrote:
>
>> You've got the command right but llvm-mc doesn't accept raw binary input.
>> You need something like:
>> 0x62 0xef 0xbf 0xbd 0x58 0xef 0xbf 0xbd
>>
> Got it.
> When libFuzzer finds a crash it prints the reproducer as comma-separated
> hex values (so that one can copy-paste to C code)
> 0x62,0xf1,0x16,0x8,0xc2,0x21,0x22
> So, to feed it back to llvm-mc I only need s/,/ /g
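
When only the raw crash file is at hand, the bytes have to be turned into
hex text before llvm-mc can read them. A minimal sketch of that conversion
follows; toHexTokens is a hypothetical helper, not part of llvm-mc or
libFuzzer. It always prints two hex digits per byte, on the assumption that
llvm-mc treats 0x08 the same as the shorter 0x8 form shown above.

```cpp
#include <cstddef>
#include <cstdint>
#include <cstdio>
#include <string>
#include <vector>

// Hypothetical helper: formats raw bytes as space-separated hex tokens,
// e.g. {0x62, 0xf1} becomes "0x62 0xf1".
std::string toHexTokens(const std::vector<uint8_t> &Bytes) {
  std::string Out;
  char Buf[8]; // "0x" + two digits + terminating NUL fits easily
  for (size_t i = 0; i < Bytes.size(); ++i) {
    std::snprintf(Buf, sizeof(Buf), "0x%02x",
                  static_cast<unsigned>(Bytes[i]));
    if (i != 0)
      Out += ' '; // separator between tokens, none before the first
    Out += Buf;
  }
  return Out;
}
```

An empty input yields an empty string, so the caller would still need to
decide what to do with a 0-byte reproducer.
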
>
> Filed https://llvm.org/bugs/show_bug.cgi?id=24941 for -triple
> x86_64-linux-gnu
>
>> I'm currently using the attached totxt.py script to convert the corpus to
>> test files. The pretty printing assumes that instructions are 4 bytes and
>> might discard the last 1-3 bytes if the input size isn't a multiple of 4.
>> It's used like so:
>> python totxt.py corpus/* > output.txt
>>
>> For the sake of completeness, I've also attached my test->corpus script
>> (tobin.py). It's used like this:
>> python tobin.py tests/*.txt | split --bytes=4 - corpus/init-
>> tobin.py concatenates the input files like 'cat' does so something needs
>> to chop it up into the initial corpus files. For a fixed-length ISA, the
>> 'split' command does the job. I don't have a solution for variable length
>> yet.
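
As a rough sketch of the chopping step for a fixed-length ISA:
splitIntoChunks is a hypothetical helper (not taken from tobin.py, which is
not shown here) that cuts a byte buffer into pieces of at most ChunkSize
bytes and keeps a shorter final piece, the way 'split --bytes=4' does.

```cpp
#include <algorithm>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical helper: splits Data into consecutive chunks of ChunkSize
// bytes; the last chunk may be shorter if the size is not a multiple.
std::vector<std::vector<uint8_t>>
splitIntoChunks(const std::vector<uint8_t> &Data, size_t ChunkSize) {
  std::vector<std::vector<uint8_t>> Chunks;
  if (ChunkSize == 0)
    return Chunks; // nothing sensible to do, and avoids an endless loop
  for (size_t Pos = 0; Pos < Data.size(); Pos += ChunkSize) {
    size_t Len = std::min(ChunkSize, Data.size() - Pos);
    Chunks.emplace_back(Data.begin() + Pos, Data.begin() + Pos + Len);
  }
  return Chunks;
}
```

Each chunk would then be written out as one initial corpus file. This does
nothing for variable-length ISAs, where the instruction boundaries are not
known without decoding.
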
>>
>> I've been thinking it might be sensible to add raw binary support to
>> llvm-mc so that we don't need these scripts and can use llvm-mc's pretty
>> printing but I haven't had chance to look at that yet.
>>
>> I have had one case that wasn't reproducible in llvm-mc. llvm-mc-fuzzer
>> will sometimes try to disassemble a 0-byte buffer and this triggered a
>> buffer overflow in the microMIPS disassembler. llvm-mc won't call the
>> disassembler without any data so I ended up reproducing it with a 1-byte
>> input instead.
>> ------------------------------
>> *From:* Kostya Serebryany [kcc at google.com]
>> *Sent:* 19 September 2015 01:44
>>
>> *To:* Daniel Sanders
>> *Cc:* LLVM Commits
>> *Subject:* Re: [llvm] r247786 - llvm-mc-fuzzer: A fuzzing tool for the
>> MC layer.
>>
>> Daniel,
>>
>> one question related to llvm-mc-fuzzer.
>> When running as
>> ./bin/llvm-mc-fuzzer -triple x86_64-linux-gnu -disassemble
>> -fuzzer-args CORPUS -max_len=8
>> I quickly run into this:
>> ==24687==ERROR: AddressSanitizer: SEGV on unknown address 0xf4360000606f
>> (pc 0x7f5ef64a3cc9 bp 0x7ffc1682a750 sp 0x7ffc1682a5c8 T0)
>> #0 0x7f5ef64a3cc8 in gsignal
>> /build/buildd/eglibc-2.19/signal/../nptl/sysdeps/unix/sysv/linux/raise.c:56
>> #1 0x7f5ef64a70d7 in abort /build/buildd/eglibc-2.19/stdlib/abort.c:89
>> #2 0xdd1bd8 in llvm::llvm_unreachable_internal(char const*, char
>> const*, unsigned int) lib/Support/ErrorHandling.cpp:117:3
>> #3 0xb12448 in translateImmediate
>> lib/Target/X86/Disassembler/X86Disassembler.cpp:379:16
>> #4 0xb12448 in translateOperand(llvm::MCInst&,
>> llvm::X86Disassembler::OperandSpecifier const&,
>> llvm::X86Disassembler::InternalInstruction&, llvm::MCDisassembler const*)
>> lib/Target/X86/Disassembler/X86Disassembler.cpp:922
>> #5 0xb0d09b in translateInstruction
>> lib/Target/X86/Disassembler/X86Disassembler.cpp:981:11
>> #6 0xb0d09b in
>> llvm::X86Disassembler::X86GenericDisassembler::getInstruction(llvm::MCInst&,
>> unsigned long&, llvm::ArrayRef<unsigned char>, unsigned long,
>> llvm::raw_ostream&, llvm::raw_ostream&) const
>> lib/Target/X86/Disassembler/X86Disassembler.cpp:160
>> #7 0xd3055b in LLVMDisasmInstruction
>> lib/MC/MCDisassembler/Disassembler.cpp:253:7
>> #8 0x5162a6 in DisassembleOneInput(unsigned char const*, unsigned
>> long) tools/llvm-mc-fuzzer/llvm-mc-fuzzer.cpp:71:16
>>
>> But if I try to feed the crashy input into llvm-mc, nothing interesting
>> happens:
>>
>> % ./bin/llvm-mc -triple x86_64-linux-gnu -disassemble <
>> crash-e3c8c95134622581ba71de8274406456dafef3b3
>> .text
>> <stdin>:1:1: error: invalid input token
>> b�,X�
>>
>> So, how do I invoke llvm-mc to make it behave close to what
>> llvm-mc-fuzzer is doing?
>>
>>
>>
>>
>> On Thu, Sep 17, 2015 at 5:32 PM, Kostya Serebryany <kcc at google.com>
>> wrote:
>>
>>>
>>>
>>> On Thu, Sep 17, 2015 at 2:38 AM, Daniel Sanders <
>>> Daniel.Sanders at imgtec.com> wrote:
>>>
>>>> > I forgot to ask you to document the fuzzer at
>>>> http://llvm.org/docs/LibFuzzer.html#fuzzing-components-of-llvm
>>>>
>>>>
>>>>
>>>> Will do
>>>>
>>>>
>>>>
>>>> > One problem: with the current structure of flags libFuzzer's -jobs=10
>>>> does not work...
>>>>
>>>> > Thoughts?
>>>>
>>>>
>>>>
>>>> Hmm. I see why that happens, each spawned thread is calling system() to
>>>> spawn a subprocess and that system() call is given a command built from the
>>>> fuzzer config. The resulting command lacks any of the non-fuzzer args and
>>>> so the child llvm-mc-fuzzer is trying to parse arguments meant for the
>>>> underlying fuzzer. Why does it spawn a subprocess from the worker thread
>>>> instead of doing the work directly inside the worker thread? Am I right in
>>>> thinking that it's to stop a crash in one job from killing everything?
>>>>
>>>>
>>>>
>>>> I can think of four options:
>>>>
>>>> 1. fork() the new process instead of using system(). After the
>>>> fork, the child should remove the effects of -jobs by setting it to 0 and
>>>> reopen its stdout/stderr to achieve the same effect. This removes the need
>>>> to reconstruct and reparse the command line since fork() will duplicate the
>>>> result of the parse in the child process. Unfortunately, I don't think
>>>> there's a direct Windows equivalent to this outside of Cygwin.
>>>>
>>>> 2. Separate fuzzer option parsing from the driver call. I'm
>>>> thinking something along the lines of this quick sketch:
>>>> FlagDescription *Config =
>>>> FuzzerDriver::ParseFlags(FuzzerArgv);
>>>> return FuzzerDriver::FuzzerDriver(argv, Config,
>>>> DisassembleOneInput);
>>>> That would allow argv to differ from the options the fuzzer understands
>>>> which are in FuzzerArgv.
>>>>
>>>> 3. Make it possible to extend the fuzzer option parsing. The
>>>> CommandLine library can do this nicely but you probably don't want the
>>>> additional dependency in libFuzzer. Llvm-mc-fuzzer could always change to
>>>> libFuzzer's approach to command line parsing.
>>>>
>>>> 4. Make it possible to modify the command before the system()
>>>> call. The client of libFuzzer could install a callback that allows it to
>>>> modify a std::vector containing the desired Argv.
>>>>
>>>
>>> I frankly like none of these, will need to think about it more...
>>> It's probably not urgent for this particular fuzzer -- llvm-mc has
>>> pretty small inputs and we can fuzz lots out of it in a single process.
>>> But will need to figure out for future uses like this.
>>> Maybe,
>>> 5. Add a libFuzzer option -target_options=-option1,param,-option2
>>> and run llvm-mc-fuzzer like "./bin/llvm-mc-fuzzer
>>> -target_options=-triple,x86_64-linux-gnu,-disassemble"
>>>
>>> BTW, I've found one llvm_unreachable with -triple x86_64-linux-gnu
>>> already... will file a bug.
>>>
>>>
>>>
>>> --kcc
>>>
>>>
>>>>
>>>>
>>>> If all OS's had fork() then I'd favour #1 but Windows rules that out.
>>>> Out of the rest, #2 seems the most flexible but #3/#4 are simpler. What's
>>>> your opinion?
>>>>
>>>>
>>>>
>>>> *From:* Kostya Serebryany [mailto:kcc at google.com]
>>>> *Sent:* 17 September 2015 05:38
>>>> *To:* Daniel Sanders
>>>> *Cc:* LLVM Commits
>>>> *Subject:* Re: [llvm] r247786 - llvm-mc-fuzzer: A fuzzing tool for the
>>>> MC layer.
>>>>
>>>>
>>>>
>>>> One problem: with the current structure of flags libFuzzer's -jobs=10
>>>> does not work...
>>>>
>>>> Thoughts?
>>>>
>>>>
>>>>
>>>> On Wed, Sep 16, 2015 at 9:25 PM, Kostya Serebryany <kcc at google.com>
>>>> wrote:
>>>>
>>>> Cool! I'll add it to the bot when time permits.
>>>>
>>>> I forgot to ask you to document the fuzzer
>>>>
>>>> at http://llvm.org/docs/LibFuzzer.html#fuzzing-components-of-llvm
>>>>
>>>> Feel free to do it w/o prior review.
>>>>
>>>>
>>>>
>>>> On Wed, Sep 16, 2015 at 4:49 AM, Daniel Sanders via llvm-commits <
>>>> llvm-commits at lists.llvm.org> wrote:
>>>>
>>>> Author: dsanders
>>>> Date: Wed Sep 16 06:49:49 2015
>>>> New Revision: 247786
>>>>
>>>> URL: http://llvm.org/viewvc/llvm-project?rev=247786&view=rev
>>>> Log:
>>>> llvm-mc-fuzzer: A fuzzing tool for the MC layer.
>>>>
>>>> Summary:
>>>> Only the disassembler is supported in this patch but it has already
>>>> found a few
>>>> issues in the Mips disassembler (mostly invalid instructions being
>>>> successfully
>>>> disassembled).
>>>>
>>>> Reviewers: kcc
>>>>
>>>> Subscribers: russell.gallop, silvas, kcc, llvm-commits
>>>>
>>>> Differential Revision: http://reviews.llvm.org/D12723
>>>>
>>>> Added:
>>>> llvm/trunk/tools/llvm-mc-fuzzer/
>>>> llvm/trunk/tools/llvm-mc-fuzzer/CMakeLists.txt
>>>> llvm/trunk/tools/llvm-mc-fuzzer/llvm-mc-fuzzer.cpp
>>>> Modified:
>>>> llvm/trunk/docs/LibFuzzer.rst
>>>>
>>>> Modified: llvm/trunk/docs/LibFuzzer.rst
>>>> URL:
>>>> http://llvm.org/viewvc/llvm-project/llvm/trunk/docs/LibFuzzer.rst?rev=247786&r1=247785&r2=247786&view=diff
>>>>
>>>> ==============================================================================
>>>> --- llvm/trunk/docs/LibFuzzer.rst (original)
>>>> +++ llvm/trunk/docs/LibFuzzer.rst Wed Sep 16 06:49:49 2015
>>>> @@ -453,7 +453,14 @@ Trophies
>>>>
>>>> * llvm-as: https://llvm.org/bugs/show_bug.cgi?id=24639
>>>>
>>>> -
>>>> + * Disassembler:
>>>> + * Mips: Discovered a number of untested instructions for the Mips
>>>> target
>>>> + (see valid-mips*.s in http://reviews.llvm.org/rL247405,
>>>> + http://reviews.llvm.org/rL247414,
>>>> http://reviews.llvm.org/rL247416,
>>>> + http://reviews.llvm.org/rL247417,
>>>> http://reviews.llvm.org/rL247420,
>>>> + and http://reviews.llvm.org/rL247422) as well some instructions
>>>> that
>>>> + successfully disassembled on ISA's where they were not valid (see
>>>> + invalid-xfail.s files in the same commits).
>>>>
>>>> .. _pcre2: http://www.pcre.org/
>>>>
>>>>
>>>> Added: llvm/trunk/tools/llvm-mc-fuzzer/CMakeLists.txt
>>>> URL:
>>>> http://llvm.org/viewvc/llvm-project/llvm/trunk/tools/llvm-mc-fuzzer/CMakeLists.txt?rev=247786&view=auto
>>>>
>>>> ==============================================================================
>>>> --- llvm/trunk/tools/llvm-mc-fuzzer/CMakeLists.txt (added)
>>>> +++ llvm/trunk/tools/llvm-mc-fuzzer/CMakeLists.txt Wed Sep 16 06:49:49
>>>> 2015
>>>> @@ -0,0 +1,18 @@
>>>> +if( LLVM_USE_SANITIZE_COVERAGE )
>>>> + include_directories(BEFORE
>>>> + ${CMAKE_CURRENT_SOURCE_DIR}/../../lib/Fuzzer)
>>>> +
>>>> + set(LLVM_LINK_COMPONENTS
>>>> + AllTargetsDescs
>>>> + AllTargetsDisassemblers
>>>> + AllTargetsInfos
>>>> + MC
>>>> + MCDisassembler
>>>> + Support
>>>> + )
>>>> + add_llvm_tool(llvm-mc-fuzzer
>>>> + llvm-mc-fuzzer.cpp)
>>>> + target_link_libraries(llvm-mc-fuzzer
>>>> + LLVMFuzzerNoMain
>>>> + )
>>>> +endif()
>>>>
>>>> Added: llvm/trunk/tools/llvm-mc-fuzzer/llvm-mc-fuzzer.cpp
>>>> URL:
>>>> http://llvm.org/viewvc/llvm-project/llvm/trunk/tools/llvm-mc-fuzzer/llvm-mc-fuzzer.cpp?rev=247786&view=auto
>>>>
>>>> ==============================================================================
>>>> --- llvm/trunk/tools/llvm-mc-fuzzer/llvm-mc-fuzzer.cpp (added)
>>>> +++ llvm/trunk/tools/llvm-mc-fuzzer/llvm-mc-fuzzer.cpp Wed Sep 16
>>>> 06:49:49 2015
>>>> @@ -0,0 +1,129 @@
>>>> +//===--- llvm-mc-fuzzer.cpp - Fuzzer for the MC layer
>>>> ---------------------===//
>>>> +//
>>>> +// The LLVM Compiler Infrastructure
>>>> +//
>>>> +// This file is distributed under the University of Illinois Open
>>>> Source
>>>> +// License. See LICENSE.TXT for details.
>>>> +//
>>>>
>>>> +//===----------------------------------------------------------------------===//
>>>> +//
>>>>
>>>> +//===----------------------------------------------------------------------===//
>>>> +
>>>> +#include "llvm-c/Disassembler.h"
>>>> +#include "llvm-c/Target.h"
>>>> +#include "llvm/ADT/ArrayRef.h"
>>>> +#include "llvm/MC/SubtargetFeature.h"
>>>> +#include "llvm/Support/CommandLine.h"
>>>> +#include "llvm/Support/raw_ostream.h"
>>>> +#include "FuzzerInterface.h"
>>>> +
>>>> +using namespace llvm;
>>>> +
>>>> +const unsigned AssemblyTextBufSize = 80;
>>>> +
>>>> +enum ActionType {
>>>> + AC_Assemble,
>>>> + AC_Disassemble
>>>> +};
>>>> +
>>>> +static cl::opt<ActionType>
>>>> +Action(cl::desc("Action to perform:"),
>>>> + cl::init(AC_Assemble),
>>>> + cl::values(clEnumValN(AC_Assemble, "assemble",
>>>> + "Assemble a .s file (default)"),
>>>> + clEnumValN(AC_Disassemble, "disassemble",
>>>> + "Disassemble strings of hex bytes"),
>>>> + clEnumValEnd));
>>>> +
>>>> +static cl::opt<std::string>
>>>> + TripleName("triple", cl::desc("Target triple to assemble for, "
>>>> + "see -version for available
>>>> targets"));
>>>> +
>>>> +static cl::opt<std::string>
>>>> + MCPU("mcpu",
>>>> + cl::desc("Target a specific cpu type (-mcpu=help for
>>>> details)"),
>>>> + cl::value_desc("cpu-name"), cl::init(""));
>>>> +
>>>> +static cl::list<std::string>
>>>> + MAttrs("mattr", cl::CommaSeparated,
>>>> + cl::desc("Target specific attributes (-mattr=help for
>>>> details)"),
>>>> + cl::value_desc("a1,+a2,-a3,..."));
>>>> +// The feature string derived from -mattr's values.
>>>> +std::string FeaturesStr;
>>>> +
>>>> +static cl::list<std::string>
>>>> + FuzzerArgv("fuzzer-args", cl::Positional,
>>>> + cl::desc("Options to pass to the fuzzer"),
>>>> cl::ZeroOrMore,
>>>> + cl::PositionalEatsArgs);
>>>> +
>>>> +void DisassembleOneInput(const uint8_t *Data, size_t Size) {
>>>> + char AssemblyText[AssemblyTextBufSize];
>>>> +
>>>> + std::vector<uint8_t> DataCopy(Data, Data + Size);
>>>> +
>>>> + LLVMDisasmContextRef Ctx = LLVMCreateDisasmCPUFeatures(
>>>> + TripleName.c_str(), MCPU.c_str(), FeaturesStr.c_str(), nullptr,
>>>> 0,
>>>> + nullptr, nullptr);
>>>> + assert(Ctx);
>>>> + uint8_t *p = DataCopy.data();
>>>> + unsigned Consumed;
>>>> + do {
>>>> + Consumed = LLVMDisasmInstruction(Ctx, p, Size, 0, AssemblyText,
>>>> + AssemblyTextBufSize);
>>>> + Size -= Consumed;
>>>> + p += Consumed;
>>>> + } while (Consumed != 0);
>>>> + LLVMDisasmDispose(Ctx);
>>>> +}
>>>> +
>>>> +int main(int argc, char **argv) {
>>>> + // The command line is unusual compared to other fuzzers due to the
>>>> need to
>>>> + // specify the target. Options like -triple, -mcpu, and -mattr work
>>>> like
>>>> + // their counterparts in llvm-mc, while -fuzzer-args collects
>>>> options for the
>>>> + // fuzzer itself.
>>>> + //
>>>> + // Examples:
>>>> + //
>>>> + // Fuzz the big-endian MIPS32R6 disassembler using 100,000 inputs of
>>>> up to
>>>> + // 4-bytes each and use the contents of ./corpus as the test corpus:
>>>> + // llvm-mc-fuzzer -triple mips-linux-gnu -mcpu=mips32r6
>>>> -disassemble \
>>>> + // -fuzzer-args -max_len=4 -runs=100000 ./corpus
>>>> + //
>>>> + // Infinitely fuzz the little-endian MIPS64R2 disassembler with the
>>>> MSA
>>>> + // feature enabled using up to 64-byte inputs:
>>>> + // llvm-mc-fuzzer -triple mipsel-linux-gnu -mcpu=mips64r2
>>>> -mattr=msa \
>>>> + // -disassemble -fuzzer-args ./corpus
>>>> + //
>>>> + // If your aim is to find instructions that are not tested, then it
>>>> is
>>>> + // advisable to constrain the maximum input size to a single
>>>> instruction
>>>> + // using -max_len as in the first example. This results in a test
>>>> corpus of
>>>> + // individual instructions that test unique paths. Without this
>>>> constraint,
>>>> + // there will be considerable redundancy in the corpus.
>>>> +
>>>> + LLVMInitializeAllTargetInfos();
>>>> + LLVMInitializeAllTargetMCs();
>>>> + LLVMInitializeAllDisassemblers();
>>>> +
>>>> + cl::ParseCommandLineOptions(argc, argv);
>>>> +
>>>> + // Package up features to be passed to target/subtarget
>>>> + // We have to pass it via a global since the callback doesn't
>>>> + // permit any user data.
>>>> + if (MAttrs.size()) {
>>>> + SubtargetFeatures Features;
>>>> + for (unsigned i = 0; i != MAttrs.size(); ++i)
>>>> + Features.AddFeature(MAttrs[i]);
>>>> + FeaturesStr = Features.getString();
>>>> + }
>>>> +
>>>> + // Insert the program name into the FuzzerArgv.
>>>> + FuzzerArgv.insert(FuzzerArgv.begin(), argv[0]);
>>>> +
>>>> + if (Action == AC_Assemble)
>>>> + errs() << "error: -assemble is not implemented\n";
>>>> + else if (Action == AC_Disassemble)
>>>> + return fuzzer::FuzzerDriver(FuzzerArgv, DisassembleOneInput);
>>>> +
>>>> + llvm_unreachable("Unknown action");
>>>> + return 1;
>>>> +}
>>>>
>>>>
>>>>
>>>>
>>>>
>>>>
>>>>
>>>
>>>
>>
>