[llvm-dev] Libfuzzer depending on uninitialized debug info

Kostya Serebryany via llvm-dev llvm-dev at lists.llvm.org
Fri Dec 2 17:54:54 PST 2016


On Fri, Dec 2, 2016 at 5:42 PM, Robinson, Paul <paul.robinson at sony.com>
wrote:

> I've determined that the "pesky" .loc is indeed because of the .cfi
> directive that comes immediately after it.  Some of the CFI instructions
> have source locations, some don't.  But, emitting a source location for a
> CFI instruction is inappropriate.  It's easy enough to ignore them.
>
>
>
> I propose we do 4 things: (1) commit the patch in SanitizerCoverage.cpp
> that you found;
>

done r288568.


> (2) cause CFI instructions not to emit any .loc directives; (3) file a bug
> to have someone audit LoopVectorizer.cpp to see whether it is using
> SetCurrentDebugLocation in the right places; (4) reapply my "line 0" patch,
> which will be the 3rd attempt.
>

Please ping me when you do (4).
Also, will there be a flag to disable this new functionality?


>
>
> I can do all of these if you like, or you can do the first one and I'll do
> the others.  I will continue with this on Monday.
>
> Thanks,
>
> --paulr
>
>
>
> *From:* Robinson, Paul
> *Sent:* Friday, December 02, 2016 9:39 AM
> *To:* Robinson, Paul; Kostya Serebryany
> *Cc:* llvm-dev at lists.llvm.org
> *Subject:* RE: [llvm-dev] Libfuzzer depending on uninitialized debug info
>
>
>
> I looked through all the places that call SetCurrentDebugLocation().
> Aside from the one place you already found, there are some
> suspicious-looking sequences in LoopVectorizer.cpp.  Other than that they
> look okay to me.
>
>
>
> It turns out that `SetInsertPoint(Instruction *I)` automatically does
> `SetCurrentDebugLocation(I->getDebugLoc())` so the problem arises when
> you don't want the same debug location as the insertion point.  And the
> IRBuilder ctor that takes an Instruction* does SetInsertPoint(I) so some
> places are calling SetCurrentDebugLocation redundantly, but that's not
> harmful functionally.
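
The ordering issue described above can be illustrated with a small self-contained model. This is only a sketch: `ToyInst` and `ToyBuilder` are hypothetical stand-ins, not the LLVM classes, and a debug location is reduced to a bare line number. The one behaviour it assumes, taken from the message above, is that setting the insertion point also adopts that instruction's location.

```cpp
#include <cassert>

// Hypothetical stand-in for an instruction: only its source line matters here.
struct ToyInst {
  unsigned Line;
};

// Hypothetical stand-in for an IR builder, reduced to the two calls discussed.
class ToyBuilder {
  const ToyInst *InsertPt = nullptr;
  unsigned CurLine = 0;

public:
  // Assumed behaviour: moving the insertion point also adopts the
  // location of the instruction we are inserting before.
  void SetInsertPoint(const ToyInst *I) {
    assert(I && "insertion point must be non-null");
    InsertPt = I;
    CurLine = I->Line;
  }

  // Explicitly override the location for instructions created from now on.
  void SetCurrentDebugLocation(unsigned Line) { CurLine = Line; }

  const ToyInst *getInsertPoint() const { return InsertPt; }

  // Line that a newly created instruction would receive.
  unsigned getCurrentLine() const { return CurLine; }
};

// Location first, insertion point second: the explicit location is lost.
unsigned locationThenInsertPoint(const ToyInst &Ins, unsigned EntryLine) {
  ToyBuilder B;
  B.SetCurrentDebugLocation(EntryLine);
  B.SetInsertPoint(&Ins); // overwrites EntryLine with Ins.Line
  return B.getCurrentLine();
}

// Insertion point first, location second: the explicit location survives.
unsigned insertPointThenLocation(const ToyInst &Ins, unsigned EntryLine) {
  ToyBuilder B;
  B.SetInsertPoint(&Ins);
  B.SetCurrentDebugLocation(EntryLine);
  return B.getCurrentLine();
}
```

With an instruction on line 42 and an intended entry line of 7, the first helper returns 42 and the second returns 7, which is why the order of the two calls matters whenever the desired location differs from that of the insertion point.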
>
>
>
> I'll play with the CFI stuff later today.
>
> --paulr
>
>
>
> *From:* llvm-dev [mailto:llvm-dev-bounces at lists.llvm.org] *On Behalf Of*
> Robinson, Paul via llvm-dev
> *Sent:* Thursday, December 01, 2016 6:29 PM
> *To:* Kostya Serebryany
> *Cc:* llvm-dev at lists.llvm.org
> *Subject:* Re: [llvm-dev] Libfuzzer depending on uninitialized debug info
>
>
>
> Hmmm that is a funny sequence.  I know the .cfi directives are represented
> as pseudo-instructions, but they should not be causing us to emit .loc
> directives.  They have no effect on the .text section so probably they
> should just be excluded from emitting a location, same as DBG_VALUE is
> excluded.  Also I believe the label there is unnecessary, but that's a
> separate issue.
>
>
>
> Regarding "how do we find those problems" this is like "how do we find all
> the bugs" and what we can do is come up with intelligent approaches to
> finding where they are likely to hide.  For example, one possibility is to
> audit all the places that call SetCurrentDebugLocation; my grep through
> llvm/lib found 43 instances, which is not horrible.  We can make sure that
> the SetInsertPoint/SetCurrentDebugLocation sequence is correct in all
> those places.  If we can identify components that do depend on the debug
> line table (like fuzzer and sanitizers) then running a bunch of their tests
> with -use-unknown-locations turned on by default might also help, after we
> address the .cfi thing.
>
>
>
> I can look into better handling of .cfi instructions and also do the
> SetCurrentDebugLocation audit tomorrow.
>
> --paulr
>
>
>
> *From:* Kostya Serebryany [mailto:kcc at google.com]
> *Sent:* Thursday, December 01, 2016 5:01 PM
> *To:* Robinson, Paul
> *Cc:* llvm-dev at lists.llvm.org
> *Subject:* Re: [llvm-dev] Libfuzzer depending on uninitialized debug info
>
>
>
> Ok...
>
>
>
> The particular instance of the problem can be solved with this patch in my
> code:
>
>
>
> +      IRB.SetInsertPoint(Ins);
>        IRB.SetCurrentDebugLocation(EntryLoc);
> -      IRB.SetInsertPoint(Ins);
>
>
>
> (apparently, SetInsertPoint invalidates the previous call to
> SetCurrentDebugLocation)
>
>
>
> But then there is another problem....
>
>
>
> % cat dummy.c
> void foo() {}
>
> % clang -O -c -gmlt -fsanitize-coverage=func,trace-pc-guard -S dummy.c -o -
> .LBB0_1:
>         .loc    1 1 0                   # dummy.c:1:0
>         pushq   %rax
> .Lcfi0:
>         .cfi_def_cfa_offset 16
>         movl    $.L__sancov_gen_, %edi
>         callq   __sanitizer_cov_trace_pc_guard
>
> % clang -O -c -gmlt -fsanitize-coverage=func,trace-pc-guard -S dummy.c -mllvm -use-unknown-locations -o -
> .LBB0_1:
>         .loc    1 1 0 is_stmt 0         # dummy.c:1:0
>         pushq   %rax
>         .loc    1 0 0                   # :0:0
> .Lcfi0:
>         .cfi_def_cfa_offset 16
>         .loc    1 1 0 is_stmt 1         # dummy.c:1:0
>         movl    $.L__sancov_gen_, %edi
>         callq   __sanitizer_cov_trace_pc_guard
>
>
>
>
>
> Then, when I run addr2line on the resulting binary, some of the instructions
> get this pesky ".loc 1 0 0" for some reason (I did not investigate yet).
>
>
>
> I am pretty sure that every particular problem like this can be solved
> with a simple patch, but how do we find those problems before the users get
> upset enough to file a good bug report?
>
>
>
>
>
> --kcc
>
>
>
>
>
>
>
>
>
> On Thu, Dec 1, 2016 at 4:16 PM, Robinson, Paul <paul.robinson at sony.com>
> wrote:
>
> There is already -mllvm -use-unknown-locations which ought to trigger
> this.  Don't need my patch.
>
> --paulr
>
>
>
> *From:* Kostya Serebryany [mailto:kcc at google.com]
> *Sent:* Thursday, December 01, 2016 4:08 PM
> *To:* Robinson, Paul
> *Cc:* llvm-dev at lists.llvm.org
> *Subject:* Re: [llvm-dev] Libfuzzer depending on uninitialized debug info
>
>
>
>
>
>
>
> On Thu, Dec 1, 2016 at 3:37 PM, Robinson, Paul <paul.robinson at sony.com>
> wrote:
>
> It might be a wider problem than libfuzzer.  I did want to raise the
> problem asap and libfuzzer is something we know has the problem.
>
> If it came across as "libfuzzer is evil" that was not my intent, sorry!
>
> No, no, I did not mean you implied that :)
>
> Just wanted to make sure everyone understands that this is not
> libFuzzer-specific.
>
>
>
> Looking at lib/Transforms/Instrumentation/SanitizerCoverage.cpp:
>
>   DebugLoc EntryLoc;
>   if (IsEntryBB) {
>     if (auto SP = F.getSubprogram())
>       EntryLoc = DebugLoc::get(SP->getScopeLine(), 0, SP);
> ...
>   } else {
>     EntryLoc = IP->getDebugLoc();
>   }
>   IRBuilder<> IRB(&*IP);
>   IRB.SetCurrentDebugLocation(EntryLoc);
>
>
>
> So, using this I assumed that the newly generated instructions have proper
> debug info, and so far it worked.
>
>
>
> I wonder if you can re-commit your changes under a flag, off-by default,
> so that everyone interested can play with it?
>
>
>
>
>
> --paulr
>
>
>
> *From:* Kostya Serebryany [mailto:kcc at google.com]
> *Sent:* Thursday, December 01, 2016 2:53 PM
> *To:* Robinson, Paul
> *Cc:* llvm-dev at lists.llvm.org
> *Subject:* Re: [llvm-dev] Libfuzzer depending on uninitialized debug info
>
>
>
>
>
>
>
> On Thu, Dec 1, 2016 at 11:08 AM, Robinson, Paul via llvm-dev <
> llvm-dev at lists.llvm.org> wrote:
>
> TL;DR:  LibFuzzer appears to depend on debug-info source locations for
> whatever IR instrumentation it uses; however, that instrumentation does
> not have proper source locations attached to it, leading to potentially
> incorrect reporting.  The short-term fix is to make sure the debug info
> it needs is actually set up; the long-term fix is not to rely on debug
> info, because some optimizations will (correctly) erase it.
>
>
>
>
>
> Why is this libFuzzer-specific?
>
> We were just [un]lucky to detect the problem early with one of the libFuzzer
> tests that required debug info.
>
>
>
> Any tool that needs debug info will suffer from the same problem. No?
>
>
>
>
>
>
> The long version:
>
> When Clang generates IR with debug info, one thing it does is attach a
> source location to most IR instructions.  This source location (at least
> in principle) is carried through optimizations, SelectionDAG, MachineIR,
> assembler source, and ultimately ends up in the "line table" in the
> object file.  The line table describes a mapping from the virtual
> addresses of instructions to source locations, which is very useful to
> debuggers and other tools.
>
> Not all IR instructions have a source location attached to them.  When
> that happens, no specific line-table record is emitted for any machine
> instruction produced from that IR instruction.  In DWARF, that means you
> assume the instruction belongs to the same source location as the
> instruction that precedes it in memory.
>
> This is a problem when the first instruction in a machine-basic-block has
> no explicit source location, because it implicitly inherits the source
> location of the last instruction of the basic block that precedes it in
> memory.  That means, the source location is entirely at the mercy of
> block layout and other optimizations.
>
> In effect, the source location for that instruction is UNINITIALIZED.
>
> In r288283, I committed a patch that explicitly initialized the line
> number for some instructions to line 0.  The DWARF spec says that line 0
> means there is no specific source location for the instruction. Debuggers
> and other tools generally respond to this by looking *forward* in the
> instruction stream to find the *next* instruction with an explicit non-0
> location, rather than backward to the *previous* instruction with an
> explicit location.
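
The two lookup behaviours described above can be sketched with a simplified line table. This is a rough model under stated assumptions, not how any particular debugger or DWARF consumer is implemented: `Row`, `rowFor`, `inheritedLine` and `forwardLine` are hypothetical names, and a real line table carries far more state than an address and a line.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// One simplified line-table row: from Address onward, code belongs to Line.
// Line 0 means "no specific source location".
struct Row {
  std::uint64_t Address;
  unsigned Line;
};

// Index of the last row at or before PC. Table must be sorted by Address
// and its first row must not lie beyond PC.
std::size_t rowFor(const std::vector<Row> &Table, std::uint64_t PC) {
  assert(!Table.empty() && Table.front().Address <= PC);
  std::size_t I = 0;
  while (I + 1 < Table.size() && Table[I + 1].Address <= PC)
    ++I;
  return I;
}

// Plain lookup: an instruction without its own row inherits the line of
// whatever precedes it in memory.
unsigned inheritedLine(const std::vector<Row> &Table, std::uint64_t PC) {
  return Table[rowFor(Table, PC)].Line;
}

// Line-0-aware lookup: if the covering row says line 0, scan forward to
// the next row with a non-zero line. Returns 0 if there is none.
unsigned forwardLine(const std::vector<Row> &Table, std::uint64_t PC) {
  for (std::size_t I = rowFor(Table, PC); I < Table.size(); ++I)
    if (Table[I].Line != 0)
      return Table[I].Line;
  return 0;
}
```

As a usage example, take a block at address 0x10 whose first instruction has no row of its own. With the table `{{0x00, 10}, {0x14, 30}}`, `inheritedLine` reports line 10 for 0x10; if block layout instead places a line-20 block in front, giving `{{0x00, 20}, {0x14, 30}}`, the same instruction reports line 20. Adding an explicit line-0 row, `{{0x00, 10}, {0x10, 0}, {0x14, 30}}`, makes `forwardLine` report line 30 for 0x10 regardless of what precedes the block.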
>
> This caused a libFuzzer test to fail, because it depended on seeing a
> real source location for something, and got line 0 instead.  This tells
> me libFuzzer is depending on an uninitialized source location.  Kostya
> backed out that patch for me, but we really want to have it for improved
> debugger single-stepping behavior.
>
> I am unclear on what instrumentation the fuzzer is using, although the
> instructions for building it suggest it's ASAN instrumentation. Whatever
> it is, either the instrumentation should use its own source-location
> information scheme, or it should initialize the debug info that it is
> depending on.
>
> Note that debug info is not necessarily reliable in the face of
> optimization.  If two blocks with different source locations get merged,
> most likely the source location will be zeroed (and that's not my patch,
> that's optimization-specific behavior).  Therefore, I would recommend
> that fuzzer/asan/whoever stop relying on debug info for source locations,
> if we want all that to work on optimized code.
>
> In the short term it's probably easier to find places where the
> instrumentation is missing debug info, and add it.  But that's not going
> to be reliable for optimized code.
> --paulr
>
> _______________________________________________
> LLVM Developers mailing list
> llvm-dev at lists.llvm.org
> http://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev
>
>
>
>
>
>
>