[llvm-dev] Libfuzzer depending on uninitialized debug info

Thu Dec 1 11:08:20 PST 2016

TL;DR:  LibFuzzer appears to depend on debug-info source locations for
whatever IR instrumentation it uses; however, that instrumentation does
not have proper source locations attached to it, leading to potentially
incorrect reporting.  The short-term fix is to make sure the debug info
it needs is actually set up; the long-term fix is not to rely on debug
info, because some optimizations will (correctly) erase it.

The long version:

When Clang generates IR with debug info, one thing it does is attach a
source location to most IR instructions.  This source location (at least
in principle) is carried through optimizations, SelectionDAG, MachineIR,
assembler source, and ultimately ends up in the "line table" in the
object file.  The line table describes a mapping from the virtual
addresses of instructions to source locations, which is very useful to
debuggers and other tools.
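As a rough illustration, a decoded line table can be modeled as a sorted
list of (address, line) rows. Looking up an address means finding the last
row whose address is less than or equal to it. This is a simplified sketch,
not the DWARF encoding itself: real line tables also carry file, column and
flags such as is_stmt, and the row format and names here are assumptions.

```python
import bisect

# Hypothetical decoded line table: (start_address, line) rows sorted by address.
# Each row covers addresses up to (but not including) the next row's address.
LINE_TABLE = [(0x1000, 10), (0x1008, 11), (0x1010, 14)]

def lookup_line(table, address):
    """Return the source line for an address, or None if it precedes the table."""
    starts = [start for start, _ in table]
    # bisect_right finds the first row starting after `address`,
    # so the row just before it is the one that covers `address`.
    i = bisect.bisect_right(starts, address) - 1
    return table[i][1] if i >= 0 else None

print(lookup_line(LINE_TABLE, 0x100C))  # 11
```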

Not all IR instructions have a source location attached to them.  When
that happens, no specific line-table record is emitted for any machine
instruction produced from that IR instruction.  In DWARF, that means you
assume the instruction belongs to the same source location as the
instruction that precedes it in memory.
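The inheritance rule can be sketched by simulating emission: a row is
written only for instructions that carry an explicit location, so an
instruction without one simply falls inside the preceding row's address
range. The instruction list, the fixed 4-byte instruction size and the
function names below are hypothetical.

```python
import bisect

def emit_line_table(instructions, base=0x1000, size=4):
    """Emit (address, line) rows only for instructions with an explicit line.

    `instructions` is a list of (mnemonic, line_or_None); every instruction
    is assumed to be `size` bytes long.
    """
    rows = []
    for index, (_, line) in enumerate(instructions):
        if line is not None:
            rows.append((base + index * size, line))
    return rows

def line_for(rows, address):
    """Return the line of the last row at or before `address`."""
    i = bisect.bisect_right([a for a, _ in rows], address) - 1
    return rows[i][1] if i >= 0 else None

code = [("load", 20), ("add", 20), ("check", None), ("store", 21)]
rows = emit_line_table(code)
# "check" (address 0x1008) gets no row of its own, so it reports line 20.
print(line_for(rows, 0x1008))  # 20
```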

This is a problem when the first instruction in a machine basic block has
no explicit source location, because it implicitly inherits the source
location of the last instruction of the block that precedes it in memory.
That means the source location is entirely at the mercy of block layout
and other optimizations.

In effect, the source location for that instruction is UNINITIALIZED.
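To show why this amounts to an uninitialized value, the sketch below lays
out the same three blocks in two different orders. The block names and line
numbers are made up. Block B's first instruction has no location, so the
line a tool reports for it depends only on which block happens to precede
it in memory.

```python
def inherited_lines(layout, blocks):
    """Resolve each instruction's line DWARF-style: a missing line inherits
    the line of whatever instruction precedes it in memory."""
    resolved = []
    current = None  # nothing precedes the very first instruction
    for name in layout:
        for mnemonic, line in blocks[name]:
            if line is not None:
                current = line
            resolved.append((name, mnemonic, current))
    return resolved

# Hypothetical blocks: B starts with an instruction that has no location.
BLOCKS = {
    "A": [("cmp", 5), ("br", 5)],
    "B": [("check", None), ("load", 8)],
    "C": [("ret", 12)],
}

def first_line_of_b(layout):
    """Line reported for the first instruction of block B under `layout`."""
    return next(line for name, _, line in inherited_lines(layout, BLOCKS)
                if name == "B")

print(first_line_of_b(["A", "B", "C"]))  # 5
print(first_line_of_b(["A", "C", "B"]))  # 12
```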

In r288283, I committed a patch that explicitly initialized the line
number for some instructions to line 0.  The DWARF spec says that line 0
means there is no specific source location for the instruction. Debuggers
and other tools generally respond to this by looking *forward* in the
instruction stream to find the *next* instruction with an explicit non-0
location, rather than backward to the *previous* instruction with an
explicit location.
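The difference between the two conventions can be sketched as follows. With
an explicit line-0 row, a consumer that resolves line 0 by scanning forward
to the next row with a non-zero line gets the location of the code that
follows, not of whatever block happened to be laid out before it. The
forward-scan policy is a simplification of what debuggers actually do, and
the rows are hypothetical.

```python
def resolve_forward(rows, index):
    """Return the line for rows[index]; line 0 is resolved by scanning forward
    to the next row with a non-zero line (None if there is none)."""
    for _, line in rows[index:]:
        if line != 0:
            return line
    return None

# The block's first instruction (0x1004) is explicitly marked line 0, as the
# patch described above did, instead of silently inheriting line 12.
ROWS = [(0x1000, 12), (0x1004, 0), (0x1008, 8)]

print(resolve_forward(ROWS, 1))  # 8
# A tool that takes the row's line at face value sees 0 instead.
print(ROWS[1][1])  # 0
```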

This caused a libFuzzer test to fail, because it depended on seeing a
real source location for something, and got line 0 instead.  This tells
me libFuzzer is depending on an uninitialized source location.  Kostya
backed out that patch for me, but we really want to have it for improved 
debugger single-stepping behavior.

I am unclear on what instrumentation the fuzzer is using, although the
instructions for building it suggest it's ASAN instrumentation. Whatever 
it is, either the instrumentation should use its own source-location 
information scheme, or it should initialize the debug info that it is 
depending on.

Note that debug info is not necessarily reliable in the face of
optimization.  If two blocks with different source locations get merged, 
most likely the source location will be zeroed (and that's not my patch, 
that's optimization-specific behavior).  Therefore, I would recommend 
that fuzzer/asan/whoever stop relying on debug info for source locations,
if we want all that to work on optimized code.
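A simplified model of that merge behavior: when two instructions with
different locations are folded into one, neither original line is correct
for the result, so the merged instruction gets line 0. This is an
illustrative sketch under that assumption, not LLVM's actual merging code,
which works on full DILocations (scope, column, inlining) rather than bare
line numbers.

```python
from typing import Optional

def merge_line(a: Optional[int], b: Optional[int]) -> int:
    """Pick a line for an instruction formed by merging two others.

    Identical lines survive; anything else becomes 0, i.e. "no specific
    source location", because neither original line is right for both paths.
    """
    if a is not None and a == b:
        return a
    return 0

# Hypothetical: identical stores on lines 30 and 34 in two branches are
# hoisted into one common block.
print(merge_line(30, 34))  # 0
print(merge_line(30, 30))  # 30
```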

In the short term it's probably easier to find places where the
instrumentation is missing debug info, and add it.  But that's not going 
to be reliable for optimized code.
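A sketch of that short-term approach on a toy instruction list: when the
instrumentation inserts a check in front of a memory access, it copies the
access's location onto the check, and a simple audit flags anything still
left without a location. The Inst type, the "check" opcode and the function
names are hypothetical; in LLVM itself this corresponds roughly to calling
setDebugLoc on each inserted instruction.

```python
from dataclasses import dataclass
from typing import List, Optional

@dataclass
class Inst:
    op: str
    line: Optional[int]  # None means no debug location attached

def instrument(insts: List[Inst]) -> List[Inst]:
    """Insert a hypothetical 'check' before every load/store, giving the
    check the same location as the access it guards."""
    out: List[Inst] = []
    for inst in insts:
        if inst.op in ("load", "store"):
            out.append(Inst("check", inst.line))
        out.append(inst)
    return out

def missing_locations(insts: List[Inst]) -> List[int]:
    """Return the indices of instructions that still have no location."""
    return [i for i, inst in enumerate(insts) if inst.line is None]

body = instrument([Inst("load", 20), Inst("add", 20), Inst("store", 21)])
print([(inst.op, inst.line) for inst in body])
# [('check', 20), ('load', 20), ('add', 20), ('check', 21), ('store', 21)]
print(missing_locations(body))  # []
```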
--paulr