[LLVMdev] Debug Info and DFSan

David Blaikie dblaikie at gmail.com
Tue Oct 7 12:20:55 PDT 2014


On Tue, Oct 7, 2014 at 12:18 PM, David Blaikie <dblaikie at gmail.com> wrote:

>
>
> On Tue, Oct 7, 2014 at 12:10 PM, David Blaikie <dblaikie at gmail.com> wrote:
>
>>
>>
>> On Tue, Oct 7, 2014 at 11:48 AM, Peter Collingbourne <peter at pcc.me.uk>
>> wrote:
>>
>>> On Tue, Oct 07, 2014 at 10:04:30AM -0700, David Blaikie wrote:
>>> > Hi Peter,
>>> >
>>> > After discovering several bugs in ArgumentPromotion and
>>> > DeadArgumentElimination where llvm::Functions were replaced with
>>> similar
>>> > functions (with the same name) to transform their type in some way, I
>>> > started looking at all calls to llvm::Function::takeName to see if
>>> there
>>> > were any other debug info quality bugs in similar callers.
>>> >
>>> > One such caller is the DataFlowSanitizer, and I don't see any debug
>>> info
>>> > tests for this so I'm wondering what /should/ happen here.
>>> >
>>> > Is DFSan+DebugInfo something that matters? I assume so.
>>>
>>> It may be important in the future, but at the moment the dfsan runtime
>>> library
>>> does not make use of debug info. The debug info could still be useful for
>>> regular debugging tasks though.
>>>
>>
>> Yeah - that'd be the baseline. I assume some people probably care about
>> being able to debug a dfsan binary. Not sure if your users have
>> specifically highlighted this situation or come across the bugs I'm
>> speculating about.
>>
>>
>>> > It looks like DFSan is creating wrappers (in/around
>>> > DataFlowSanitizer.cpp:680-700) - when it does this, should it update
>>> the
>>> > debug info for these functions? Or are these internal instrumentation
>>> > functions & nothing to do with the code the user wrote? I can't quite
>>> tell
>>> > from the code.
>>>
>>> The functions created by that part of the code replace the original
>>> functions,
>>> so they should inherit the debug info for those functions.
>>>
>>
>> Yep, that won't happen for free - we have to stitch it back into the
>> debug info.
>>
>>
>>> But the code below that can also create wrapper functions which do not
>>> need
>>> debug info (lines 712-746). Wrappers normally show up for uninstrumented
>>> functions (e.g. main and many libc functions).
>>>
>>> > Could you provide any C/C++ source examples whis part of DFSan fires
>>> > reliably, so I could experiment with some examples and see how the
>>> debug
>>> > info looks?
>>>
>>> This is an example of a program that exercises the replacement function
>>> and
>>> wrapper features.
>>>
>>>
>>> --------------------------------------------------------------------------------
>>> #include <stddef.h>
>>> #include <string.h>
>>>
>>> size_t len(size_t (*strlen_ptr)(const char *), const char *str) {
>>>   return strlen_ptr(str);
>>> }
>>>
>>> int main(void) {
>>>   return len(strlen, "foo");
>>> }
>>>
>>> --------------------------------------------------------------------------------
>>>
>>> In this example, 'len' is rewritten to 'dfs$len', 'main' keeps its
>>> original
>>> name (the pass treats it as an uninstrumented function), and wrappers are
>>> created for 'main' and 'strlen' (the wrapper for 'main' is unused as the
>>> C runtime calls the regular 'main' function directly).
>>>
>>
>> OK, so when you say wrappers are created - where does the name go? "main"
>> keeps the name? (it's important which way this transformation is done - if
>> the guts of main are moved to a new function and the name moved as well,
>> that's not the same as keeping the same function as far as debug info
>> metadata is concerned)
>>
>>
>>> I compile this with '-O0 -g'. A 'break main'/'run'/'break strlen'/'cont'
>>> gives a relevant stack trace:
>>>
>>> #0  __strlen_sse2_pminub () at
>>> ../sysdeps/x86_64/multiarch/strlen-sse2-pminub.S:33
>>> #1  0x00005555555587ff in __dfsw_strlen (s=0x55555556fe17 "foo",
>>> s_label=<optimized out>, ret_label=0x7fffffffddee)
>>>     at llvm/projects/compiler-rt/lib/dfsan/dfsan_custom.cc:203
>>> #2  0x000055555556bbdc in dfsw$strlen ()
>>> #3  0x000055555556bb51 in len (strlen_ptr=0x55555556bbc0 <dfsw$strlen>,
>>> str=0x55555556fe17 "foo") at strlen.c:5
>>> #4  0x000055555556bb96 in main ()
>>>
>>> In this stack trace, #2 is the compiler-generated wrapper function for
>>> strlen.
>>>
>>> It looks like the debug info for 'len' is preserved correctly, but I
>>> don't
>>> know why the debug info for 'main' is missing.
>>>
>>
>> Yeah - not quite sure either. I'll poke around with it for a little bit.
>>
>
> DataFlowSanitizer.cpp:719: F.replaceAllUsesWith(WrappedFnCst);
>
> replaces the debug info metadata's pointer to main with a pointer to
> dfsw$main (this issue doesn't come up with DAE or ArgPromo because they
> both don
>

... they both don't use RAUW because they have to visit each call site to
do special stuff anyway. (they're replacing one function with another of a
different type, RAUW isn't suitable)

I /guess/ we never end up codegen'ing dfsw$main? (I haven't looked) and
thus the debug info doesn't describe any function at all. It's possible
that there's some other reason we don't end up describing any function at
all...

In any case if we did have "main" in the debug info describe a function, at
this point, it would be describing the dfsw$main, not main main. So we'd
need to handle remapping the debug info back to the original uninstrumented
function anyway, I assume? (that's the expected behavior? That the
uninstrumented function is the one described by debug info, not the
instrumented wrapper?)



>
>
>>
>>
>>>
>>> Thanks,
>>> --
>>> Peter
>>>
>>
>>
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.llvm.org/pipermail/llvm-dev/attachments/20141007/44a7a791/attachment.html>


More information about the llvm-dev mailing list