[LLVMdev] Debug Info and DFSan

Peter Collingbourne peter at pcc.me.uk
Tue Oct 7 14:30:45 PDT 2014


On Tue, Oct 07, 2014 at 12:20:55PM -0700, David Blaikie wrote:
> On Tue, Oct 7, 2014 at 12:18 PM, David Blaikie <dblaikie at gmail.com> wrote:
> 
> >
> >
> > On Tue, Oct 7, 2014 at 12:10 PM, David Blaikie <dblaikie at gmail.com> wrote:
> >
> >>
> >>
> >> On Tue, Oct 7, 2014 at 11:48 AM, Peter Collingbourne <peter at pcc.me.uk>
> >> wrote:
> >>
> >>> On Tue, Oct 07, 2014 at 10:04:30AM -0700, David Blaikie wrote:
> >>> > Hi Peter,
> >>> >
> >>> > After discovering several bugs in ArgumentPromotion and
> >>> > DeadArgumentElimination where llvm::Functions were replaced with
> >>> similar
> >>> > functions (with the same name) to transform their type in some way, I
> >>> > started looking at all calls to llvm::Function::takeName to see if
> >>> there
> >>> > were any other debug info quality bugs in similar callers.
> >>> >
> >>> > One such caller is the DataFlowSanitizer, and I don't see any debug
> >>> info
> >>> > tests for this so I'm wondering what /should/ happen here.
> >>> >
> >>> > Is DFSan+DebugInfo something that matters? I assume so.
> >>>
> >>> It may be important in the future, but at the moment the dfsan runtime
> >>> library
> >>> does not make use of debug info. The debug info could still be useful for
> >>> regular debugging tasks though.
> >>>
> >>
> >> Yeah - that'd be the baseline. I assume some people probably care about
> >> being able to debug a dfsan binary. Not sure if your users have
> >> specifically highlighted this situation or come across the bugs I'm
> >> speculating about.
> >>
> >>
> >>> > It looks like DFSan is creating wrappers (in/around
> >>> > DataFlowSanitizer.cpp:680-700) - when it does this, should it update
> >>> the
> >>> > debug info for these functions? Or are these internal instrumentation
> >>> > functions & nothing to do with the code the user wrote? I can't quite
> >>> tell
> >>> > from the code.
> >>>
> >>> The functions created by that part of the code replace the original
> >>> functions,
> >>> so they should inherit the debug info for those functions.
> >>>
> >>
> >> Yep, that won't happen for free - we have to stitch it back into the
> >> debug info.
> >>
> >>
> >>> But the code below that can also create wrapper functions which do not
> >>> need
> >>> debug info (lines 712-746). Wrappers normally show up for uninstrumented
> >>> functions (e.g. main and many libc functions).
> >>>
> >>> > Could you provide any C/C++ source examples whis part of DFSan fires
> >>> > reliably, so I could experiment with some examples and see how the
> >>> debug
> >>> > info looks?
> >>>
> >>> This is an example of a program that exercises the replacement function
> >>> and
> >>> wrapper features.
> >>>
> >>>
> >>> --------------------------------------------------------------------------------
> >>> #include <stddef.h>
> >>> #include <string.h>
> >>>
> >>> size_t len(size_t (*strlen_ptr)(const char *), const char *str) {
> >>>   return strlen_ptr(str);
> >>> }
> >>>
> >>> int main(void) {
> >>>   return len(strlen, "foo");
> >>> }
> >>>
> >>> --------------------------------------------------------------------------------
> >>>
> >>> In this example, 'len' is rewritten to 'dfs$len', 'main' keeps its
> >>> original
> >>> name (the pass treats it as an uninstrumented function), and wrappers are
> >>> created for 'main' and 'strlen' (the wrapper for 'main' is unused as the
> >>> C runtime calls the regular 'main' function directly).
> >>>
> >>
> >> OK, so when you say wrappers are created - where does the name go? "main"
> >> keeps the name? (it's important which way this transformation is done - if
> >> the guts of main are moved to a new function and the name moved as well,
> >> that's not the same as keeping the same function as far as debug info
> >> metadata is concerned)

In this case the function keeps its original name and the wrapper is created
separately.

> >>> I compile this with '-O0 -g'. A 'break main'/'run'/'break strlen'/'cont'
> >>> gives a relevant stack trace:
> >>>
> >>> #0  __strlen_sse2_pminub () at
> >>> ../sysdeps/x86_64/multiarch/strlen-sse2-pminub.S:33
> >>> #1  0x00005555555587ff in __dfsw_strlen (s=0x55555556fe17 "foo",
> >>> s_label=<optimized out>, ret_label=0x7fffffffddee)
> >>>     at llvm/projects/compiler-rt/lib/dfsan/dfsan_custom.cc:203
> >>> #2  0x000055555556bbdc in dfsw$strlen ()
> >>> #3  0x000055555556bb51 in len (strlen_ptr=0x55555556bbc0 <dfsw$strlen>,
> >>> str=0x55555556fe17 "foo") at strlen.c:5
> >>> #4  0x000055555556bb96 in main ()
> >>>
> >>> In this stack trace, #2 is the compiler-generated wrapper function for
> >>> strlen.
> >>>
> >>> It looks like the debug info for 'len' is preserved correctly, but I
> >>> don't
> >>> know why the debug info for 'main' is missing.
> >>>
> >>
> >> Yeah - not quite sure either. I'll poke around with it for a little bit.
> >>
> >
> > DataFlowSanitizer.cpp:719: F.replaceAllUsesWith(WrappedFnCst);
> >
> > replaces the debug info metadata's pointer to main with a pointer to
> > dfsw$main (this issue doesn't come up with DAE or ArgPromo because they
> > both don
> >
> 
> ... they both don't use RAUW because they have to visit each call site to
> do special stuff anyway. (they're replacing one function with another of a
> different type, RAUW isn't suitable)
> 
> I /guess/ we never end up codegen'ing dfsw$main? (I haven't looked) and
> thus the debug info doesn't describe any function at all. It's possible
> that there's some other reason we don't end up describing any function at
> all...
> 
> In any case if we did have "main" in the debug info describe a function, at
> this point, it would be describing the dfsw$main, not main main. So we'd
> need to handle remapping the debug info back to the original uninstrumented
> function anyway, I assume? (that's the expected behavior? That the
> uninstrumented function is the one described by debug info, not the
> instrumented wrapper?)

Yes, the debug info describes 'main', not 'dfsw$main'. So I agree that
the references in the debug info would need to refer to 'main' instead of
'dfsw$main'. Not sure if there is a good way to do that. Perhaps we could
manually RAUW and skip metadata, but I'm not sure if that would work if the
debug metadata is not a direct user.

Thanks,
-- 
Peter



More information about the llvm-dev mailing list