[LLVMdev] Debug Info and DFSan

David Blaikie dblaikie at gmail.com
Tue Oct 7 14:35:49 PDT 2014


Here's a basic patch which would solve it in sort of the same way as the
other optimizations I was fixing (just special case the debug info & fix it
up). I can work up a test case for this as well, or you can, if you
like/this seems reasonable.

On Tue, Oct 7, 2014 at 2:30 PM, Peter Collingbourne <peter at pcc.me.uk> wrote:

> On Tue, Oct 07, 2014 at 12:20:55PM -0700, David Blaikie wrote:
> > On Tue, Oct 7, 2014 at 12:18 PM, David Blaikie <dblaikie at gmail.com>
> wrote:
> >
> > >
> > >
> > > On Tue, Oct 7, 2014 at 12:10 PM, David Blaikie <dblaikie at gmail.com>
> wrote:
> > >
> > >>
> > >>
> > >> On Tue, Oct 7, 2014 at 11:48 AM, Peter Collingbourne <peter at pcc.me.uk
> >
> > >> wrote:
> > >>
> > >>> On Tue, Oct 07, 2014 at 10:04:30AM -0700, David Blaikie wrote:
> > >>> > Hi Peter,
> > >>> >
> > >>> > After discovering several bugs in ArgumentPromotion and
> > >>> > DeadArgumentElimination where llvm::Functions were replaced with
> > >>> similar
> > >>> > functions (with the same name) to transform their type in some
> way, I
> > >>> > started looking at all calls to llvm::Function::takeName to see if
> > >>> there
> > >>> > were any other debug info quality bugs in similar callers.
> > >>> >
> > >>> > One such caller is the DataFlowSanitizer, and I don't see any debug
> > >>> info
> > >>> > tests for this so I'm wondering what /should/ happen here.
> > >>> >
> > >>> > Is DFSan+DebugInfo something that matters? I assume so.
> > >>>
> > >>> It may be important in the future, but at the moment the dfsan
> runtime
> > >>> library
> > >>> does not make use of debug info. The debug info could still be
> useful for
> > >>> regular debugging tasks though.
> > >>>
> > >>
> > >> Yeah - that'd be the baseline. I assume some people probably care
> about
> > >> being able to debug a dfsan binary. Not sure if your users have
> > >> specifically highlighted this situation or come across the bugs I'm
> > >> speculating about.
> > >>
> > >>
> > >>> > It looks like DFSan is creating wrappers (in/around
> > >>> > DataFlowSanitizer.cpp:680-700) - when it does this, should it
> update
> > >>> the
> > >>> > debug info for these functions? Or are these internal
> instrumentation
> > >>> > functions & nothing to do with the code the user wrote? I can't
> quite
> > >>> tell
> > >>> > from the code.
> > >>>
> > >>> The functions created by that part of the code replace the original
> > >>> functions,
> > >>> so they should inherit the debug info for those functions.
> > >>>
> > >>
> > >> Yep, that won't happen for free - we have to stitch it back into the
> > >> debug info.
> > >>
> > >>
> > >>> But the code below that can also create wrapper functions which do
> not
> > >>> need
> > >>> debug info (lines 712-746). Wrappers normally show up for
> uninstrumented
> > >>> functions (e.g. main and many libc functions).
> > >>>
> > >>> > Could you provide any C/C++ source examples whis part of DFSan
> fires
> > >>> > reliably, so I could experiment with some examples and see how the
> > >>> debug
> > >>> > info looks?
> > >>>
> > >>> This is an example of a program that exercises the replacement
> function
> > >>> and
> > >>> wrapper features.
> > >>>
> > >>>
> > >>>
> --------------------------------------------------------------------------------
> > >>> #include <stddef.h>
> > >>> #include <string.h>
> > >>>
> > >>> size_t len(size_t (*strlen_ptr)(const char *), const char *str) {
> > >>>   return strlen_ptr(str);
> > >>> }
> > >>>
> > >>> int main(void) {
> > >>>   return len(strlen, "foo");
> > >>> }
> > >>>
> > >>>
> --------------------------------------------------------------------------------
> > >>>
> > >>> In this example, 'len' is rewritten to 'dfs$len', 'main' keeps its
> > >>> original
> > >>> name (the pass treats it as an uninstrumented function), and
> wrappers are
> > >>> created for 'main' and 'strlen' (the wrapper for 'main' is unused as
> the
> > >>> C runtime calls the regular 'main' function directly).
> > >>>
> > >>
> > >> OK, so when you say wrappers are created - where does the name go?
> "main"
> > >> keeps the name? (it's important which way this transformation is done
> - if
> > >> the guts of main are moved to a new function and the name moved as
> well,
> > >> that's not the same as keeping the same function as far as debug info
> > >> metadata is concerned)
>
> In this case the function keeps its original name and the wrapper is
> created
> separately.
>
> > >>> I compile this with '-O0 -g'. A 'break main'/'run'/'break
> strlen'/'cont'
> > >>> gives a relevant stack trace:
> > >>>
> > >>> #0  __strlen_sse2_pminub () at
> > >>> ../sysdeps/x86_64/multiarch/strlen-sse2-pminub.S:33
> > >>> #1  0x00005555555587ff in __dfsw_strlen (s=0x55555556fe17 "foo",
> > >>> s_label=<optimized out>, ret_label=0x7fffffffddee)
> > >>>     at llvm/projects/compiler-rt/lib/dfsan/dfsan_custom.cc:203
> > >>> #2  0x000055555556bbdc in dfsw$strlen ()
> > >>> #3  0x000055555556bb51 in len (strlen_ptr=0x55555556bbc0
> <dfsw$strlen>,
> > >>> str=0x55555556fe17 "foo") at strlen.c:5
> > >>> #4  0x000055555556bb96 in main ()
> > >>>
> > >>> In this stack trace, #2 is the compiler-generated wrapper function
> for
> > >>> strlen.
> > >>>
> > >>> It looks like the debug info for 'len' is preserved correctly, but I
> > >>> don't
> > >>> know why the debug info for 'main' is missing.
> > >>>
> > >>
> > >> Yeah - not quite sure either. I'll poke around with it for a little
> bit.
> > >>
> > >
> > > DataFlowSanitizer.cpp:719: F.replaceAllUsesWith(WrappedFnCst);
> > >
> > > replaces the debug info metadata's pointer to main with a pointer to
> > > dfsw$main (this issue doesn't come up with DAE or ArgPromo because they
> > > both don
> > >
> >
> > ... they both don't use RAUW because they have to visit each call site to
> > do special stuff anyway. (they're replacing one function with another of
> a
> > different type, RAUW isn't suitable)
> >
> > I /guess/ we never end up codegen'ing dfsw$main? (I haven't looked) and
> > thus the debug info doesn't describe any function at all. It's possible
> > that there's some other reason we don't end up describing any function at
> > all...
> >
> > In any case if we did have "main" in the debug info describe a function,
> at
> > this point, it would be describing the dfsw$main, not main main. So we'd
> > need to handle remapping the debug info back to the original
> uninstrumented
> > function anyway, I assume? (that's the expected behavior? That the
> > uninstrumented function is the one described by debug info, not the
> > instrumented wrapper?)
>
> Yes, the debug info describes 'main', not 'dfsw$main'. So I agree that
> the references in the debug info would need to refer to 'main' instead of
> 'dfsw$main'. Not sure if there is a good way to do that. Perhaps we could
> manually RAUW and skip metadata, but I'm not sure if that would work if the
> debug metadata is not a direct user.
>
> Thanks,
> --
> Peter
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.llvm.org/pipermail/llvm-dev/attachments/20141007/0673c94d/attachment.html>
-------------- next part --------------
diff --git lib/Transforms/Instrumentation/DataFlowSanitizer.cpp lib/Transforms/Instrumentation/DataFlowSanitizer.cpp
index 446bcf7..cb84fd6 100644
--- lib/Transforms/Instrumentation/DataFlowSanitizer.cpp
+++ lib/Transforms/Instrumentation/DataFlowSanitizer.cpp
@@ -51,6 +51,7 @@
 #include "llvm/ADT/StringExtras.h"
 #include "llvm/Analysis/ValueTracking.h"
 #include "llvm/IR/Dominators.h"
+#include "llvm/IR/DebugInfo.h"
 #include "llvm/IR/IRBuilder.h"
 #include "llvm/IR/InlineAsm.h"
 #include "llvm/IR/InstVisitor.h"
@@ -243,6 +244,7 @@ class DataFlowSanitizer : public ModulePass {
   DFSanABIList ABIList;
   DenseMap<Value *, Function *> UnwrappedFnMap;
   AttributeSet ReadOnlyNoneAttrs;
+  DenseMap<const Function *, DISubprogram> FunctionDIs;
 
   Value *getShadowAddress(Value *Addr, Instruction *Pos);
   bool isInstrumented(const Function *F);
@@ -573,6 +575,8 @@ bool DataFlowSanitizer::runOnModule(Module &M) {
   if (ABIList.isIn(M, "skip"))
     return false;
 
+  FunctionDIs = makeSubprogramMap(M);
+
   if (!GetArgTLSPtr) {
     Type *ArgTLSTy = ArrayType::get(ShadowTy, 64);
     ArgTLS = Mod->getOrInsertGlobal("__dfsan_arg_tls", ArgTLSTy);
@@ -725,6 +729,12 @@ bool DataFlowSanitizer::runOnModule(Module &M) {
       Value *WrappedFnCst =
           ConstantExpr::getBitCast(NewF, PointerType::getUnqual(FT));
       F.replaceAllUsesWith(WrappedFnCst);
+
+      // Patch the pointer to LLVM function in debug info descriptor.
+      auto DI = FunctionDIs.find(&F);
+      if (DI != FunctionDIs.end())
+        DI->second.replaceFunction(&F);
+
       UnwrappedFnMap[WrappedFnCst] = &F;
       *i = NewF;
 


More information about the llvm-dev mailing list