[LLVMdev] Debug Info and DFSan

Peter Collingbourne peter at pcc.me.uk
Tue Oct 7 15:41:15 PDT 2014


On Tue, Oct 07, 2014 at 03:30:49PM -0700, David Blaikie wrote:
> On Tue, Oct 7, 2014 at 2:51 PM, Peter Collingbourne <peter at pcc.me.uk> wrote:
> 
> > Looks good, thanks!
> >
> > Can you write the test case, please? You probably have more experience
> > writing debug info tests than I do.
> >
> 
> Sure - though how would I get the pre-dfsan .ll file to produce this
> behavior? I've tried compiling to a .ll file without dfsan, then feeling
> that .ll through opt -dfsan, and I got different output. Instead of
> wrapping main, it just replaced it. Is there some attribute or other thing
> the frontend is adding to ensure main is wrapped, rather than converted?

The pass takes an ABI list file, which in this case needs to contain the
following two entries in order to tell the pass that main is uninstrumented:

fun:main=uninstrumented
fun:main=discard

Please take a look at the test/Instrumentation/DataFlowSanitizer/abilist.ll
test case for an example of how to specify an ABI list (you only need the
-dfsan-abilist flag).

Peter

> 
> 
> >
> > On Tue, Oct 07, 2014 at 02:35:49PM -0700, David Blaikie wrote:
> > > Here's a basic patch which would solve it in sort of the same way as the
> > > other optimizations I was fixing (just special case the debug info & fix
> > it
> > > up). I can work up a test case for this as well, or you can, if you
> > > like/this seems reasonable.
> > >
> > > On Tue, Oct 7, 2014 at 2:30 PM, Peter Collingbourne <peter at pcc.me.uk>
> > wrote:
> > >
> > > > On Tue, Oct 07, 2014 at 12:20:55PM -0700, David Blaikie wrote:
> > > > > On Tue, Oct 7, 2014 at 12:18 PM, David Blaikie <dblaikie at gmail.com>
> > > > wrote:
> > > > >
> > > > > >
> > > > > >
> > > > > > On Tue, Oct 7, 2014 at 12:10 PM, David Blaikie <dblaikie at gmail.com
> > >
> > > > wrote:
> > > > > >
> > > > > >>
> > > > > >>
> > > > > >> On Tue, Oct 7, 2014 at 11:48 AM, Peter Collingbourne <
> > peter at pcc.me.uk
> > > > >
> > > > > >> wrote:
> > > > > >>
> > > > > >>> On Tue, Oct 07, 2014 at 10:04:30AM -0700, David Blaikie wrote:
> > > > > >>> > Hi Peter,
> > > > > >>> >
> > > > > >>> > After discovering several bugs in ArgumentPromotion and
> > > > > >>> > DeadArgumentElimination where llvm::Functions were replaced
> > with
> > > > > >>> similar
> > > > > >>> > functions (with the same name) to transform their type in some
> > > > way, I
> > > > > >>> > started looking at all calls to llvm::Function::takeName to
> > see if
> > > > > >>> there
> > > > > >>> > were any other debug info quality bugs in similar callers.
> > > > > >>> >
> > > > > >>> > One such caller is the DataFlowSanitizer, and I don't see any
> > debug
> > > > > >>> info
> > > > > >>> > tests for this so I'm wondering what /should/ happen here.
> > > > > >>> >
> > > > > >>> > Is DFSan+DebugInfo something that matters? I assume so.
> > > > > >>>
> > > > > >>> It may be important in the future, but at the moment the dfsan
> > > > runtime
> > > > > >>> library
> > > > > >>> does not make use of debug info. The debug info could still be
> > > > useful for
> > > > > >>> regular debugging tasks though.
> > > > > >>>
> > > > > >>
> > > > > >> Yeah - that'd be the baseline. I assume some people probably care
> > > > about
> > > > > >> being able to debug a dfsan binary. Not sure if your users have
> > > > > >> specifically highlighted this situation or come across the bugs
> > I'm
> > > > > >> speculating about.
> > > > > >>
> > > > > >>
> > > > > >>> > It looks like DFSan is creating wrappers (in/around
> > > > > >>> > DataFlowSanitizer.cpp:680-700) - when it does this, should it
> > > > update
> > > > > >>> the
> > > > > >>> > debug info for these functions? Or are these internal
> > > > instrumentation
> > > > > >>> > functions & nothing to do with the code the user wrote? I can't
> > > > quite
> > > > > >>> tell
> > > > > >>> > from the code.
> > > > > >>>
> > > > > >>> The functions created by that part of the code replace the
> > original
> > > > > >>> functions,
> > > > > >>> so they should inherit the debug info for those functions.
> > > > > >>>
> > > > > >>
> > > > > >> Yep, that won't happen for free - we have to stitch it back into
> > the
> > > > > >> debug info.
> > > > > >>
> > > > > >>
> > > > > >>> But the code below that can also create wrapper functions which
> > do
> > > > not
> > > > > >>> need
> > > > > >>> debug info (lines 712-746). Wrappers normally show up for
> > > > uninstrumented
> > > > > >>> functions (e.g. main and many libc functions).
> > > > > >>>
> > > > > >>> > Could you provide any C/C++ source examples whis part of DFSan
> > > > fires
> > > > > >>> > reliably, so I could experiment with some examples and see how
> > the
> > > > > >>> debug
> > > > > >>> > info looks?
> > > > > >>>
> > > > > >>> This is an example of a program that exercises the replacement
> > > > function
> > > > > >>> and
> > > > > >>> wrapper features.
> > > > > >>>
> > > > > >>>
> > > > > >>>
> > > >
> > --------------------------------------------------------------------------------
> > > > > >>> #include <stddef.h>
> > > > > >>> #include <string.h>
> > > > > >>>
> > > > > >>> size_t len(size_t (*strlen_ptr)(const char *), const char *str) {
> > > > > >>>   return strlen_ptr(str);
> > > > > >>> }
> > > > > >>>
> > > > > >>> int main(void) {
> > > > > >>>   return len(strlen, "foo");
> > > > > >>> }
> > > > > >>>
> > > > > >>>
> > > >
> > --------------------------------------------------------------------------------
> > > > > >>>
> > > > > >>> In this example, 'len' is rewritten to 'dfs$len', 'main' keeps
> > its
> > > > > >>> original
> > > > > >>> name (the pass treats it as an uninstrumented function), and
> > > > wrappers are
> > > > > >>> created for 'main' and 'strlen' (the wrapper for 'main' is
> > unused as
> > > > the
> > > > > >>> C runtime calls the regular 'main' function directly).
> > > > > >>>
> > > > > >>
> > > > > >> OK, so when you say wrappers are created - where does the name go?
> > > > "main"
> > > > > >> keeps the name? (it's important which way this transformation is
> > done
> > > > - if
> > > > > >> the guts of main are moved to a new function and the name moved as
> > > > well,
> > > > > >> that's not the same as keeping the same function as far as debug
> > info
> > > > > >> metadata is concerned)
> > > >
> > > > In this case the function keeps its original name and the wrapper is
> > > > created
> > > > separately.
> > > >
> > > > > >>> I compile this with '-O0 -g'. A 'break main'/'run'/'break
> > > > strlen'/'cont'
> > > > > >>> gives a relevant stack trace:
> > > > > >>>
> > > > > >>> #0  __strlen_sse2_pminub () at
> > > > > >>> ../sysdeps/x86_64/multiarch/strlen-sse2-pminub.S:33
> > > > > >>> #1  0x00005555555587ff in __dfsw_strlen (s=0x55555556fe17 "foo",
> > > > > >>> s_label=<optimized out>, ret_label=0x7fffffffddee)
> > > > > >>>     at llvm/projects/compiler-rt/lib/dfsan/dfsan_custom.cc:203
> > > > > >>> #2  0x000055555556bbdc in dfsw$strlen ()
> > > > > >>> #3  0x000055555556bb51 in len (strlen_ptr=0x55555556bbc0
> > > > <dfsw$strlen>,
> > > > > >>> str=0x55555556fe17 "foo") at strlen.c:5
> > > > > >>> #4  0x000055555556bb96 in main ()
> > > > > >>>
> > > > > >>> In this stack trace, #2 is the compiler-generated wrapper
> > function
> > > > for
> > > > > >>> strlen.
> > > > > >>>
> > > > > >>> It looks like the debug info for 'len' is preserved correctly,
> > but I
> > > > > >>> don't
> > > > > >>> know why the debug info for 'main' is missing.
> > > > > >>>
> > > > > >>
> > > > > >> Yeah - not quite sure either. I'll poke around with it for a
> > little
> > > > bit.
> > > > > >>
> > > > > >
> > > > > > DataFlowSanitizer.cpp:719: F.replaceAllUsesWith(WrappedFnCst);
> > > > > >
> > > > > > replaces the debug info metadata's pointer to main with a pointer
> > to
> > > > > > dfsw$main (this issue doesn't come up with DAE or ArgPromo because
> > they
> > > > > > both don
> > > > > >
> > > > >
> > > > > ... they both don't use RAUW because they have to visit each call
> > site to
> > > > > do special stuff anyway. (they're replacing one function with
> > another of
> > > > a
> > > > > different type, RAUW isn't suitable)
> > > > >
> > > > > I /guess/ we never end up codegen'ing dfsw$main? (I haven't looked)
> > and
> > > > > thus the debug info doesn't describe any function at all. It's
> > possible
> > > > > that there's some other reason we don't end up describing any
> > function at
> > > > > all...
> > > > >
> > > > > In any case if we did have "main" in the debug info describe a
> > function,
> > > > at
> > > > > this point, it would be describing the dfsw$main, not main main. So
> > we'd
> > > > > need to handle remapping the debug info back to the original
> > > > uninstrumented
> > > > > function anyway, I assume? (that's the expected behavior? That the
> > > > > uninstrumented function is the one described by debug info, not the
> > > > > instrumented wrapper?)
> > > >
> > > > Yes, the debug info describes 'main', not 'dfsw$main'. So I agree that
> > > > the references in the debug info would need to refer to 'main' instead
> > of
> > > > 'dfsw$main'. Not sure if there is a good way to do that. Perhaps we
> > could
> > > > manually RAUW and skip metadata, but I'm not sure if that would work
> > if the
> > > > debug metadata is not a direct user.
> > > >
> > > > Thanks,
> > > > --
> > > > Peter
> > > >
> >
> > > diff --git lib/Transforms/Instrumentation/DataFlowSanitizer.cpp
> > lib/Transforms/Instrumentation/DataFlowSanitizer.cpp
> > > index 446bcf7..cb84fd6 100644
> > > --- lib/Transforms/Instrumentation/DataFlowSanitizer.cpp
> > > +++ lib/Transforms/Instrumentation/DataFlowSanitizer.cpp
> > > @@ -51,6 +51,7 @@
> > >  #include "llvm/ADT/StringExtras.h"
> > >  #include "llvm/Analysis/ValueTracking.h"
> > >  #include "llvm/IR/Dominators.h"
> > > +#include "llvm/IR/DebugInfo.h"
> > >  #include "llvm/IR/IRBuilder.h"
> > >  #include "llvm/IR/InlineAsm.h"
> > >  #include "llvm/IR/InstVisitor.h"
> > > @@ -243,6 +244,7 @@ class DataFlowSanitizer : public ModulePass {
> > >    DFSanABIList ABIList;
> > >    DenseMap<Value *, Function *> UnwrappedFnMap;
> > >    AttributeSet ReadOnlyNoneAttrs;
> > > +  DenseMap<const Function *, DISubprogram> FunctionDIs;
> > >
> > >    Value *getShadowAddress(Value *Addr, Instruction *Pos);
> > >    bool isInstrumented(const Function *F);
> > > @@ -573,6 +575,8 @@ bool DataFlowSanitizer::runOnModule(Module &M) {
> > >    if (ABIList.isIn(M, "skip"))
> > >      return false;
> > >
> > > +  FunctionDIs = makeSubprogramMap(M);
> > > +
> > >    if (!GetArgTLSPtr) {
> > >      Type *ArgTLSTy = ArrayType::get(ShadowTy, 64);
> > >      ArgTLS = Mod->getOrInsertGlobal("__dfsan_arg_tls", ArgTLSTy);
> > > @@ -725,6 +729,12 @@ bool DataFlowSanitizer::runOnModule(Module &M) {
> > >        Value *WrappedFnCst =
> > >            ConstantExpr::getBitCast(NewF, PointerType::getUnqual(FT));
> > >        F.replaceAllUsesWith(WrappedFnCst);
> > > +
> > > +      // Patch the pointer to LLVM function in debug info descriptor.
> > > +      auto DI = FunctionDIs.find(&F);
> > > +      if (DI != FunctionDIs.end())
> > > +        DI->second.replaceFunction(&F);
> > > +
> > >        UnwrappedFnMap[WrappedFnCst] = &F;
> > >        *i = NewF;
> > >
> >
> >
> > --
> > Peter
> >

-- 
Peter



More information about the llvm-dev mailing list