[LLVMdev] DataFlowSanitizer design discussion

Fri Jun 14 13:48:06 PDT 2013

This tool isn't for stack protection; there are other tools for that.
In general, the tool isn't currently focused on defending against
adversaries -- it would be trivial to write a program that accesses
shadow memory directly in order to produce incorrect results, not
to mention "tag scrubbers", which use control flow to remove tags
(see section 6 of the ASPLOS paper).
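
A "tag scrubber" of the kind mentioned above can be illustrated with a minimal C sketch. `scrub_byte` is a hypothetical name, not code from the paper or from the LLVM pass. It rebuilds a byte bit by bit using only branches on the input and stores of constants. Assuming a tracker that propagates labels along explicit data dependencies but not control dependencies, the output looks as if it was built purely from constants, so it would carry no label even though its value equals the input.

```c
#include <stdint.h>

/* Hypothetical tag scrubber: copies x into a fresh byte without any
 * direct data dependency on x. Each branch depends on x (a control
 * dependency), but every value written into y is a constant, so a
 * tracker that ignores control dependencies sees y as unlabeled. */
uint8_t scrub_byte(uint8_t x) {
  uint8_t y = 0;
  for (unsigned bit = 0; bit < 8; ++bit) {
    if (x & (1u << bit)) {        /* control dependency on x ... */
      y |= (uint8_t)(1u << bit);  /* ... but the stored bits are constants */
    }
  }
  return y;
}
```

Applied byte by byte to a buffer, the same loop "launders" an entire labeled input, which is why the tool is not presented as a defense against adversarial programs.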

On Fri, Jun 14, 2013 at 01:23:22PM -0700, Bin Tzeng wrote:
> It is interesting. I can see some use cases for such a tool. To me, a
> source-level implementation is not as accurate as binary translation.
> For instance, it is hard to check the taint of return addresses, since
> there is no concept of a return instruction at the source level; the
> stack does not appear until later. For a security mechanism, return
> addresses need to be protected.
> 
> On Fri, Jun 14, 2013 at 10:43 AM, Peter Collingbourne <peter at pcc.me.uk> wrote:
> 
> > On Thu, Jun 13, 2013 at 03:13:37PM -0700, Sean Silva wrote:
> > > Could you maybe give some example use cases?
> >
> > A use case I am interested in is to take a large application and use
> > this instrumentation as a tool to help monitor how data flows from its
> > inputs (sources) to its outputs (sinks).  This has applications from
> > a privacy/security perspective in that one can audit how a sensitive
> > data item is used within a program and ensure it isn't exiting the
> > program anywhere it shouldn't be.
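
The source/sink idea can be sketched as a toy C model of shadow labels. This is NOT DataFlowSanitizer's API: every name here (`tracked_buf`, `source_read`, `tracked_copy`, `sink_allows`, `LABEL_SECRET`) is hypothetical. It only shows the shape of the technique: each data byte has a shadow label, sources set labels, copies propagate them, and sinks check them.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Toy shadow-label model (hypothetical, not the real runtime API).
 * Every data byte has one shadow byte; each label bit names a source. */
typedef uint8_t label_t;
#define LABEL_NONE   ((label_t)0u)
#define LABEL_SECRET ((label_t)1u) /* e.g. bytes read from a password prompt */

enum { BUF_SIZE = 64 };

typedef struct {
  char data[BUF_SIZE];
  label_t shadow[BUF_SIZE]; /* shadow[i] holds the label of data[i] */
} tracked_buf;

/* Source: store a (possibly truncated) string and label each of its bytes. */
void source_read(tracked_buf *b, const char *input, label_t label) {
  size_t n = strlen(input);
  if (n >= BUF_SIZE) n = BUF_SIZE - 1;
  memset(b, 0, sizeof *b);          /* clear data and labels */
  memcpy(b->data, input, n);
  memset(b->shadow, label, n);
}

/* Propagation: copying the data copies its labels along with it. */
void tracked_copy(tracked_buf *dst, const tracked_buf *src) {
  memcpy(dst->data, src->data, sizeof dst->data);
  memcpy(dst->shadow, src->shadow, sizeof dst->shadow);
}

/* Sink: refuse to emit len bytes if any of them carries a forbidden label. */
bool sink_allows(const tracked_buf *b, size_t len, label_t forbidden) {
  for (size_t i = 0; i < len && i < BUF_SIZE; ++i) {
    if (b->shadow[i] & forbidden) return false;
  }
  return true;
}
```

In this model, a secret that is copied into an otherwise harmless log buffer is still caught at the sink, because its labels travel with the bytes.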
> >
> > An ASPLOS paper from a few years ago discusses this problem and a
> > solution based on dynamic binary instrumentation using QEMU:
> >
> > http://www.cs.ucsb.edu/~sherwood/pubs/ASPLOS-08-systemtomography.pdf
> >
> > Among other things, I hope to address a number of deficiencies of
> > the tool described by that paper, in terms of efficiency (the other
> > sanitizer tools have shown that compiler-based instrumentation can be
> > much more efficient than binary instrumentation), and also in terms
> > of accuracy (unlike the system described in that paper, we track data
> > accurately through join points using union labels).
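
What "union labels" buy at a point where two labeled values are combined can be shown with a minimal C sketch. It assumes, for brevity, that a label set is a bitmask with one bit per source; that is a simplification chosen here, not the representation the LLVM pass uses. The type and function names are hypothetical.

```c
#include <stdint.h>

/* Toy model: a value paired with the set of sources it derives from,
 * one bit per source (a simplification for this sketch only). */
typedef uint32_t label_set;

typedef struct {
  uint32_t value;   /* unsigned, so overflow wraps instead of being UB */
  label_set labels;
} tainted_u32;

/* The result depends on both operands, so it carries the union of their
 * labels. Keeping only one operand's label would silently drop a source. */
tainted_u32 tainted_add(tainted_u32 a, tainted_u32 b) {
  tainted_u32 r;
  r.value = a.value + b.value;
  r.labels = a.labels | b.labels;
  return r;
}
```

With sources A (bit 0) and B (bit 1), adding an A-derived value to a B-derived value yields a result labeled with both, so an audit at a sink can report both origins.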
> >
> > There are other applications outside of security.  For example,
> > one could use this instrumentation pass (or a variant of it) to tag
> > opposite-endian integers in memory, and check that no opposite-endian
> > integer is loaded or otherwise used directly without first going
> > through a conversion.
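
The opposite-endian check can be sketched the same way, as a toy C model in which a hypothetical label `LABEL_OPPOSITE_ENDIAN` marks integers loaded in foreign byte order. Conversion clears the label, and a direct use is only permitted when the label is absent. None of these names come from the LLVM pass; they are placeholders for illustration.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical label: "this integer is in the opposite byte order". */
#define LABEL_OPPOSITE_ENDIAN 1u

typedef struct {
  uint32_t raw;
  unsigned label;
} tagged_u32;

/* Source: an integer read from foreign-endian storage gets tagged. */
tagged_u32 load_foreign(uint32_t raw) {
  tagged_u32 t = { raw, LABEL_OPPOSITE_ENDIAN };
  return t;
}

static uint32_t bswap32(uint32_t v) {
  return (v >> 24) | ((v >> 8) & 0x0000FF00u) |
         ((v << 8) & 0x00FF0000u) | (v << 24);
}

/* Conversion: swap the bytes of a tagged value and clear the tag. */
tagged_u32 to_native(tagged_u32 t) {
  if (t.label & LABEL_OPPOSITE_ENDIAN) {
    t.raw = bswap32(t.raw);
    t.label &= ~LABEL_OPPOSITE_ENDIAN;
  }
  return t;
}

/* Use check: a direct use is an error while the tag is still present. */
bool ok_to_use(tagged_u32 t) {
  return (t.label & LABEL_OPPOSITE_ENDIAN) == 0;
}
```

In an instrumented build the tag would live in shadow memory rather than beside the value, but the rule being checked is the same.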
> >
> > > Also, "sanitizer" may not be the best name for this, since it doesn't
> > > really sanitize anything.
> >
> > As Reid mentioned, a goal is to build sanitizer-like tools on top of
> > this instrumentation.  Not only that, but one of the things that an
> > application can do is turn on its own sources and sinks in response
> > to the instrumentation being enabled (via the __has_feature macro).
> > So really, -fsanitize=dataflow would be the flag that turns on
> > data-flow sanitization for an application designed for it.  And should
> > the component of the compiler that allows this data-flow sanitization
> > be named any differently?
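
An application might enable its own sources and sinks only when it is built with the instrumentation, roughly as in the C sketch below. `__has_feature` is a Clang extension, and `dataflow_sanitizer` is the feature name current Clang releases recognize for `-fsanitize=dataflow`. The macro `APP_DATAFLOW_ENABLED`, the hook `app_mark_source`, and the counter are hypothetical placeholders; a real hook would call into the sanitizer runtime instead of counting.

```c
#include <stddef.h>

/* Detect the instrumentation at compile time. The outer guard keeps
 * compilers without __has_feature from evaluating the inner test. */
#if defined(__has_feature)
#  if __has_feature(dataflow_sanitizer)
#    define APP_DATAFLOW_ENABLED 1
#  endif
#endif
#ifndef APP_DATAFLOW_ENABLED
#  define APP_DATAFLOW_ENABLED 0
#endif

/* Hypothetical application hook. In an instrumented build it would label
 * the buffer as a source; here it only counts calls so the effect of the
 * compile-time switch is observable. */
static int sources_registered = 0;

void app_mark_source(const void *buf, size_t len) {
  (void)buf;
  (void)len;
  if (APP_DATAFLOW_ENABLED) {
    ++sources_registered;
  }
}

int app_dataflow_enabled(void) { return APP_DATAFLOW_ENABLED; }
```

In an ordinary build the hooks compile to no-ops, so the same source tree serves both instrumented and uninstrumented configurations.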
> >
> > Thanks,
> > --
> > Peter
> > _______________________________________________
> > LLVM Developers mailing list
> > LLVMdev at cs.uiuc.edu         http://llvm.cs.uiuc.edu
> > http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev
> >

-- 
Peter