[LLVMdev] DataFlowSanitizer design discussion

Fri Jun 14 13:23:22 PDT 2013

This is interesting, and I can see some use cases for such a tool. To me,
though, a source-level implementation is not as accurate as binary
translation. For instance, it is hard to check the taint of return
addresses, since there is no concept of a return instruction at the source
level; the stack layout does not exist until later in compilation. For a
security mechanism, return addresses need to be protected.

On Fri, Jun 14, 2013 at 10:43 AM, Peter Collingbourne <peter at pcc.me.uk> wrote:

> On Thu, Jun 13, 2013 at 03:13:37PM -0700, Sean Silva wrote:
> > Could you maybe give some example use cases?
>
> A use case I am interested in is to take a large application and use
> this instrumentation as a tool to help monitor how data flows from its
> inputs (sources) to its outputs (sinks).  This has applications from
> a privacy/security perspective in that one can audit how a sensitive
> data item is used within a program and ensure it isn't exiting the
> program anywhere it shouldn't be.
>
> An ASPLOS paper from a few years ago discusses this problem and a
> solution based on dynamic binary instrumentation using QEMU:
>
> http://www.cs.ucsb.edu/~sherwood/pubs/ASPLOS-08-systemtomography.pdf
>
> Among other things, I hope to address a number of deficiencies of
> the tool described by that paper, in terms of efficiency (the other
> sanitizer tools have shown that compiler-based instrumentation can be
> much more efficient than binary instrumentation), and also in terms
> of accuracy (unlike the system described in that paper, we track data
> accurately through join points using union labels).
>
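
To make the "union labels at join points" idea concrete, here is a toy
model in C. It is only a sketch under my own assumptions, not the
proposed implementation: the names label_t, label_create, label_union,
label_has and tainted_add are all hypothetical, and labels are simply
indices into a small table in which each entry is either a base label
or the union of two earlier labels.

```c
#include <assert.h>
#include <stdint.h>

/* Toy model (hypothetical names): a label is a table index; 0 means "untainted". */
typedef uint16_t label_t;

enum { MAX_LABELS = 64 };

/* A base label has l1 == l2 == 0; a union label records its two operands. */
struct label_info {
  label_t l1, l2;
};

static struct label_info label_table[MAX_LABELS];
static label_t next_label = 1;

/* Allocate a fresh base label, e.g. one per sensitive input. */
static label_t label_create(void) {
  assert(next_label < MAX_LABELS);
  label_table[next_label].l1 = 0;
  label_table[next_label].l2 = 0;
  return next_label++;
}

/* Return nonzero if label l is, or was built by unioning, label part. */
static int label_has(label_t l, label_t part) {
  if (l == 0) return 0;
  if (l == part) return 1;
  return label_has(label_table[l].l1, part) ||
         label_has(label_table[l].l2, part);
}

/* Join two labels: reuse an existing union entry or allocate a new one. */
static label_t label_union(label_t a, label_t b) {
  if (a == 0) return b;
  if (b == 0) return a;
  if (a == b) return a;
  for (label_t i = 1; i < next_label; i++) {
    if ((label_table[i].l1 == a && label_table[i].l2 == b) ||
        (label_table[i].l1 == b && label_table[i].l2 == a))
      return i;
  }
  assert(next_label < MAX_LABELS);
  label_table[next_label].l1 = a;
  label_table[next_label].l2 = b;
  return next_label++;
}

/* A value paired with its label, standing in for instrumented data. */
struct tainted_int {
  int value;
  label_t label;
};

/* What instrumentation of "x + y" would do in this model: the result
   carries the union of both operand labels, so neither source is lost. */
static struct tainted_int tainted_add(struct tainted_int x, struct tainted_int y) {
  struct tainted_int r = { x.value + y.value, label_union(x.label, y.label) };
  return r;
}
```

The point of the model is that a value computed from two differently
labelled inputs still answers "yes" to label_has for both of them,
which is the accuracy property being claimed at join points.
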
> There are other applications outside of security.  For example,
> one could use this instrumentation pass (or a variant of it) to tag
> opposite-endian integers in memory, and check that no opposite-endian
> integer is loaded or otherwise used directly without first going
> through a conversion.
>
> > Also, "sanitizer" may not be the best name for this, since it doesn't
> > really sanitize anything.
>
> As Reid mentioned, a goal is to build sanitizer-like tools on top of
> this instrumentation.  Not only that, but one of the things that an
> application can do is turn on its own sources and sinks in response
> to the instrumentation being enabled (via the __has_feature macro).
> So really, -fsanitize=dataflow would be the flag that turns on
> data-flow sanitization for an application designed for it.  And should
> the component of the compiler that allows this data-flow sanitization
> be named any differently?
>
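
A rough sketch of how an application might switch on its own sources
and sinks when the instrumentation is enabled. Assumptions on my part:
the feature name dataflow_sanitizer is the one current Clang accepts in
__has_feature and may not match what is proposed here, and
APP_TRACK_DATAFLOW, app_mark_sensitive and app_check_output are
hypothetical application hooks that only count calls instead of talking
to a real runtime.

```c
#include <assert.h>
#include <stddef.h>

/* __has_feature is a Clang extension; treat every feature as off elsewhere. */
#ifndef __has_feature
#define __has_feature(x) 0
#endif

/* Assumed feature name; see the note above. */
#if __has_feature(dataflow_sanitizer)
#define APP_TRACK_DATAFLOW 1
#else
#define APP_TRACK_DATAFLOW 0
#endif

/* Hypothetical counters standing in for real labelling and checking. */
static size_t app_sources_marked;
static size_t app_sinks_checked;

/* Source hook: the application calls this on data it considers sensitive. */
static void app_mark_sensitive(const void *buf, size_t len) {
  assert(buf != NULL);
  (void)len;
#if APP_TRACK_DATAFLOW
  /* An instrumented build would attach a label to buf[0..len) here. */
  app_sources_marked++;
#endif
}

/* Sink hook: the application calls this just before data leaves the program. */
static void app_check_output(const void *buf, size_t len) {
  assert(buf != NULL);
  (void)len;
#if APP_TRACK_DATAFLOW
  /* An instrumented build would read the labels on buf[0..len) here. */
  app_sinks_checked++;
#endif
}
```

In an ordinary build both hooks do nothing beyond the argument check,
so the calls can stay in the source permanently.
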
> Thanks,
> --
> Peter
> _______________________________________________
> LLVM Developers mailing list
> LLVMdev at cs.uiuc.edu         http://llvm.cs.uiuc.edu
> http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev
>