[LLVMdev] DataFlowSanitizer design discussion

Peter Collingbourne peter at pcc.me.uk
Fri Jun 14 10:43:14 PDT 2013


On Thu, Jun 13, 2013 at 03:13:37PM -0700, Sean Silva wrote:
> Could you maybe give some example use cases?

A use case I am interested in is to take a large application and use
this instrumentation as a tool to help monitor how data flows from its
inputs (sources) to its outputs (sinks).  This has applications from
a privacy/security perspective in that one can audit how a sensitive
data item is used within a program and ensure it isn't exiting the
program anywhere it shouldn't be.

An ASPLOS paper from a few years ago discusses this problem and a
solution based on dynamic binary instrumentation using QEMU:

http://www.cs.ucsb.edu/~sherwood/pubs/ASPLOS-08-systemtomography.pdf

Among other things, I hope to address a number of deficiencies of
the tool described in that paper.  One is efficiency: the other
sanitizer tools have shown that compiler-based instrumentation can be
much more efficient than binary instrumentation.  The other is
accuracy: unlike the system described in that paper, we track data
accurately through join points using union labels.
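
To make the union-label idea concrete, here is a toy model in C.  It is
only a sketch and not the actual runtime: every toy_* name, the 16-bit
label type and the fixed-size table are assumptions made for
illustration.  A value computed from two labelled operands gets a new
label that records both parents, so neither source is lost at the join.

```c
#include <assert.h>
#include <stdint.h>

/* Toy model of union labels.  All names are hypothetical. */
typedef uint16_t toy_label;            /* 0 means "unlabelled" */

enum { TOY_MAX_LABELS = 256 };

/* For each label, the two labels it was formed from (0, 0 for a base label). */
static struct { toy_label l1, l2; } toy_info[TOY_MAX_LABELS];
static toy_label toy_last = 0;

/* Create a fresh base label, e.g. one per data source. */
static toy_label toy_create_label(void) {
    assert(toy_last + 1 < TOY_MAX_LABELS);
    ++toy_last;
    toy_info[toy_last].l1 = 0;
    toy_info[toy_last].l2 = 0;
    return toy_last;
}

/* Label for a value derived from operands labelled a and b (a join point). */
static toy_label toy_union(toy_label a, toy_label b) {
    if (a == 0) return b;              /* unlabelled operand adds nothing */
    if (b == 0) return a;
    if (a == b) return a;
    assert(toy_last + 1 < TOY_MAX_LABELS);
    ++toy_last;
    toy_info[toy_last].l1 = a;         /* remember both parents */
    toy_info[toy_last].l2 = b;
    return toy_last;
}

/* Does label l include elem, directly or through its parents? */
static int toy_has_label(toy_label l, toy_label elem) {
    if (l == 0) return 0;
    if (l == elem) return 1;
    return toy_has_label(toy_info[l].l1, elem) ||
           toy_has_label(toy_info[l].l2, elem);
}
```

With base labels x and y, toy_has_label(toy_union(x, y), x) and
toy_has_label(toy_union(x, y), y) both hold, which is the property a
sink check would rely on.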

There are other applications outside of security.  For example,
one could use this instrumentation pass (or a variant of it) to tag
opposite-endian integers in memory, and check that no opposite-endian
integer is loaded or otherwise used directly without first going
through a conversion.
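
As a rough illustration of the endianness check, the following C sketch
carries the tag explicitly next to the value.  In the real
instrumentation the tag would live in shadow state maintained by the
compiler pass; the struct, the toy_* names and the return-code
convention here are all hypothetical.

```c
#include <stdint.h>

/* Tag that stands in for the shadow label of a 32-bit integer. */
typedef enum { TOY_NATIVE = 0, TOY_OPPOSITE = 1 } toy_endian_tag;

typedef struct {
    uint32_t bits;
    toy_endian_tag tag;
} toy_tagged_u32;

/* Portable byte swap. */
static uint32_t toy_bswap32(uint32_t v) {
    return ((v & 0x000000FFu) << 24) | ((v & 0x0000FF00u) << 8) |
           ((v & 0x00FF0000u) >> 8)  | ((v & 0xFF000000u) >> 24);
}

/* Source: a value read from opposite-endian storage gets tagged. */
static toy_tagged_u32 toy_load_foreign(uint32_t raw) {
    toy_tagged_u32 v = { raw, TOY_OPPOSITE };
    return v;
}

/* Conversion: swaps the bytes and clears the tag. */
static toy_tagged_u32 toy_convert(toy_tagged_u32 v) {
    if (v.tag == TOY_OPPOSITE) {
        v.bits = toy_bswap32(v.bits);
        v.tag = TOY_NATIVE;
    }
    return v;
}

/* Sink: direct use.  Returns 1 and stores the value if it is safe to
   use, or 0 if a tagged value reached the sink without conversion. */
static int toy_use(toy_tagged_u32 v, uint32_t *out) {
    if (v.tag != TOY_NATIVE)
        return 0;
    *out = v.bits;
    return 1;
}
```

Passing the result of toy_load_foreign straight to toy_use is reported
as a violation, while routing it through toy_convert first succeeds.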

> Also, "sanitizer" may not be the best name for this, since it doesn't
> really sanitize anything.

As Reid mentioned, a goal is to build sanitizer-like tools on top of
this instrumentation.  Not only that, but one of the things that an
application can do is turn on its own sources and sinks in response
to the instrumentation being enabled (via the __has_feature macro).
So really, -fsanitize=dataflow would be the flag that turns on
data-flow sanitization for an application designed for it.  And should
the component of the compiler that allows this data-flow sanitization
be named any differently?
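
A minimal sketch of how an application might gate its own sources and
sinks on the instrumentation follows.  The feature name
dataflow_sanitizer is the one used by later Clang releases and is not
stated in this message, so treat it as an assumption; the app_* names
are hypothetical.

```c
#include <assert.h>

/* __has_feature is a Clang extension; fall back to 0 on other compilers. */
#ifndef __has_feature
#define __has_feature(x) 0
#endif

/* Assumed feature name; true only when built with -fsanitize=dataflow. */
#if __has_feature(dataflow_sanitizer)
#define APP_HAS_DFSAN 1
#else
#define APP_HAS_DFSAN 0
#endif

static int app_initialized = 0;
static int app_sources_installed = 0;

/* Hypothetical start-up hook, intended to be called once. */
static void app_init(void) {
    assert(!app_initialized);
    app_initialized = 1;
#if APP_HAS_DFSAN
    /* Here the application would label its sensitive inputs and
       install checks at its outputs. */
    app_sources_installed = 1;
#endif
}
```

In an ordinary build the guarded section compiles away, so the
application pays nothing for the hooks unless the flag is given.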

Thanks,
-- 
Peter


