<div dir="ltr"><div>This is interesting, and I can see some use cases for such a tool. To me, a source-level implementation<br></div><div>is not as accurate as binary translation. For instance, it is hard to check the taint of return addresses,<br>

since there is no concept of a return instruction at the source level. The stack does not appear until later in compilation.<br></div><div><div><div class="gmail_extra">For a security mechanism, return addresses need to be protected.<br></div>

<div class="gmail_extra"><br><div class="gmail_quote">On Fri, Jun 14, 2013 at 10:43 AM, Peter Collingbourne <span dir="ltr">&lt;<a href="mailto:peter@pcc.me.uk" target="_blank">peter@pcc.me.uk</a>&gt;</span> wrote:<br><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">

<div class="im">On Thu, Jun 13, 2013 at 03:13:37PM -0700, Sean Silva wrote:<br>

> Could you maybe give some example use cases?<br>

<br>

</div>A use case I am interested in is to take a large application and use<br>

this instrumentation as a tool to help monitor how data flows from its<br>

inputs (sources) to its outputs (sinks).  This has applications from<br>

a privacy/security perspective in that one can audit how a sensitive<br>

data item is used within a program and ensure it isn't exiting the<br>

program anywhere it shouldn't be.<br>

<br>

An ASPLOS paper from a few years ago discusses this problem and a<br>

solution based on dynamic binary instrumentation using QEMU:<br>

<br>

<a href="http://www.cs.ucsb.edu/~sherwood/pubs/ASPLOS-08-systemtomography.pdf" target="_blank">http://www.cs.ucsb.edu/~sherwood/pubs/ASPLOS-08-systemtomography.pdf</a><br>

<br>

Among other things, I hope to address a number of deficiencies of<br>

the tool described by that paper, in terms of efficiency (the other<br>

sanitizer tools have shown that compiler-based instrumentation can be<br>

much more efficient than binary instrumentation), and also in terms<br>

of accuracy (unlike the system described in that paper, we track data<br>

accurately through join points using union labels).<br>
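<br>
To make the idea of union labels concrete, here is a minimal sketch in C of one way a label table could behave at a join point. It is not the pass's actual data structure: the names label_t, create_label, has_label and union_labels, the table size, and the 16-bit label width are all assumptions made for illustration.<br>

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical label table. Label 0 means "no label". Every other label is
   either a base label (both parents are 0) or the union of two earlier
   labels, recorded as its two parents. */
typedef uint16_t label_t;

enum { MAX_LABELS = 1024 };

struct label_entry { label_t l1, l2; };

struct label_entry label_table[MAX_LABELS];
label_t num_labels = 1; /* slot 0 is reserved for "no label" */

/* Create a fresh base label, for example one per input source. */
label_t create_label(void) {
  assert(num_labels < MAX_LABELS);
  label_table[num_labels].l1 = 0;
  label_table[num_labels].l2 = 0;
  return num_labels++;
}

/* Return 1 if label l is `wanted` or was built (transitively) from it. */
int has_label(label_t l, label_t wanted) {
  if (l == 0) return 0;
  if (l == wanted) return 1;
  return has_label(label_table[l].l1, wanted) ||
         has_label(label_table[l].l2, wanted);
}

/* Label for a value computed from operands labelled a and b. */
label_t union_labels(label_t a, label_t b) {
  if (a == 0) return b;
  if (b == 0) return a;
  if (a == b) return a;
  /* If one label already covers the other, no new label is needed. */
  if (has_label(a, b)) return a;
  if (has_label(b, a)) return b;
  /* Reuse an existing union of the same pair, in either order. */
  for (label_t i = 1; i < num_labels; i++) {
    if ((label_table[i].l1 == a && label_table[i].l2 == b) ||
        (label_table[i].l1 == b && label_table[i].l2 == a))
      return i;
  }
  /* Otherwise record a new union label whose parents are a and b. */
  assert(num_labels < MAX_LABELS);
  label_table[num_labels].l1 = a;
  label_table[num_labels].l2 = b;
  return num_labels++;
}
```

With this scheme, a value computed from two differently labelled inputs gets a label that still answers "yes" for both inputs, so neither one is lost at the join.<br>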

<br>

There are other applications outside of security.  For example,<br>

one could use this instrumentation pass (or a variant of it) to tag<br>

opposite-endian integers in memory, and check that no opposite-endian<br>

integer is loaded or otherwise used directly without first going<br>

through a conversion.<br>
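<br>
The endianness check can be sketched in plain C by carrying the tag next to the value. In the instrumentation described above the tag would presumably live in shadow state maintained by the pass rather than in a struct, so the type tagged_u32 and the helpers from_wire, to_native and checked_use are purely hypothetical names for illustration.<br>

```c
#include <assert.h>
#include <stdint.h>

enum { TAG_NATIVE = 0, TAG_FOREIGN = 1 };

/* A 32-bit integer paired with a tag recording its byte order. */
typedef struct {
  uint32_t bits;
  uint8_t tag;
} tagged_u32;

/* Reverse the byte order of a 32-bit value. */
uint32_t swap32(uint32_t x) {
  return (x >> 24) | ((x >> 8) & 0x0000ff00u) |
         ((x << 8) & 0x00ff0000u) | (x << 24);
}

/* Values read from an opposite-endian source start out tagged foreign. */
tagged_u32 from_wire(uint32_t raw) {
  tagged_u32 v = { raw, TAG_FOREIGN };
  return v;
}

/* The conversion swaps the bytes and clears the tag. */
tagged_u32 to_native(tagged_u32 v) {
  if (v.tag == TAG_FOREIGN) {
    v.bits = swap32(v.bits);
    v.tag = TAG_NATIVE;
  }
  return v;
}

/* A "use" of the integer: returns 0 and leaves *out untouched if the value
   has not gone through to_native; otherwise stores it and returns 1. */
int checked_use(tagged_u32 v, uint32_t *out) {
  if (v.tag != TAG_NATIVE) return 0;
  *out = v.bits;
  return 1;
}
```

Here a direct use of a value straight from from_wire is rejected, while the same value passed through to_native first is accepted.<br>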

<div class="im"><br>

> Also, "sanitizer" may not be the best name for this, since it doesn't<br>

> really sanitize anything.<br>

<br>

</div>As Reid mentioned, a goal is to build sanitizer-like tools on top of<br>

this instrumentation.  Not only that, but one of the things that an<br>

application can do is turn on its own sources and sinks in response<br>

to the instrumentation being enabled (via the __has_feature macro).<br>

So really, -fsanitize=dataflow would be the flag that turns on<br>

data-flow sanitization for an application designed for it.  And should<br>

the component of the compiler that allows this data-flow sanitization<br>

be named any differently?<br>
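<br>
A rough sketch of how an application might turn on its own sources only when the instrumentation is enabled. The message names only the __has_feature macro; the feature name dataflow_sanitizer, the macro APP_DATAFLOW_ENABLED, and the hook app_mark_source are assumptions for illustration, and the call into the runtime is left as a comment rather than guessed.<br>

```c
#include <assert.h>
#include <stddef.h>

/* Compilers without __has_feature treat every feature as absent. */
#ifndef __has_feature
#define __has_feature(x) 0
#endif

/* Assumed feature name; set to 1 only in instrumented builds. */
#if __has_feature(dataflow_sanitizer)
#define APP_DATAFLOW_ENABLED 1
#else
#define APP_DATAFLOW_ENABLED 0
#endif

/* Counts how many sources the application has registered. */
int app_sources_registered = 0;

/* Hypothetical application hook: mark a buffer as a data-flow source.
   In an uninstrumented build this compiles to a no-op. */
void app_mark_source(const void *data, size_t len) {
  (void)data;
  (void)len;
#if APP_DATAFLOW_ENABLED
  /* An instrumented build would attach a label to the buffer here,
     using whatever interface the runtime provides. */
  app_sources_registered++;
#endif
}
```

Built normally, the hook does nothing and costs nothing; built with the data-flow flag, the same source file registers its sources without any other change to the application.<br>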

<br>

Thanks,<br>

<div class="HOEnZb"><div class="h5">--<br>

Peter<br>

_______________________________________________<br>

LLVM Developers mailing list<br>

<a href="mailto:LLVMdev@cs.uiuc.edu">LLVMdev@cs.uiuc.edu</a>         <a href="http://llvm.cs.uiuc.edu" target="_blank">http://llvm.cs.uiuc.edu</a><br>

<a href="http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev" target="_blank">http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev</a><br>

</div></div></blockquote></div><br></div></div></div></div>