[www-releases] r336152 - Add 6.0.1 docs

Tom Stellard via llvm-commits llvm-commits at lists.llvm.org
Mon Jul 2 16:21:47 PDT 2018


Added: www-releases/trunk/6.0.1/tools/clang/docs/DataFlowSanitizer.html
URL: http://llvm.org/viewvc/llvm-project/www-releases/trunk/6.0.1/tools/clang/docs/DataFlowSanitizer.html?rev=336152&view=auto
==============================================================================
--- www-releases/trunk/6.0.1/tools/clang/docs/DataFlowSanitizer.html (added)
+++ www-releases/trunk/6.0.1/tools/clang/docs/DataFlowSanitizer.html Mon Jul  2 16:21:43 2018
@@ -0,0 +1,225 @@
+
+
+<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN"
+  "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">
+
+
+<html xmlns="http://www.w3.org/1999/xhtml">
+  <head>
+    <meta http-equiv="Content-Type" content="text/html; charset=utf-8" />
+    
+    <title>DataFlowSanitizer — Clang 6 documentation</title>
+    
+    <link rel="stylesheet" href="_static/haiku.css" type="text/css" />
+    <link rel="stylesheet" href="_static/pygments.css" type="text/css" />
+    <link rel="stylesheet" href="_static/print.css" type="text/css" />
+    
+    <script type="text/javascript">
+      var DOCUMENTATION_OPTIONS = {
+        URL_ROOT:    '',
+        VERSION:     '6',
+        COLLAPSE_INDEX: false,
+        FILE_SUFFIX: '.html',
+        HAS_SOURCE:  true
+      };
+    </script>
+    <script type="text/javascript" src="_static/jquery.js"></script>
+    <script type="text/javascript" src="_static/underscore.js"></script>
+    <script type="text/javascript" src="_static/doctools.js"></script>
+    <script type="text/javascript" src="http://cdn.mathjax.org/mathjax/latest/MathJax.js?config=TeX-AMS-MML_HTMLorMML"></script>
+    <script type="text/javascript" src="_static/theme_extras.js"></script>
+    <link rel="top" title="Clang 6 documentation" href="index.html" />
+    <link rel="next" title="DataFlowSanitizer Design Document" href="DataFlowSanitizerDesign.html" />
+    <link rel="prev" title="UndefinedBehaviorSanitizer" href="UndefinedBehaviorSanitizer.html" /> 
+  </head>
+  <body>
+      <div class="header"><h1 class="heading"><a href="index.html">
+          <span>Clang 6 documentation</span></a></h1>
+        <h2 class="heading"><span>DataFlowSanitizer</span></h2>
+      </div>
+      <div class="topnav">
+      
+        <p>
+        «  <a href="UndefinedBehaviorSanitizer.html">UndefinedBehaviorSanitizer</a>
+          ::  
+        <a class="uplink" href="index.html">Contents</a>
+          ::  
+        <a href="DataFlowSanitizerDesign.html">DataFlowSanitizer Design Document</a>  »
+        </p>
+
+      </div>
+      <div class="content">
+        
+        
+  <div class="section" id="dataflowsanitizer">
+<h1>DataFlowSanitizer<a class="headerlink" href="#dataflowsanitizer" title="Permalink to this headline">¶</a></h1>
+<div class="toctree-wrapper compound">
+</div>
+<div class="contents local topic" id="contents">
+<ul class="simple">
+<li><a class="reference internal" href="#introduction" id="id1">Introduction</a></li>
+<li><a class="reference internal" href="#usage" id="id2">Usage</a><ul>
+<li><a class="reference internal" href="#abi-list" id="id3">ABI List</a></li>
+</ul>
+</li>
+<li><a class="reference internal" href="#example" id="id4">Example</a></li>
+<li><a class="reference internal" href="#current-status" id="id5">Current status</a></li>
+<li><a class="reference internal" href="#design" id="id6">Design</a></li>
+</ul>
+</div>
+<div class="section" id="introduction">
+<h2><a class="toc-backref" href="#id1">Introduction</a><a class="headerlink" href="#introduction" title="Permalink to this headline">¶</a></h2>
+<p>DataFlowSanitizer is a generalised dynamic data flow analysis.</p>
+<p>Unlike other Sanitizer tools, this tool is not designed to detect a
+specific class of bugs on its own.  Instead, it provides a generic
+dynamic data flow analysis framework to be used by clients to help
+detect application-specific issues within their own code.</p>
+</div>
+<div class="section" id="usage">
+<h2><a class="toc-backref" href="#id2">Usage</a><a class="headerlink" href="#usage" title="Permalink to this headline">¶</a></h2>
+<p>With no program changes, applying DataFlowSanitizer to a program
+will not alter its behavior.  To use DataFlowSanitizer, the program
+uses API functions to apply tags to data to cause it to be tracked, and to
+check the tag of a specific data item.  DataFlowSanitizer manages
+the propagation of tags through the program according to its data flow.</p>
+<p>The APIs are defined in the header file <tt class="docutils literal"><span class="pre">sanitizer/dfsan_interface.h</span></tt>.
+For further information about each function, please refer to the header
+file.</p>
+<div class="section" id="abi-list">
+<h3><a class="toc-backref" href="#id3">ABI List</a><a class="headerlink" href="#abi-list" title="Permalink to this headline">¶</a></h3>
+<p>DataFlowSanitizer uses a list of functions known as an ABI list to decide
+whether a call to a specific function should use the operating system’s native
+ABI or whether it should use a variant of this ABI that also propagates labels
+through function parameters and return values.  The ABI list file also controls
+how labels are propagated in the former case.  DataFlowSanitizer comes with a
+default ABI list which is intended to eventually cover the glibc library on
+Linux.  Users may nevertheless need to extend the ABI list where a particular
+library or function cannot be instrumented (e.g. because it is implemented in
+assembly or another language which DataFlowSanitizer does not support), or
+where a function is called from a library or function which cannot be
+instrumented.</p>
+<p>DataFlowSanitizer’s ABI list file is a <a class="reference internal" href="SanitizerSpecialCaseList.html"><em>Sanitizer special case list</em></a>.
+The pass treats every function in the <tt class="docutils literal"><span class="pre">uninstrumented</span></tt> category in the
+ABI list file as conforming to the native ABI.  Unless the ABI list contains
+additional categories for those functions, a call to one of those functions
+will produce a warning message, as the labelling behavior of the function
+is unknown.  The other supported categories are <tt class="docutils literal"><span class="pre">discard</span></tt>, <tt class="docutils literal"><span class="pre">functional</span></tt>
+and <tt class="docutils literal"><span class="pre">custom</span></tt>.</p>
+<ul class="simple">
+<li><tt class="docutils literal"><span class="pre">discard</span></tt> – To the extent that this function writes to (user-accessible)
+memory, it also updates labels in shadow memory (this condition is trivially
+satisfied for functions which do not write to user-accessible memory).  Its
+return value is unlabelled.</li>
+<li><tt class="docutils literal"><span class="pre">functional</span></tt> – Like <tt class="docutils literal"><span class="pre">discard</span></tt>, except that the label of its return value
+is the union of the labels of its arguments.</li>
+<li><tt class="docutils literal"><span class="pre">custom</span></tt> – Instead of calling the function, a custom wrapper <tt class="docutils literal"><span class="pre">__dfsw_F</span></tt>
+is called, where <tt class="docutils literal"><span class="pre">F</span></tt> is the name of the function.  This function may wrap
+the original function or provide its own implementation.  This category is
+generally used for uninstrumentable functions which write to user-accessible
+memory or which have more complex label propagation behavior.  The signature
+of <tt class="docutils literal"><span class="pre">__dfsw_F</span></tt> is that of <tt class="docutils literal"><span class="pre">F</span></tt> with one parameter of
+type <tt class="docutils literal"><span class="pre">dfsan_label</span></tt> appended for each argument of <tt class="docutils literal"><span class="pre">F</span></tt>, in order.  If <tt class="docutils literal"><span class="pre">F</span></tt>
+has a non-void return type, a final parameter of type <tt class="docutils literal"><span class="pre">dfsan_label</span> <span class="pre">*</span></tt>
+is appended, through which the custom function can store the label for the
+return value.  For example:</li>
+</ul>
+<div class="highlight-c++"><div class="highlight"><pre><span class="kt">void</span> <span class="n">f</span><span class="p">(</span><span class="kt">int</span> <span class="n">x</span><span class="p">);</span>
+<span class="kt">void</span> <span class="n">__dfsw_f</span><span class="p">(</span><span class="kt">int</span> <span class="n">x</span><span class="p">,</span> <span class="n">dfsan_label</span> <span class="n">x_label</span><span class="p">);</span>
+
+<span class="kt">void</span> <span class="o">*</span><span class="n">memcpy</span><span class="p">(</span><span class="kt">void</span> <span class="o">*</span><span class="n">dest</span><span class="p">,</span> <span class="k">const</span> <span class="kt">void</span> <span class="o">*</span><span class="n">src</span><span class="p">,</span> <span class="n">size_t</span> <span class="n">n</span><span class="p">);</span>
+<span class="kt">void</span> <span class="o">*</span><span class="n">__dfsw_memcpy</span><span class="p">(</span><span class="kt">void</span> <span class="o">*</span><span class="n">dest</span><span class="p">,</span> <span class="k">const</span> <span class="kt">void</span> <span class="o">*</span><span class="n">src</span><span class="p">,</span> <span class="n">size_t</span> <span class="n">n</span><span class="p">,</span>
+                    <span class="n">dfsan_label</span> <span class="n">dest_label</span><span class="p">,</span> <span class="n">dfsan_label</span> <span class="n">src_label</span><span class="p">,</span>
+                    <span class="n">dfsan_label</span> <span class="n">n_label</span><span class="p">,</span> <span class="n">dfsan_label</span> <span class="o">*</span><span class="n">ret_label</span><span class="p">);</span>
+</pre></div>
+</div>
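+<p>As a minimal sketch of this convention, the following fragment shows what a
+hand-written wrapper might look like for a hypothetical function
+<tt class="docutils literal"><span class="pre">add_one</span></tt> placed in the <tt class="docutils literal"><span class="pre">custom</span></tt> category.  The function,
+its propagation policy and the stand-in typedef are assumptions made for
+illustration; real code would include <tt class="docutils literal"><span class="pre">sanitizer/dfsan_interface.h</span></tt>
+rather than declaring <tt class="docutils literal"><span class="pre">dfsan_label</span></tt> itself.</p>
+<div class="highlight-c"><div class="highlight"><pre>#include <stdint.h>
+
+/* Stand-in for the typedef in sanitizer/dfsan_interface.h
+   (assumed to be 16 bits wide, as the design document describes). */
+typedef uint16_t dfsan_label;
+
+/* Hypothetical uninstrumented function, assumed to be listed as
+   fun:add_one=uninstrumented and fun:add_one=custom. */
+int add_one(int x) { return x + 1; }
+
+/* Custom wrapper: the parameters of add_one, then one label per argument,
+   then a pointer through which the return value's label is stored. */
+int __dfsw_add_one(int x, dfsan_label x_label, dfsan_label *ret_label) {
+  *ret_label = x_label;  /* the result depends only on x */
+  return add_one(x);
+}
+</pre></div>
+</div>
+<p>The wrapper forwards the argument label because the result depends only on
+<tt class="docutils literal"><span class="pre">x</span></tt>.  A wrapper for a function which writes to user-accessible memory
+would also need to update the labels of that memory, for instance with
+<tt class="docutils literal"><span class="pre">dfsan_set_label</span></tt>.</p>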
+<p>If a function defined in the translation unit being compiled belongs to the
+<tt class="docutils literal"><span class="pre">uninstrumented</span></tt> category, it will be compiled so as to conform to the
+native ABI.  Its arguments will be assumed to be unlabelled, but it will
+propagate labels in shadow memory.</p>
+<p>For example:</p>
+<div class="highlight-none"><div class="highlight"><pre># main is called by the C runtime using the native ABI.
+fun:main=uninstrumented
+fun:main=discard
+
+# malloc only writes to its internal data structures, not user-accessible memory.
+fun:malloc=uninstrumented
+fun:malloc=discard
+
+# tolower is a pure function.
+fun:tolower=uninstrumented
+fun:tolower=functional
+
+# memcpy needs to copy the shadow from the source to the destination region.
+# This is done in a custom function.
+fun:memcpy=uninstrumented
+fun:memcpy=custom
+</pre></div>
+</div>
+</div>
+</div>
+<div class="section" id="example">
+<h2><a class="toc-backref" href="#id4">Example</a><a class="headerlink" href="#example" title="Permalink to this headline">¶</a></h2>
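+<p>The following program demonstrates label propagation.  It attaches a
+distinct label to each of three variables, then asserts that the label of each
+sum contains the labels of its operands and that the label of
+<tt class="docutils literal"><span class="pre">i</span> <span class="pre">+</span> <span class="pre">j</span></tt> does not contain the label of <tt class="docutils literal"><span class="pre">k</span></tt>.</p>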
+<div class="highlight-c++"><div class="highlight"><pre><span class="cp">#include <sanitizer/dfsan_interface.h></span>
+<span class="cp">#include <assert.h></span>
+
+<span class="kt">int</span> <span class="n">main</span><span class="p">(</span><span class="kt">void</span><span class="p">)</span> <span class="p">{</span>
+  <span class="kt">int</span> <span class="n">i</span> <span class="o">=</span> <span class="mi">1</span><span class="p">;</span>
+  <span class="n">dfsan_label</span> <span class="n">i_label</span> <span class="o">=</span> <span class="n">dfsan_create_label</span><span class="p">(</span><span class="s">"i"</span><span class="p">,</span> <span class="mi">0</span><span class="p">);</span>
+  <span class="n">dfsan_set_label</span><span class="p">(</span><span class="n">i_label</span><span class="p">,</span> <span class="o">&</span><span class="n">i</span><span class="p">,</span> <span class="k">sizeof</span><span class="p">(</span><span class="n">i</span><span class="p">));</span>
+
+  <span class="kt">int</span> <span class="n">j</span> <span class="o">=</span> <span class="mi">2</span><span class="p">;</span>
+  <span class="n">dfsan_label</span> <span class="n">j_label</span> <span class="o">=</span> <span class="n">dfsan_create_label</span><span class="p">(</span><span class="s">"j"</span><span class="p">,</span> <span class="mi">0</span><span class="p">);</span>
+  <span class="n">dfsan_set_label</span><span class="p">(</span><span class="n">j_label</span><span class="p">,</span> <span class="o">&</span><span class="n">j</span><span class="p">,</span> <span class="k">sizeof</span><span class="p">(</span><span class="n">j</span><span class="p">));</span>
+
+  <span class="kt">int</span> <span class="n">k</span> <span class="o">=</span> <span class="mi">3</span><span class="p">;</span>
+  <span class="n">dfsan_label</span> <span class="n">k_label</span> <span class="o">=</span> <span class="n">dfsan_create_label</span><span class="p">(</span><span class="s">"k"</span><span class="p">,</span> <span class="mi">0</span><span class="p">);</span>
+  <span class="n">dfsan_set_label</span><span class="p">(</span><span class="n">k_label</span><span class="p">,</span> <span class="o">&</span><span class="n">k</span><span class="p">,</span> <span class="k">sizeof</span><span class="p">(</span><span class="n">k</span><span class="p">));</span>
+
+  <span class="n">dfsan_label</span> <span class="n">ij_label</span> <span class="o">=</span> <span class="n">dfsan_get_label</span><span class="p">(</span><span class="n">i</span> <span class="o">+</span> <span class="n">j</span><span class="p">);</span>
+  <span class="n">assert</span><span class="p">(</span><span class="n">dfsan_has_label</span><span class="p">(</span><span class="n">ij_label</span><span class="p">,</span> <span class="n">i_label</span><span class="p">));</span>
+  <span class="n">assert</span><span class="p">(</span><span class="n">dfsan_has_label</span><span class="p">(</span><span class="n">ij_label</span><span class="p">,</span> <span class="n">j_label</span><span class="p">));</span>
+  <span class="n">assert</span><span class="p">(</span><span class="o">!</span><span class="n">dfsan_has_label</span><span class="p">(</span><span class="n">ij_label</span><span class="p">,</span> <span class="n">k_label</span><span class="p">));</span>
+
+  <span class="n">dfsan_label</span> <span class="n">ijk_label</span> <span class="o">=</span> <span class="n">dfsan_get_label</span><span class="p">(</span><span class="n">i</span> <span class="o">+</span> <span class="n">j</span> <span class="o">+</span> <span class="n">k</span><span class="p">);</span>
+  <span class="n">assert</span><span class="p">(</span><span class="n">dfsan_has_label</span><span class="p">(</span><span class="n">ijk_label</span><span class="p">,</span> <span class="n">i_label</span><span class="p">));</span>
+  <span class="n">assert</span><span class="p">(</span><span class="n">dfsan_has_label</span><span class="p">(</span><span class="n">ijk_label</span><span class="p">,</span> <span class="n">j_label</span><span class="p">));</span>
+  <span class="n">assert</span><span class="p">(</span><span class="n">dfsan_has_label</span><span class="p">(</span><span class="n">ijk_label</span><span class="p">,</span> <span class="n">k_label</span><span class="p">));</span>
+
+  <span class="k">return</span> <span class="mi">0</span><span class="p">;</span>
+<span class="p">}</span>
+</pre></div>
+</div>
+</div>
+<div class="section" id="current-status">
+<h2><a class="toc-backref" href="#id5">Current status</a><a class="headerlink" href="#current-status" title="Permalink to this headline">¶</a></h2>
+<p>DataFlowSanitizer is a work in progress, currently under development for
+x86_64 Linux.</p>
+</div>
+<div class="section" id="design">
+<h2><a class="toc-backref" href="#id6">Design</a><a class="headerlink" href="#design" title="Permalink to this headline">¶</a></h2>
+<p>Please refer to the <a class="reference internal" href="DataFlowSanitizerDesign.html"><em>design document</em></a>.</p>
+</div>
+</div>
+
+
+      </div>
+      <div class="bottomnav">
+      
+        <p>
+        «  <a href="UndefinedBehaviorSanitizer.html">UndefinedBehaviorSanitizer</a>
+          ::  
+        <a class="uplink" href="index.html">Contents</a>
+          ::  
+        <a href="DataFlowSanitizerDesign.html">DataFlowSanitizer Design Document</a>  »
+        </p>
+
+      </div>
+
+    <div class="footer">
+        © Copyright 2007-2018, The Clang Team.
+      Created using <a href="http://sphinx.pocoo.org/">Sphinx</a> 1.1.3.
+    </div>
+  </body>
+</html>
\ No newline at end of file

Added: www-releases/trunk/6.0.1/tools/clang/docs/DataFlowSanitizerDesign.html
URL: http://llvm.org/viewvc/llvm-project/www-releases/trunk/6.0.1/tools/clang/docs/DataFlowSanitizerDesign.html?rev=336152&view=auto
==============================================================================
--- www-releases/trunk/6.0.1/tools/clang/docs/DataFlowSanitizerDesign.html (added)
+++ www-releases/trunk/6.0.1/tools/clang/docs/DataFlowSanitizerDesign.html Mon Jul  2 16:21:43 2018
@@ -0,0 +1,295 @@
+
+
+<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN"
+  "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">
+
+
+<html xmlns="http://www.w3.org/1999/xhtml">
+  <head>
+    <meta http-equiv="Content-Type" content="text/html; charset=utf-8" />
+    
+    <title>DataFlowSanitizer Design Document — Clang 6 documentation</title>
+    
+    <link rel="stylesheet" href="_static/haiku.css" type="text/css" />
+    <link rel="stylesheet" href="_static/pygments.css" type="text/css" />
+    <link rel="stylesheet" href="_static/print.css" type="text/css" />
+    
+    <script type="text/javascript">
+      var DOCUMENTATION_OPTIONS = {
+        URL_ROOT:    '',
+        VERSION:     '6',
+        COLLAPSE_INDEX: false,
+        FILE_SUFFIX: '.html',
+        HAS_SOURCE:  true
+      };
+    </script>
+    <script type="text/javascript" src="_static/jquery.js"></script>
+    <script type="text/javascript" src="_static/underscore.js"></script>
+    <script type="text/javascript" src="_static/doctools.js"></script>
+    <script type="text/javascript" src="http://cdn.mathjax.org/mathjax/latest/MathJax.js?config=TeX-AMS-MML_HTMLorMML"></script>
+    <script type="text/javascript" src="_static/theme_extras.js"></script>
+    <link rel="top" title="Clang 6 documentation" href="index.html" />
+    <link rel="up" title="DataFlowSanitizer" href="DataFlowSanitizer.html" />
+    <link rel="next" title="LeakSanitizer" href="LeakSanitizer.html" />
+    <link rel="prev" title="DataFlowSanitizer" href="DataFlowSanitizer.html" /> 
+  </head>
+  <body>
+      <div class="header"><h1 class="heading"><a href="index.html">
+          <span>Clang 6 documentation</span></a></h1>
+        <h2 class="heading"><span>DataFlowSanitizer Design Document</span></h2>
+      </div>
+      <div class="topnav">
+      
+        <p>
+        «  <a href="DataFlowSanitizer.html">DataFlowSanitizer</a>
+          ::  
+        <a class="uplink" href="index.html">Contents</a>
+          ::  
+        <a href="LeakSanitizer.html">LeakSanitizer</a>  »
+        </p>
+
+      </div>
+      <div class="content">
+        
+        
+  <div class="section" id="dataflowsanitizer-design-document">
+<h1>DataFlowSanitizer Design Document<a class="headerlink" href="#dataflowsanitizer-design-document" title="Permalink to this headline">¶</a></h1>
+<p>This document sets out the design for DataFlowSanitizer, a general
+dynamic data flow analysis.  Unlike other Sanitizer tools, this tool is
+not designed to detect a specific class of bugs on its own. Instead,
+it provides a generic dynamic data flow analysis framework to be used
+by clients to help detect application-specific issues within their
+own code.</p>
+<p>DataFlowSanitizer is a program instrumentation which can associate
+a number of taint labels with any data stored in any memory region
+accessible by the program. The analysis is dynamic, which means that
+it operates on a running program, and tracks how the labels propagate
+through that program. The tool shall support a large (>100) number
+of labels, such that programs which operate on large numbers of data
+items may be analysed with each data item being tracked separately.</p>
+<div class="section" id="use-cases">
+<h2>Use Cases<a class="headerlink" href="#use-cases" title="Permalink to this headline">¶</a></h2>
+<p>This instrumentation can be used as a tool to help monitor how data
+flows from a program’s inputs (sources) to its outputs (sinks).
+This has applications from a privacy/security perspective in that
+one can audit how a sensitive data item is used within a program and
+ensure it isn’t exiting the program anywhere it shouldn’t be.</p>
+</div>
+<div class="section" id="interface">
+<h2>Interface<a class="headerlink" href="#interface" title="Permalink to this headline">¶</a></h2>
+<p>A number of functions are provided which will create taint labels,
+attach labels to memory regions and extract the set of labels
+associated with a specific memory region. These functions are declared
+in the header file <tt class="docutils literal"><span class="pre">sanitizer/dfsan_interface.h</span></tt>.</p>
+<div class="highlight-c"><div class="highlight"><pre><span class="c1">/// Creates and returns a base label with the given description and user data.</span>
+<span class="n">dfsan_label</span> <span class="n">dfsan_create_label</span><span class="p">(</span><span class="k">const</span> <span class="kt">char</span> <span class="o">*</span><span class="n">desc</span><span class="p">,</span> <span class="kt">void</span> <span class="o">*</span><span class="n">userdata</span><span class="p">);</span>
+
+<span class="c1">/// Sets the label for each address in [addr,addr+size) to \c label.</span>
+<span class="kt">void</span> <span class="n">dfsan_set_label</span><span class="p">(</span><span class="n">dfsan_label</span> <span class="n">label</span><span class="p">,</span> <span class="kt">void</span> <span class="o">*</span><span class="n">addr</span><span class="p">,</span> <span class="kt">size_t</span> <span class="n">size</span><span class="p">);</span>
+
+<span class="c1">/// Sets the label for each address in [addr,addr+size) to the union of the</span>
+<span class="c1">/// current label for that address and \c label.</span>
+<span class="kt">void</span> <span class="n">dfsan_add_label</span><span class="p">(</span><span class="n">dfsan_label</span> <span class="n">label</span><span class="p">,</span> <span class="kt">void</span> <span class="o">*</span><span class="n">addr</span><span class="p">,</span> <span class="kt">size_t</span> <span class="n">size</span><span class="p">);</span>
+
+<span class="c1">/// Retrieves the label associated with the given data.</span>
+<span class="c1">///</span>
+<span class="c1">/// The type of 'data' is arbitrary.  The function accepts a value of any type,</span>
+<span class="c1">/// which can be truncated or extended (implicitly or explicitly) as necessary.</span>
+<span class="c1">/// The truncation/extension operations will preserve the label of the original</span>
+<span class="c1">/// value.</span>
+<span class="n">dfsan_label</span> <span class="n">dfsan_get_label</span><span class="p">(</span><span class="kt">long</span> <span class="n">data</span><span class="p">);</span>
+
+<span class="c1">/// Retrieves a pointer to the dfsan_label_info struct for the given label.</span>
+<span class="k">const</span> <span class="k">struct</span> <span class="n">dfsan_label_info</span> <span class="o">*</span><span class="n">dfsan_get_label_info</span><span class="p">(</span><span class="n">dfsan_label</span> <span class="n">label</span><span class="p">);</span>
+
+<span class="c1">/// Returns whether the given label label contains the label elem.</span>
+<span class="kt">int</span> <span class="n">dfsan_has_label</span><span class="p">(</span><span class="n">dfsan_label</span> <span class="n">label</span><span class="p">,</span> <span class="n">dfsan_label</span> <span class="n">elem</span><span class="p">);</span>
+
+<span class="c1">/// If the given label label contains a label with the description desc, returns</span>
+<span class="c1">/// that label, else returns 0.</span>
+<span class="n">dfsan_label</span> <span class="n">dfsan_has_label_with_desc</span><span class="p">(</span><span class="n">dfsan_label</span> <span class="n">label</span><span class="p">,</span> <span class="k">const</span> <span class="kt">char</span> <span class="o">*</span><span class="n">desc</span><span class="p">);</span>
+</pre></div>
+</div>
+</div>
+<div class="section" id="taint-label-representation">
+<h2>Taint label representation<a class="headerlink" href="#taint-label-representation" title="Permalink to this headline">¶</a></h2>
+<p>As stated above, the tool must track a large number of taint
+labels. This poses an implementation challenge, as most multiple-label
+tainting systems assign one label per bit to shadow storage, and
+union taint labels using a bitwise or operation. This will not scale
+to clients which use hundreds or thousands of taint labels, as the
+label union operation becomes O(n) in the number of supported labels,
+and data associated with it will quickly dominate the live variable
+set, causing register spills and hampering performance.</p>
+<p>Instead, a low overhead approach is proposed which is best-case O(log<sub>2</sub> n) during execution. The underlying assumption is that
+the required space of label unions is sparse, which is a reasonable
+assumption to make given that we are optimizing for the case where
+applications mostly copy data from one place to another, without often
+invoking the need for an actual union operation. The representation
+of a taint label is a 16-bit integer, and new labels are allocated
+sequentially from a pool. The label identifier 0 is special, and means
+that the data item is unlabelled.</p>
+<p>When a label union operation is requested at a join point (any
+arithmetic or logical operation with two or more operands, such as
+addition), the code checks whether a union is required at all, whether
+the same union has been requested before, and whether one of the two
+labels subsumes the other. In each of those cases it returns an existing
+label. Otherwise, it allocates a new union label from the same pool used
+for new labels.</p>
+<p>Specifically, the instrumentation pass will insert code like this
+to decide the union label <tt class="docutils literal"><span class="pre">lu</span></tt> for a pair of labels <tt class="docutils literal"><span class="pre">l1</span></tt>
+and <tt class="docutils literal"><span class="pre">l2</span></tt>:</p>
+<div class="highlight-c"><div class="highlight"><pre><span class="k">if</span> <span class="p">(</span><span class="n">l1</span> <span class="o">==</span> <span class="n">l2</span><span class="p">)</span>
+  <span class="n">lu</span> <span class="o">=</span> <span class="n">l1</span><span class="p">;</span>
+<span class="k">else</span>
+  <span class="n">lu</span> <span class="o">=</span> <span class="n">__dfsan_union</span><span class="p">(</span><span class="n">l1</span><span class="p">,</span> <span class="n">l2</span><span class="p">);</span>
+</pre></div>
+</div>
+<p>The equality comparison is emitted inline at the join point, outside
+of <tt class="docutils literal"><span class="pre">__dfsan_union</span></tt>, to provide an early exit in the common cases
+where the program is processing unlabelled data, or where the two data
+items have the same label.  <tt class="docutils literal"><span class="pre">__dfsan_union</span></tt> is a runtime library
+function which performs all other union computation.</p>
+<p>Further optimizations are possible, for example if <tt class="docutils literal"><span class="pre">l1</span></tt> is known
+at compile time to be zero (e.g. it is derived from a constant),
+<tt class="docutils literal"><span class="pre">l2</span></tt> can be used for <tt class="docutils literal"><span class="pre">lu</span></tt>, and vice versa.</p>
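+<p>The union scheme described in this section can be illustrated with a small
+self-contained model.  This is only a sketch under stated assumptions, not the
+runtime's implementation of <tt class="docutils literal"><span class="pre">__dfsan_union</span></tt>: the names
+<tt class="docutils literal"><span class="pre">label_t</span></tt>, <tt class="docutils literal"><span class="pre">create_label</span></tt>, <tt class="docutils literal"><span class="pre">has_label</span></tt> and
+<tt class="docutils literal"><span class="pre">label_union</span></tt> are hypothetical, the pool is deliberately tiny, and the
+subsumption check is a conservative one.</p>
+<div class="highlight-c"><div class="highlight"><pre>#include <stdint.h>
+#include <stdlib.h>
+
+typedef uint16_t label_t;           /* stand-in for dfsan_label; 0 = unlabelled */
+
+enum { MAX_LABELS = 64 };           /* tiny pool, for illustration only */
+
+static label_t last_label;          /* most recently allocated label */
+static label_t parent1[MAX_LABELS]; /* components of a union label;    */
+static label_t parent2[MAX_LABELS]; /* both are 0 for a base label     */
+static label_t union_cache[MAX_LABELS][MAX_LABELS]; /* 0 = not yet computed */
+
+/* Allocates the next label sequentially from the pool. */
+label_t create_label(void) {
+  if (last_label + 1 >= MAX_LABELS)
+    abort();                        /* pool exhausted */
+  return ++last_label;
+}
+
+/* Returns nonzero if elem appears in the union tree rooted at label. */
+int has_label(label_t label, label_t elem) {
+  if (label == elem)
+    return 1;
+  if (parent1[label] == 0)          /* unlabelled, or a base label */
+    return 0;
+  return has_label(parent1[label], elem) || has_label(parent2[label], elem);
+}
+
+/* Returns a label representing the union of l1 and l2. */
+label_t label_union(label_t l1, label_t l2) {
+  if (l1 == l2) return l1;          /* fast path, inserted inline in practice */
+  if (l1 == 0) return l2;           /* union with "unlabelled" is a no-op */
+  if (l2 == 0) return l1;
+  if (l1 > l2) { label_t t = l1; l1 = l2; l2 = t; }  /* canonical order */
+
+  if (union_cache[l1][l2] != 0)     /* same union requested before */
+    return union_cache[l1][l2];
+
+  label_t lu;
+  if (has_label(l2, l1)) {
+    /* l2 already contains l1.  Only this direction is checked, because a
+       union label is always allocated after its components.  The check is
+       conservative: it may miss some subsumptions and allocate a new,
+       still correct, label. */
+    lu = l2;
+  } else {
+    lu = create_label();            /* new union label from the same pool */
+    parent1[lu] = l1;
+    parent2[lu] = l2;
+  }
+  union_cache[l1][l2] = lu;
+  return lu;
+}
+</pre></div>
+</div>
+<p>In this model, copying data never allocates a label; a new label is only
+consumed the first time two distinct, non-nested labels meet at a join
+point.</p>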
+</div>
+<div class="section" id="memory-layout-and-label-management">
+<h2>Memory layout and label management<a class="headerlink" href="#memory-layout-and-label-management" title="Permalink to this headline">¶</a></h2>
+<p>The following is the current memory layout for Linux/x86_64:</p>
+<table border="1" class="docutils">
+<colgroup>
+<col width="30%" />
+<col width="30%" />
+<col width="40%" />
+</colgroup>
+<thead valign="bottom">
+<tr class="row-odd"><th class="head">Start</th>
+<th class="head">End</th>
+<th class="head">Use</th>
+</tr>
+</thead>
+<tbody valign="top">
+<tr class="row-even"><td>0x700000008000</td>
+<td>0x800000000000</td>
+<td>application memory</td>
+</tr>
+<tr class="row-odd"><td>0x200200000000</td>
+<td>0x700000008000</td>
+<td>unused</td>
+</tr>
+<tr class="row-even"><td>0x200000000000</td>
+<td>0x200200000000</td>
+<td>union table</td>
+</tr>
+<tr class="row-odd"><td>0x000000010000</td>
+<td>0x200000000000</td>
+<td>shadow memory</td>
+</tr>
+<tr class="row-even"><td>0x000000000000</td>
+<td>0x000000010000</td>
+<td>reserved by kernel</td>
+</tr>
+</tbody>
+</table>
+<p>Each byte of application memory corresponds to two bytes of shadow
+memory, which are used to store its taint label. As for LLVM SSA
+registers, we have not found it necessary to associate a label with
+each byte or bit of data, as some other tools do. Instead, labels are
+associated directly with registers.  Loads will result in a union of
+all shadow labels corresponding to bytes loaded (which most of the
+time will be short circuited by the initial comparison) and stores will
+result in a copy of the label to the shadow of all bytes stored to.</p>
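+<p>The text does not spell out how an application address is translated to
+its shadow address.  One mapping that is consistent with the table above is
+sketched below, as an assumption rather than a statement of what the runtime
+does: clear the three address bits that distinguish application memory, then
+double the result because each application byte has a two-byte label.  The
+macro and function names are hypothetical.</p>
+<div class="highlight-c"><div class="highlight"><pre>#include <stdint.h>
+
+/* Region boundaries copied from the table above (Linux/x86_64). */
+#define APP_START    0x700000008000ULL
+#define APP_END      0x800000000000ULL
+#define SHADOW_START 0x000000010000ULL
+#define SHADOW_END   0x200000000000ULL
+
+/* Assumed mapping: mask off bits 44-46, then multiply by the label size (2). */
+static inline uint64_t shadow_for(uint64_t app_addr) {
+  return (app_addr & ~0x700000000000ULL) << 1;
+}
+</pre></div>
+</div>
+<p>Under this mapping the first application byte maps to the start of shadow
+memory, and the last application byte maps to the last two bytes of shadow
+memory.  As a further observation, the union table region spans
+0x200000000 bytes, which is the size a table of two-byte entries indexed by a
+pair of 16-bit labels would need.</p>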
+</div>
+<div class="section" id="propagating-labels-through-arguments">
+<h2>Propagating labels through arguments<a class="headerlink" href="#propagating-labels-through-arguments" title="Permalink to this headline">¶</a></h2>
+<p>In order to propagate labels through function arguments and return values,
+DataFlowSanitizer changes the ABI of each function in the translation unit.
+There are currently two supported ABIs:</p>
+<ul class="simple">
+<li>Args – Argument and return value labels are passed through additional
+arguments and by modifying the return type.</li>
+<li>TLS – Argument and return value labels are passed through TLS variables
+<tt class="docutils literal"><span class="pre">__dfsan_arg_tls</span></tt> and <tt class="docutils literal"><span class="pre">__dfsan_retval_tls</span></tt>.</li>
+</ul>
+<p>The main advantage of the TLS ABI is that it is more tolerant of ABI mismatches
+(TLS storage is not shared with any other form of storage, whereas extra
+arguments may be stored in registers which under the native ABI are not used
+for parameter passing and thus could contain arbitrary values).  On the other
+hand the args ABI is more efficient and allows ABI mismatches to be more easily
+identified by checking for nonzero labels in nominally unlabelled programs.</p>
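+<p>To make the difference concrete, here is a C-level sketch of how a single
+function might look under each scheme.  The instrumentation operates on LLVM
+IR, not C, so this is only an analogy; <tt class="docutils literal"><span class="pre">twice</span></tt>, the struct, and the
+thread-local variables (hypothetical stand-ins for <tt class="docutils literal"><span class="pre">__dfsan_arg_tls</span></tt>
+and <tt class="docutils literal"><span class="pre">__dfsan_retval_tls</span></tt>, with an arbitrary size) are assumptions, and
+<tt class="docutils literal"><span class="pre">_Thread_local</span></tt> requires C11.</p>
+<div class="highlight-c"><div class="highlight"><pre>#include <stdint.h>
+
+typedef uint16_t label_t;  /* stand-in for dfsan_label */
+
+/* Original function, as it would look under the native ABI. */
+int twice(int x) { return 2 * x; }
+
+/* Args style: labels travel as an extra parameter and in a widened
+   return type. */
+struct labelled_int { int value; label_t label; };
+
+struct labelled_int twice_args(int x, label_t x_label) {
+  struct labelled_int r = { twice(x), x_label };
+  return r;
+}
+
+/* TLS style: the signature is unchanged; the caller writes argument labels
+   to a thread-local array and reads the return label from a thread-local
+   variable. */
+static _Thread_local label_t arg_labels_tls[8];
+static _Thread_local label_t retval_label_tls;
+
+int twice_tls(int x) {
+  retval_label_tls = arg_labels_tls[0];  /* result depends only on x */
+  return twice(x);
+}
+</pre></div>
+</div>
+<p>The sketch shows why the TLS variant tolerates a caller that knows only the
+native signature, whereas the args variant changes the signature itself.</p>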
+</div>
+<div class="section" id="implementing-the-abi-list">
+<h2>Implementing the ABI list<a class="headerlink" href="#implementing-the-abi-list" title="Permalink to this headline">¶</a></h2>
+<p>The <a class="reference external" href="DataFlowSanitizer.html#abi-list">ABI list</a> provides a list of functions
+which conform to the native ABI, each of which is callable from an instrumented
+program.  This is implemented by replacing each reference to a native ABI
+function with a reference to a function which uses the instrumented ABI.
+Such functions are automatically generated wrappers for the native functions.
+For example, given the ABI list example provided in the user manual, the
+following wrappers will be generated under the args ABI:</p>
+<div class="highlight-llvm"><div class="highlight"><pre><span class="k">define</span> <span class="k">linkonce_odr</span> <span class="p">{</span> <span class="k">i8</span><span class="p">*,</span> <span class="k">i16</span> <span class="p">}</span> <span class="vg">@"dfsw$malloc"</span><span class="p">(</span><span class="k">i64</span> <span class="nv-Anonymous">%0</span><span class="p">,</span> <span class="k">i16</span> <span class="nv-Anonymous">%1</span><span class="p">)</span> <span class="p">{</span>
+<span class="nl">entry:</span>
+  <span class="nv-Anonymous">%2</span> <span class="p">=</span> <span class="k">call</span> <span class="k">i8</span><span class="p">*</span> <span class="vg">@malloc</span><span class="p">(</span><span class="k">i64</span> <span class="nv-Anonymous">%0</span><span class="p">)</span>
+  <span class="nv-Anonymous">%3</span> <span class="p">=</span> <span class="k">insertvalue</span> <span class="p">{</span> <span class="k">i8</span><span class="p">*,</span> <span class="k">i16</span> <span class="p">}</span> <span class="k">undef</span><span class="p">,</span> <span class="k">i8</span><span class="p">*</span> <span class="nv-Anonymous">%2</span><span class="p">,</span> <span class="m">0</span>
+  <span class="nv-Anonymous">%4</span> <span class="p">=</span> <span class="k">insertvalue</span> <span class="p">{</span> <span class="k">i8</span><span class="p">*,</span> <span class="k">i16</span> <span class="p">}</span> <span class="nv-Anonymous">%3</span><span class="p">,</span> <span class="k">i16</span> <span class="m">0</span><span class="p">,</span> <span class="m">1</span>
+  <span class="k">ret</span> <span class="p">{</span> <span class="k">i8</span><span class="p">*,</span> <span class="k">i16</span> <span class="p">}</span> <span class="nv-Anonymous">%4</span>
+<span class="p">}</span>
+
+<span class="k">define</span> <span class="k">linkonce_odr</span> <span class="p">{</span> <span class="k">i32</span><span class="p">,</span> <span class="k">i16</span> <span class="p">}</span> <span class="vg">@"dfsw$tolower"</span><span class="p">(</span><span class="k">i32</span> <span class="nv-Anonymous">%0</span><span class="p">,</span> <span class="k">i16</span> <span class="nv-Anonymous">%1</span><span class="p">)</span> <span class="p">{</span>
+<span class="nl">entry:</span>
+  <span class="nv-Anonymous">%2</span> <span class="p">=</span> <span class="k">call</span> <span class="k">i32</span> <span class="vg">@tolower</span><span class="p">(</span><span class="k">i32</span> <span class="nv-Anonymous">%0</span><span class="p">)</span>
+  <span class="nv-Anonymous">%3</span> <span class="p">=</span> <span class="k">insertvalue</span> <span class="p">{</span> <span class="k">i32</span><span class="p">,</span> <span class="k">i16</span> <span class="p">}</span> <span class="k">undef</span><span class="p">,</span> <span class="k">i32</span> <span class="nv-Anonymous">%2</span><span class="p">,</span> <span class="m">0</span>
+  <span class="nv-Anonymous">%4</span> <span class="p">=</span> <span class="k">insertvalue</span> <span class="p">{</span> <span class="k">i32</span><span class="p">,</span> <span class="k">i16</span> <span class="p">}</span> <span class="nv-Anonymous">%3</span><span class="p">,</span> <span class="k">i16</span> <span class="nv-Anonymous">%1</span><span class="p">,</span> <span class="m">1</span>
+  <span class="k">ret</span> <span class="p">{</span> <span class="k">i32</span><span class="p">,</span> <span class="k">i16</span> <span class="p">}</span> <span class="nv-Anonymous">%4</span>
+<span class="p">}</span>
+
+<span class="k">define</span> <span class="k">linkonce_odr</span> <span class="p">{</span> <span class="k">i8</span><span class="p">*,</span> <span class="k">i16</span> <span class="p">}</span> <span class="vg">@"dfsw$memcpy"</span><span class="p">(</span><span class="k">i8</span><span class="p">*</span> <span class="nv-Anonymous">%0</span><span class="p">,</span> <span class="k">i8</span><span class="p">*</span> <span class="nv-Anonymous">%1</span><span class="p">,</span> <span class="k">i64</span> <span class="nv-Anonymous">%2</span><span class="p">,</span> <span class="k">i16</span> <span class="nv-Anonymous">%3</span><span class="p">,</span> <span class="k">i16</span> <span class="nv-Anonymous">%4</span><span class="p">,</span> <span class="k">i16</span> <span class="nv-Anonymous">%5</span><span class="p">)</span> <span class="p">{</span>
+<span class="nl">entry:</span>
+  <span class="nv">%labelreturn</span> <span class="p">=</span> <span class="k">alloca</span> <span class="k">i16</span>
+  <span class="nv-Anonymous">%6</span> <span class="p">=</span> <span class="k">call</span> <span class="k">i8</span><span class="p">*</span> <span class="vg">@__dfsw_memcpy</span><span class="p">(</span><span class="k">i8</span><span class="p">*</span> <span class="nv-Anonymous">%0</span><span class="p">,</span> <span class="k">i8</span><span class="p">*</span> <span class="nv-Anonymous">%1</span><span class="p">,</span> <span class="k">i64</span> <span class="nv-Anonymous">%2</span><span class="p">,</span> <span class="k">i16</span> <span class="nv-Anonymous">%3</span><span class="p">,</span> <span class="k">i16</span> <span class="nv-Anonymous">%4</span><span class="p">,</span> <span class="k">i16</span> <span class="nv-Anonymous">%5</span><span class="p">,</span> <span class="k">i16</span><span class="p">*</span> <span class="nv">%labelreturn</span><span class="p">)</span>
+  <span class="nv-Anonymous">%7</span> <span class="p">=</span> <span class="k">load</span> <span class="k">i16</span><span class="p">,</span> <span class="k">i16</span><span class="p">*</span> <span class="nv">%labelreturn</span>
+  <span class="nv-Anonymous">%8</span> <span class="p">=</span> <span class="k">insertvalue</span> <span class="p">{</span> <span class="k">i8</span><span class="p">*,</span> <span class="k">i16</span> <span class="p">}</span> <span class="k">undef</span><span class="p">,</span> <span class="k">i8</span><span class="p">*</span> <span class="nv-Anonymous">%6</span><span class="p">,</span> <span class="m">0</span>
+  <span class="nv-Anonymous">%9</span> <span class="p">=</span> <span class="k">insertvalue</span> <span class="p">{</span> <span class="k">i8</span><span class="p">*,</span> <span class="k">i16</span> <span class="p">}</span> <span class="nv-Anonymous">%8</span><span class="p">,</span> <span class="k">i16</span> <span class="nv-Anonymous">%7</span><span class="p">,</span> <span class="m">1</span>
+  <span class="k">ret</span> <span class="p">{</span> <span class="k">i8</span><span class="p">*,</span> <span class="k">i16</span> <span class="p">}</span> <span class="nv-Anonymous">%9</span>
+<span class="p">}</span>
+</pre></div>
+</div>
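+<p>The shape of the <tt class="docutils literal"><span class="pre">memcpy</span></tt> wrapper above may be easier to follow in C.  The
+following is a rough model under stated assumptions, not the generated code:
+<tt class="docutils literal"><span class="pre">label_t</span></tt>, <tt class="docutils literal"><span class="pre">ptr_with_label</span></tt>, <tt class="docutils literal"><span class="pre">model_custom_memcpy</span></tt> and
+<tt class="docutils literal"><span class="pre">model_wrapper_memcpy</span></tt> are hypothetical names, the custom function is assumed
+to report the destination's label as the return value label, and shadow memory
+is not modelled at all.</p>

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Assumption: mirrors the i16 labels used in the IR above. */
typedef uint16_t label_t;

/* Models the { i8*, i16 } aggregate returned by the wrapper. */
struct ptr_with_label {
    void *value;
    label_t label;
};

/* Hypothetical custom function in the style of __dfsw_memcpy: it takes
 * one label per argument plus an out-parameter for the return label.
 * Copying shadow memory is deliberately left out of this model. */
static void *model_custom_memcpy(void *dst, const void *src, size_t n,
                                 label_t dst_label, label_t src_label,
                                 label_t n_label, label_t *ret_label) {
    assert(ret_label != NULL);
    (void)src_label;
    (void)n_label;
    /* Assumption: the returned pointer is dst, so it carries dst's label. */
    *ret_label = dst_label;
    return memcpy(dst, src, n);
}

/* Models the wrapper: reserve a slot for the return label, call the
 * custom function, then pack the value and label into one aggregate. */
static struct ptr_with_label model_wrapper_memcpy(void *dst, const void *src,
                                                  size_t n, label_t dst_label,
                                                  label_t src_label,
                                                  label_t n_label) {
    label_t labelreturn = 0;
    void *p = model_custom_memcpy(dst, src, n, dst_label, src_label, n_label,
                                  &labelreturn);
    struct ptr_with_label r = { p, labelreturn };
    return r;
}
```

+<p>The local <tt class="docutils literal"><span class="pre">labelreturn</span></tt> plays the role of the <tt class="docutils literal"><span class="pre">alloca</span></tt> in the IR: it exists
+only so that the custom function has somewhere to write the return value label.</p>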
+<p>As an optimization, direct calls to native ABI functions will call the
+native ABI function directly and the pass will compute the appropriate label
+internally.  This has the advantage of reducing the number of union operations
+required when the return value label is known to be zero (i.e. <tt class="docutils literal"><span class="pre">discard</span></tt>
+functions, or <tt class="docutils literal"><span class="pre">functional</span></tt> functions with known unlabelled arguments).</p>
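+<p>The saving in union operations can be illustrated with a small model.  This is
+a sketch only: the real pass makes these decisions at compile time, whereas the
+code below decides at run time, and <tt class="docutils literal"><span class="pre">abi_category</span></tt>, <tt class="docutils literal"><span class="pre">counted_union</span></tt> and
+<tt class="docutils literal"><span class="pre">return_label</span></tt> are hypothetical names.  Bitwise OR again stands in for the
+runtime's union operation.</p>

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef uint16_t label_t;  /* hypothetical 16-bit label type */

/* Only the two categories named in the text are modelled. */
enum abi_category { CAT_DISCARD, CAT_FUNCTIONAL };

/* Counts how many union operations the model performs. */
static unsigned union_calls;

/* Assumption: bitwise OR stands in for the runtime's label union. */
static label_t counted_union(label_t a, label_t b) {
    ++union_calls;
    return (label_t)(a | b);
}

/* Computes the return value label for a direct call to a native ABI
 * function, skipping union operations whenever an operand is zero. */
static label_t return_label(enum abi_category cat, const label_t *arg_labels,
                            size_t n) {
    assert(arg_labels != NULL || n == 0);
    if (cat == CAT_DISCARD)
        return 0;  /* discard: the result is always unlabelled */

    label_t result = 0;
    for (size_t i = 0; i < n; ++i) {
        if (arg_labels[i] == 0)
            continue;  /* unlabelled argument: nothing to merge */
        result = (result == 0) ? arg_labels[i]
                               : counted_union(result, arg_labels[i]);
    }
    return result;
}
```

+<p>With labels <tt class="docutils literal"><span class="pre">{2,</span> <span class="pre">0,</span> <span class="pre">4}</span></tt> a <tt class="docutils literal"><span class="pre">functional</span></tt> call needs a single union in this
+model, and a <tt class="docutils literal"><span class="pre">discard</span></tt> call or an all-zero argument list needs none.</p>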
+</div>
+<div class="section" id="checking-abi-consistency">
+<h2>Checking ABI Consistency<a class="headerlink" href="#checking-abi-consistency" title="Permalink to this headline">¶</a></h2>
+<p>DFSan changes the ABI of each function in the module.  This makes it possible
+for a function with the native ABI to be called with the instrumented ABI,
+or vice versa, thus possibly invoking undefined behavior.  A simple way
+of statically detecting instances of this problem is to prepend the prefix
+“dfs$” to the name of each instrumented-ABI function.</p>
+<p>This will not catch every such problem; in particular, function pointers passed
+across the instrumented-native barrier cannot be used on the other side.
+These problems could potentially be caught dynamically.</p>
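+<p>The effect of the prefix can be sketched as a toy symbol lookup.  This is an
+assumption-laden model rather than a description of any real linker:
+<tt class="docutils literal"><span class="pre">instrumented_name</span></tt> and <tt class="docutils literal"><span class="pre">resolves</span></tt> are hypothetical helpers, and the
+"symbol table" is just an array of strings.  The point is that a caller
+expecting one ABI asks for a name that the other ABI never defines, so the
+mismatch shows up as an unresolved symbol instead of undefined behavior.</p>

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/* Builds the symbol name of an instrumented-ABI function by prepending
 * "dfs$".  Returns 1 on success, 0 if the buffer is too small. */
static int instrumented_name(const char *name, char *out, size_t cap) {
    assert(name != NULL && out != NULL);
    int n = snprintf(out, cap, "dfs$%s", name);
    return n >= 0 && (size_t)n < cap;
}

/* Toy symbol lookup: returns 1 if `wanted` is among the defined names. */
static int resolves(const char *const *defined, size_t n, const char *wanted) {
    for (size_t i = 0; i < n; ++i) {
        if (strcmp(defined[i], wanted) == 0)
            return 1;
    }
    return 0;
}
```

+<p>For example, if an instrumented module defines <tt class="docutils literal"><span class="pre">dfs$f</span></tt>, a native caller
+looking for plain <tt class="docutils literal"><span class="pre">f</span></tt> does not resolve in this model.</p>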
+</div>
+</div>
+
+
+      </div>
+      <div class="bottomnav">
+      
+        <p>
+        «  <a href="DataFlowSanitizer.html">DataFlowSanitizer</a>
+          ::  
+        <a class="uplink" href="index.html">Contents</a>
+          ::  
+        <a href="LeakSanitizer.html">LeakSanitizer</a>  »
+        </p>
+
+      </div>
+
+    <div class="footer">
+        © Copyright 2007-2018, The Clang Team.
+      Created using <a href="http://sphinx.pocoo.org/">Sphinx</a> 1.1.3.
+    </div>
+  </body>
+</html>
\ No newline at end of file



