[PATCH] D103745: [dfsan] Add full fast8 support

George Balatsouras via Phabricator via cfe-commits cfe-commits at lists.llvm.org
Mon Jun 7 11:13:52 PDT 2021


gbalats marked an inline comment as done.
gbalats added inline comments.


================
Comment at: clang/docs/DataFlowSanitizer.rst:172-178
     assert(ij_label == 3);  // Verifies all of the above
 
+    // Or, equivalently:
+    assert(dfsan_has_label(ij_label, i_label));
+    assert(dfsan_has_label(ij_label, j_label));
+    assert(!dfsan_has_label(ij_label, k_label));
+
----------------
stephan.yichao.zhao wrote:
> If we swap assert(ij_label == 3) with the 3 dfsan_has_label, the two equivalent blocks are close to each other.
I think this way is better to demonstrate the differences:
- the first block exposes dfsan internals (the integer representation) and makes explicit statements about label values
- the second block uses dfsan_has_label to abstract the internals, not exposing the integer representation (or the fact that labels are OR'd).


================
Comment at: clang/docs/DataFlowSanitizerDesign.rst:60
 
-As stated above, the tool must track a large number of taint
-labels. This poses an implementation challenge, as most multiple-label
-tainting systems assign one label per bit to shadow storage, and
-union taint labels using a bitwise or operation. This will not scale
-to clients which use hundreds or thousands of taint labels, as the
-label union operation becomes O(n) in the number of supported labels,
-and data associated with it will quickly dominate the live variable
-set, causing register spills and hampering performance.
-
-Instead, a low overhead approach is proposed which is best-case O(log\
-:sub:`2` n) during execution. The underlying assumption is that
-the required space of label unions is sparse, which is a reasonable
-assumption to make given that we are optimizing for the case where
-applications mostly copy data from one place to another, without often
-invoking the need for an actual union operation. The representation
-of a taint label is a 16-bit integer, and new labels are allocated
-sequentially from a pool. The label identifier 0 is special, and means
-that the data item is unlabelled.
-
-When a label union operation is requested at a join point (any
-arithmetic or logical operation with two or more operands, such as
-addition), the code checks whether a union is required, whether the
-same union has been requested before, and whether one union label
-subsumes the other. If so, it returns the previously allocated union
-label. If not, it allocates a new union label from the same pool used
-for new labels.
-
-Specifically, the instrumentation pass will insert code like this
-to decide the union label ``lu`` for a pair of labels ``l1``
-and ``l2``:
-
-.. code-block:: c
-
-  if (l1 == l2)
-    lu = l1;
-  else
-    lu = __dfsan_union(l1, l2);
-
-The equality comparison is outlined, to provide an early exit in
-the common cases where the program is processing unlabelled data, or
-where the two data items have the same label.  ``__dfsan_union`` is
-a runtime library function which performs all other union computation.
+We use an 8-bit unsigned integers for the representation of a
+label. The label identifier 0 is special, and means that the data item
----------------
stephan.yichao.zhao wrote:
> integer
Done.


================
Comment at: clang/docs/DataFlowSanitizerDesign.rst:65
+join point (any arithmetic or logical operation with two or more
+operands, such as addition), we can simply OR the two labels in O(1).
 
----------------
stephan.yichao.zhao wrote:
> the labels, and each OR is in O(1).
Not sure how this changes the meaning, since the two labels would need one OR instruction.


================
Comment at: clang/docs/DataFlowSanitizerDesign.rst:68
+Users are responsible for managing the 8 integer labels (i.e., keeping
+track of what labels they have used so far, pick one that is yet
+unused, etc).
----------------
stephan.yichao.zhao wrote:
> picking
Done.


================
Comment at: clang/docs/DataFlowSanitizerDesign.rst:74
 
 The following is the current memory layout for Linux/x86\_64:
 
----------------
stephan.yichao.zhao wrote:
> memory layout
Done.


================
Comment at: clang/docs/DataFlowSanitizerDesign.rst:99
 associated directly with registers.  Loads will result in a union of
-all shadow labels corresponding to bytes loaded (which most of the
-time will be short circuited by the initial comparison) and stores will
-result in a copy of the label to the shadow of all bytes stored to.
+all shadow labels corresponding to bytes loaded and stores will result
+in a copy of the label to the shadow of all bytes stored to.
----------------
stephan.yichao.zhao wrote:
> , and
Done


================
Comment at: clang/docs/DataFlowSanitizerDesign.rst:100
+all shadow labels corresponding to bytes loaded and stores will result
+in a copy of the label to the shadow of all bytes stored to.
 
----------------
stephan.yichao.zhao wrote:
> the label of a stored value
Done.


================
Comment at: compiler-rt/lib/dfsan/dfsan.cpp:209
 
-// Like __dfsan_union, but for use from the client or custom functions.  Hence
-// the equality comparison is done here before calling __dfsan_union.
+// Resolves the union of two unequal labels.
 SANITIZER_INTERFACE_ATTRIBUTE dfsan_label
----------------
stephan.yichao.zhao wrote:
> After removing legacy mode. if our code does not check l1 != l2 in IR, the comments can be updated.
Done.


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D103745/new/

https://reviews.llvm.org/D103745



More information about the cfe-commits mailing list