[clang] 822954b - [TySan] Add initial documentation for Type Sanitizer (#123595)

via cfe-commits cfe-commits at lists.llvm.org
Tue Jan 28 08:58:07 PST 2025


Author: gbMattN
Date: 2025-01-28T16:58:03Z
New Revision: 822954b4a97753b0c7accc606287529518e9d425

URL: https://github.com/llvm/llvm-project/commit/822954b4a97753b0c7accc606287529518e9d425
DIFF: https://github.com/llvm/llvm-project/commit/822954b4a97753b0c7accc606287529518e9d425.diff

LOG: [TySan] Add initial documentation for Type Sanitizer (#123595)

Add some initial documentation for type sanitizer [From issue #122522]

Added: 
    clang/docs/TypeSanitizer.rst

Modified: 
    clang/docs/UsersManual.rst
    clang/docs/index.rst

Removed: 
    


################################################################################
diff --git a/clang/docs/TypeSanitizer.rst b/clang/docs/TypeSanitizer.rst
new file mode 100644
index 00000000000000..8b815d8804fa8f
--- /dev/null
+++ b/clang/docs/TypeSanitizer.rst
@@ -0,0 +1,205 @@
+=============
+TypeSanitizer
+=============
+
+.. contents::
+   :local:
+
+Introduction
+============
+
+The TypeSanitizer is a detector for strict type aliasing violations. It consists of a compiler
+instrumentation module and a run-time library. C/C++ has type-based aliasing rules, and LLVM 
+can exploit these for optimizations given the TBAA metadata Clang emits. In general, a pointer 
+of a given type cannot access an object of a different type, with only a few exceptions. 
+
+These rules aren't always apparent to users, which leads to code that violates them
+(e.g. for type punning). Optimization passes can then introduce bugs unless the 
+code is built with ``-fno-strict-aliasing``, sacrificing performance.
+
+TypeSanitizer is built to catch when these strict aliasing rules have been violated, helping 
+users find where such bugs originate in their code despite the code looking valid at first glance.
+
+As TypeSanitizer is still experimental, it can currently have a large impact on runtime speed, 
+memory use, and code size. It also has a large compile-time overhead. Work is being done to 
+reduce these impacts.
+
+The TypeSanitizer Algorithm
+===========================
+For each TBAA type-access descriptor, encoded in LLVM IR using TBAA Metadata, the instrumentation 
+pass generates descriptor tables. Thus there is a unique pointer to each type (and access descriptor).
+These tables are comdat (except for anonymous-namespace types), so the pointer values are unique 
+across the program.
+
+The descriptors refer to other descriptors to form a type aliasing tree, like how LLVM's TBAA data 
+does.
+
+The runtime uses 8 bytes of shadow memory, the size of the pointer to the type descriptor, for 
+every byte of accessed data in the program. The shadow memory for the first byte of a type is set 
+to the pointer to its type descriptor. The shadow memory can also hold the following values:
+
+* 0 is used to represent an unknown type
+* Negative numbers represent an interior byte: A byte inside a type that is not the first one. As an 
+  example, a value of -2 means you are in the third byte of a type.
+
+The instrumentation first checks for an exact match between the type of the current access and the 
+type recorded for that address in shadow memory. This is a fast check, since it only compares pointer 
+values. If the first byte matches, the instrumentation checks that the remaining shadow bytes of the 
+type hold the expected negative numbers; if they do not, it calls the "slow path" check. If the first 
+byte does not match, the instrumentation checks whether it and the remaining shadow bytes are all 0. 
+If they are, it sets the shadow memory to the type descriptor pointer for the first byte and to the 
+corresponding negative numbers for the rest of the type's shadow.
+
+If the type in shadow memory is neither an exact match nor 0, the instrumentation calls the slower 
+runtime check. It uses the full TBAA algorithm, just as the compiler does, to determine whether two 
+types are permitted to alias.
+
+The instrumentation pass inserts calls to the memset intrinsic that reset the shadow memory for 
+memory written by memset, memcpy, and memmove, as well as for allocas and byval arguments (and at 
+lifetime.start/end), to reflect that the type is now unknown. The runtime intercepts memset, memcpy, 
+etc. to do the same for the library calls.
+
+How to build
+============
+
+Build LLVM/Clang with `CMake <https://llvm.org/docs/CMake.html>`_ and enable
+the ``compiler-rt`` runtime. An example CMake configuration that will allow
+for the use/testing of TypeSanitizer:
+
+.. code-block:: console
+
+   $ cmake -DCMAKE_BUILD_TYPE=Release -DLLVM_ENABLE_PROJECTS="clang" -DLLVM_ENABLE_RUNTIMES="compiler-rt" <path to source>/llvm
+
+Usage
+=====
+
+Compile and link your program with the ``-fsanitize=type`` flag. The
+TypeSanitizer run-time library should be linked to the final executable, so
+make sure to use ``clang`` (not ``ld``) for the final link step. To
+get reasonable performance, add ``-O1`` or higher.
+By default, TypeSanitizer doesn't print the full stack trace in error messages. Use ``TYSAN_OPTIONS=print_stacktrace=1`` 
+to print the full trace. To get nicer stack traces in error messages, add ``-fno-omit-frame-pointer`` and 
+``-g``.  To get perfect stack traces you may need to disable inlining (just use ``-O1``) and tail call elimination 
+(``-fno-optimize-sibling-calls``).
+
+.. code-block:: console
+
+    % cat example_AliasViolation.c
+    int main(int argc, char **argv) {
+      int x = 100;
+      float *y = (float*)&x;
+      *y += 2.0f;          // Strict aliasing violation
+      return 0;
+    }
+
+    # Compile and link
+    % clang -g -fsanitize=type example_AliasViolation.c
+
+The program will print an error message to ``stderr`` each time a strict aliasing violation is detected. 
+The program won't terminate, which will allow you to detect many strict aliasing violations in one 
+run.
+
+.. code-block:: console
+
+    % ./a.out
+    ==1375532==ERROR: TypeSanitizer: type-aliasing-violation on address 0x7ffeebf1a72c (pc 0x5b3b1145ff41 bp 0x7ffeebf1a660 sp 0x7ffeebf19e08 tid 1375532)
+    READ of size 4 at 0x7ffeebf1a72c with type float accesses an existing object of type int
+        #0 0x5b3b1145ff40 in main example_AliasViolation.c:4:10
+
+    ==1375532==ERROR: TypeSanitizer: type-aliasing-violation on address 0x7ffeebf1a72c (pc 0x5b3b1146008a bp 0x7ffeebf1a660 sp 0x7ffeebf19e08 tid 1375532)
+    WRITE of size 4 at 0x7ffeebf1a72c with type float accesses an existing object of type int
+        #0 0x5b3b11460089 in main example_AliasViolation.c:4:10
+
+Error terminology
+------------------
+
+Some terms that appear in TypeSanitizer errors are derived from 
+`TBAA Metadata <https://llvm.org/docs/LangRef.html#tbaa-metadata>`_. This section provides a 
+brief glossary of these terms.
+
+* ``omnipotent char``: This is a special type which can alias with anything. Its name comes from the C/C++ 
+  type ``char``.
+* ``type p[x]``: This signifies pointers to the type. ``x`` is the number of indirections to reach the final value.
+  As an example, a pointer to a pointer to an integer would be ``type p2 int``.
+
+TypeSanitizer is still experimental. User-facing error messages should be improved in the future to remove 
+references to LLVM IR specific terms.
+
+Sanitizer features
+==================
+
+``__has_feature(type_sanitizer)``
+------------------------------------
+
+In some cases one may need to execute different code depending on whether
+TypeSanitizer is enabled.
+:ref:`\_\_has\_feature <langext-__has_feature-__has_extension>` can be used for
+this purpose.
+
+.. code-block:: c
+
+    #if defined(__has_feature)
+    #  if __has_feature(type_sanitizer)
+    // code that builds only under TypeSanitizer
+    #  endif
+    #endif
+
+``__attribute__((no_sanitize("type")))``
+-----------------------------------------------
+
+You may not want some code to be instrumented by TypeSanitizer.  Use the
+function attribute ``no_sanitize("type")`` to disable type aliasing instrumentation for a function. 
+Depending on what happens in non-instrumented code, instrumented code may 
+emit false positives or false negatives. This attribute may not be supported by other 
+compilers, so we suggest using it together with ``__has_feature(type_sanitizer)``.
+
+``__attribute__((disable_sanitizer_instrumentation))``
+--------------------------------------------------------
+
+The ``disable_sanitizer_instrumentation`` attribute can be applied to functions
+to prevent all kinds of instrumentation. As a result, it may introduce false
+positives and incorrect stack traces. Therefore, it should be used with care,
+and only if absolutely required; for example for certain code that cannot
+tolerate any instrumentation and resulting side-effects. This attribute
+overrides ``no_sanitize("type")``.
+
+Ignorelist
+----------
+
+TypeSanitizer supports ``src`` and ``fun`` entity types in
+:doc:`SanitizerSpecialCaseList`, which can be used to suppress aliasing 
+violation reports in the specified source files or functions. As 
+with other methods of ignoring instrumentation, this can result in false 
+positives or false negatives.
+
+Limitations
+-----------
+
+* TypeSanitizer uses more real memory than a native run. It uses 8 bytes of
+  shadow memory for each byte of user memory.
+* There are transformation passes which run before TypeSanitizer. If these 
+  passes optimize out an aliasing violation, TypeSanitizer cannot catch it.
+* Currently, all instrumentation is inlined. This can result in a **15x** 
+  (on average) increase in generated file size, and **3x** to **7x** increase 
+  in compile time. In some documented cases this can cause the compiler to hang.
+  There are plans to improve this in the future.
+* Codebases that use unions and struct-initialized variables can see incorrect 
+  results, as TypeSanitizer doesn't yet instrument these reliably.
+* Since Clang & LLVM's TBAA system is used to generate the checks used by the 
+  instrumentation, TypeSanitizer follows Clang & LLVM's rules for type aliasing. 
+  There may be situations where these rules disagree with the standard. However, this 
+  does at least mean that TypeSanitizer will catch any aliasing violations that 
+  would cause bugs when compiling with Clang & LLVM.
+* TypeSanitizer cannot currently be run alongside other sanitizers such as 
+  AddressSanitizer, ThreadSanitizer or UndefinedBehaviorSanitizer.
+
+Current Status
+--------------
+
+TypeSanitizer is brand new, and still in development. There are some known 
+issues, especially in areas where Clang's emitted TBAA data isn't extensive 
+enough for TypeSanitizer's runtime.
+
+We are actively working on enhancing the tool; stay tuned.  Help, 
+issues, pull requests, and ideas are more than welcome. You can find the 
+`issue tracker here <https://github.com/llvm/llvm-project/issues?q=is%3Aissue%20state%3Aopen%20TySan%20label%3Acompiler-rt%3Atysan>`_.
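
A note on the shadow encoding described under "The TypeSanitizer Algorithm" in the diff above: the fast-path checks can be sketched in C roughly as follows. This is only an illustration under stated assumptions, not the compiler-rt implementation. The names ``shadow_set``, ``shadow_check`` and ``access_result`` are hypothetical, the shadow is modeled as a plain array with one pointer-sized slot per data byte, and the slow path (the full TBAA check) is represented only by a return code.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical result codes for this sketch only. */
enum access_result { ACCESS_OK, ACCESS_RECORDED, ACCESS_SLOW_PATH };

/* Record a type in shadow[0..size): the descriptor pointer goes in the first
   slot, and interior byte i gets -i (so -2 marks the third byte of the type). */
static void shadow_set(intptr_t *shadow, size_t size, intptr_t desc) {
  assert(size > 0);
  shadow[0] = desc;
  for (size_t i = 1; i < size; ++i)
    shadow[i] = -(intptr_t)i;
}

/* Fast-path check for an access of `size` bytes with type descriptor `desc`. */
static enum access_result shadow_check(intptr_t *shadow, size_t size,
                                       intptr_t desc) {
  assert(size > 0);
  if (shadow[0] == desc) {
    /* Exact match on the first byte: the rest must be -1, -2, ... */
    for (size_t i = 1; i < size; ++i)
      if (shadow[i] != -(intptr_t)i)
        return ACCESS_SLOW_PATH;
    return ACCESS_OK;
  }
  /* No exact match: if every slot is 0 (unknown), record the type. */
  for (size_t i = 0; i < size; ++i)
    if (shadow[i] != 0)
      return ACCESS_SLOW_PATH;
  shadow_set(shadow, size, desc);
  return ACCESS_RECORDED;
}
```

In this model, the first access to untyped memory records its type, a repeated access with the same descriptor takes the fast path, and an access with a different descriptor falls through to the slow path, where the documented runtime would decide whether the two types may alias.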

diff --git a/clang/docs/UsersManual.rst b/clang/docs/UsersManual.rst
index 260e84910c6f78..a56c9425ebb757 100644
--- a/clang/docs/UsersManual.rst
+++ b/clang/docs/UsersManual.rst
@@ -2103,7 +2103,10 @@ are listed below.
 
       ``-fsanitize=undefined``: :doc:`UndefinedBehaviorSanitizer`,
       a fast and compatible undefined behavior checker.
+   -  .. _opt_fsanitize_type:
 
+      ``-fsanitize=type``: :doc:`TypeSanitizer`, a detector for strict
+      aliasing violations.
    -  ``-fsanitize=dataflow``: :doc:`DataFlowSanitizer`, a general data
       flow analysis.
    -  ``-fsanitize=cfi``: :doc:`control flow integrity <ControlFlowIntegrity>`
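
For context on what the new ``-fsanitize=type`` entry is aimed at: the example in TypeSanitizer.rst accesses an ``int`` through a ``float *``, which violates the strict aliasing rules. A common well-defined alternative for type punning in C is to copy the object representation with ``memcpy``. The sketch below assumes a 32-bit IEEE 754 ``float``; the helper name ``float_bits`` is hypothetical and is not part of this commit.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

static_assert(sizeof(float) == sizeof(uint32_t),
              "sketch assumes a 32-bit float");

/* Return the bit pattern of `f` without reading it through a uint32_t
   pointer: memcpy copies the bytes, so no float object is ever accessed
   through an lvalue of an incompatible type. */
static uint32_t float_bits(float f) {
  uint32_t bits;
  memcpy(&bits, &f, sizeof bits);
  return bits;
}
```

Since the documentation states that the runtime intercepts ``memcpy`` and marks the written memory as unknown, this pattern would not be expected to produce a report, although that is an inference from the text rather than a tested result.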

diff --git a/clang/docs/index.rst b/clang/docs/index.rst
index 349378b1efa214..6c792af66a62ce 100644
--- a/clang/docs/index.rst
+++ b/clang/docs/index.rst
@@ -35,6 +35,7 @@ Using Clang as a Compiler
    UndefinedBehaviorSanitizer
    DataFlowSanitizer
    LeakSanitizer
+   TypeSanitizer
    RealtimeSanitizer
    SanitizerCoverage
    SanitizerStats


        

