[clang] [TySan] Add initial documentation for Type Sanitizer (PR #123595)

via cfe-commits cfe-commits at lists.llvm.org
Tue Jan 28 08:18:13 PST 2025


https://github.com/gbMattN updated https://github.com/llvm/llvm-project/pull/123595

>From 807c2c8be0517cbb1b9db890f48baeb6f226ba2f Mon Sep 17 00:00:00 2001
From: gbMattN <matthew.nagy at sony.com>
Date: Mon, 20 Jan 2025 11:02:06 +0000
Subject: [PATCH 01/12] [TySan] Add initial documentation

---
 clang/docs/TypeSanitizer.rst | 152 +++++++++++++++++++++++++++++++++++
 1 file changed, 152 insertions(+)
 create mode 100644 clang/docs/TypeSanitizer.rst

diff --git a/clang/docs/TypeSanitizer.rst b/clang/docs/TypeSanitizer.rst
new file mode 100644
index 00000000000000..6b320f3bb1773d
--- /dev/null
+++ b/clang/docs/TypeSanitizer.rst
@@ -0,0 +1,152 @@
+================
+TypeSanitizer
+================
+
+.. contents::
+   :local:
+
+Introduction
+============
+
+TypeSanitizer is a detector for strict type aliasing violations. It consists of a compiler
+instrumentation module and a run-time library. The tool detects violations such as the use 
+of an illegally cast pointer, or misuse of a union.
+
+The violations TypeSanitizer catches may cause the compiler to emit incorrect code.
+
+Typical slowdown introduced by TypeSanitizer is about **4x** [[CHECK THIS]]. Typical memory overhead introduced by TypeSanitizer is about **9x**. 
+
+How to build
+============
+
+Build LLVM/Clang with `CMake <https://llvm.org/docs/CMake.html>`_ and enable
+the ``compiler-rt`` runtime. An example CMake configuration that will allow
+for the use/testing of TypeSanitizer:
+
+.. code-block:: console
+
+   $ cmake -DCMAKE_BUILD_TYPE=Release -DLLVM_ENABLE_PROJECTS="clang" -DLLVM_ENABLE_RUNTIMES="compiler-rt" <path to source>/llvm
+
+Usage
+=====
+
+Compile and link your program with ``-fsanitize=type`` flag.  The
+TypeSanitizer run-time library should be linked to the final executable, so
+make sure to use ``clang`` (not ``ld``) for the final link step. To
+get a reasonable performance add ``-O1`` or higher 
+(`This may currently lead to false-negatives <https://github.com/llvm/llvm-project/issues/120855>`). 
+TypeSanitizer by default doesn't print the full stack trace on error messages. Use ``TYSAN_OPTIONS=print_stacktrace=1`` 
+to print the full trace. To get nicer stack traces in error messages add ``-fno-omit-frame-pointer`` and 
+``-g``.  To get perfect stack traces you may need to disable inlining (just use ``-O1``) and tail call elimination 
+(``-fno-optimize-sibling-calls``).
+
+.. code-block:: console
+
+    % cat example_AliasViolation.c
+    int main(int argc, char **argv) {
+      int x = 100;
+      float *y = (float*)&x;
+      *y += 2.0f;          // Strict aliasing violation
+      return 0;
+    }
+
+    # Compile and link
+    % clang++ -g -fsanitize=type example_AliasViolation.cc
+
+If a strict aliasing violation is detected, the program will print an error message to stderr. 
+The program won't terminate, which will allow you to detect many strict aliasing violations in one 
+run.
+
+.. code-block:: console
+    % ./a.out
+    ==1375532==ERROR: TypeSanitizer: type-aliasing-violation on address 0x7ffeebf1a72c (pc 0x5b3b1145ff41 bp 0x7ffeebf1a660 sp 0x7ffeebf19e08 tid 1375532)
+    READ of size 4 at 0x7ffeebf1a72c with type float accesses an existing object of type int
+        #0 0x5b3b1145ff40 in main example_AliasViolation.c:4:10
+
+    ==1375532==ERROR: TypeSanitizer: type-aliasing-violation on address 0x7ffeebf1a72c (pc 0x5b3b1146008a bp 0x7ffeebf1a660 sp 0x7ffeebf19e08 tid 1375532)
+    WRITE of size 4 at 0x7ffeebf1a72c with type float accesses an existing object of type int
+        #0 0x5b3b11460089 in main example_AliasViolation.c:4:10
+
+Error terminology
+------------------
+
+There are some terms that may appear in TypeSanitizer errors that are derived from TBAA Metadata. This 
+section hopes to provide a brief dictionary of these terms.
+
+* ``omnipotent char``: This is a special type which can alias with anything. Its name comes from the C/C++ 
+  type ``char``.
+* ``type p[x]``: Sometimes a program could generate distinct TBAA metadata that resolve to the same name. 
+  To make them unique, they have the character 'p' and a number prepended to their name.
+
+These terms are a result of non-user-facing processes, and not always self-explanatory. There is some 
+interest in changing TypeSanitizer in the future to translate these terms before printing them to users.
+
+Sanitizer features
+==================
+
+``__has_feature(type_sanitizer)``
+------------------------------------
+
+In some cases one may need to execute different code depending on whether
+TypeSanitizer is enabled.
+:ref:`\_\_has\_feature <langext-__has_feature-__has_extension>` can be used for
+this purpose.
+
+.. code-block:: c
+
+    #if defined(__has_feature)
+    #  if __has_feature(type_sanitizer)
+    // code that builds only under TypeSanitizer
+    #  endif
+    #endif
+
+``__attribute__((no_sanitize("type")))``
+-----------------------------------------------
+
+Some code you may not want to be instrumented by TypeSanitizer.  One may use the
+function attribute ``no_sanitize("type")`` to disable instrumenting type aliasing. 
+Its possible, depending on what happens in non-instrumented code, that instrumented code 
+emits false-positives/ false-negatives. This attribute may not be supported by other 
+compilers, so we suggest to use it together with ``__has_feature(type_sanitizer)``.
+
+``__attribute__((disable_sanitizer_instrumentation))``
+--------------------------------------------------------
+
+The ``disable_sanitizer_instrumentation`` attribute can be applied to functions
+to prevent all kinds of instrumentation. As a result, it may introduce false
+positives and incorrect stack traces. Therefore, it should be used with care,
+and only if absolutely required; for example for certain code that cannot
+tolerate any instrumentation and resulting side-effects. This attribute
+overrides ``no_sanitize("type")``.
+
+Ignorelist
+----------
+
+TypeSanitizer supports ``src`` and ``fun`` entity types in
+:doc:`SanitizerSpecialCaseList`, that can be used to suppress aliasing 
+violation reports in the specified source files or functions. Like 
+with other methods of ignoring instrumentation, this can result in false 
+positives/ false-negatives.
+
+Limitations
+-----------
+
+* TypeSanitizer uses more real memory than a native run. It uses 8 bytes of
+  shadow memory for each byte of user memory.
+* There are transformation passes which run before TypeSanitizer. If these 
+  passes optimize out an aliasing violation, TypeSanitizer cannot catch it.
+* Currently, all instrumentation is inlined. This can result in a **15x** 
+  (on average) increase in generated file size, and **3x** to **7x** increase 
+  in compile time. In some documented cases this can cause the compiler to hang.
+  A fix for this is in the last stages of release.
+* Codebases that use unions and struct-initialized variables can see incorrect 
+  results, as TypeSanitizer doesn't yet instrument these reliably.
+
+Current Status
+--------------
+
+TypeSanitizer is brand new, and still in development. There are some known 
+issues, especially in areas where clang doesn't generate valid TBAA metadata. 
+
+We are actively working on enhancing the tool --- stay tuned.  Any help, 
+issues, pull requests, ideas, is more than welcome.

>From 5c9d8f8176ebcf1bd3f1ef49ffb0e685c50d0749 Mon Sep 17 00:00:00 2001
From: gbMattN <matthew.nagy at sony.com>
Date: Mon, 20 Jan 2025 11:41:35 +0000
Subject: [PATCH 02/12] Tweaks and edits

---
 clang/docs/TypeSanitizer.rst | 12 ++++++------
 1 file changed, 6 insertions(+), 6 deletions(-)

diff --git a/clang/docs/TypeSanitizer.rst b/clang/docs/TypeSanitizer.rst
index 6b320f3bb1773d..ceb2fca37df904 100644
--- a/clang/docs/TypeSanitizer.rst
+++ b/clang/docs/TypeSanitizer.rst
@@ -33,8 +33,7 @@ Usage
 Compile and link your program with ``-fsanitize=type`` flag.  The
 TypeSanitizer run-time library should be linked to the final executable, so
 make sure to use ``clang`` (not ``ld``) for the final link step. To
-get a reasonable performance add ``-O1`` or higher 
-(`This may currently lead to false-negatives <https://github.com/llvm/llvm-project/issues/120855>`). 
+get a reasonable performance add ``-O1`` or higher.
 TypeSanitizer by default doesn't print the full stack trace on error messages. Use ``TYSAN_OPTIONS=print_stacktrace=1`` 
 to print the full trace. To get nicer stack traces in error messages add ``-fno-omit-frame-pointer`` and 
 ``-g``.  To get perfect stack traces you may need to disable inlining (just use ``-O1``) and tail call elimination 
@@ -70,8 +69,9 @@ run.
 Error terminology
 ------------------
 
-There are some terms that may appear in TypeSanitizer errors that are derived from TBAA Metadata. This 
-section hopes to provide a brief dictionary of these terms.
+There are some terms that may appear in TypeSanitizer errors that are derived from 
+`TBAA Metadata <https://llvm.org/docs/LangRef.html#tbaa-metadata>`_. This section hopes to provide a 
+brief dictionary of these terms.
 
 * ``omnipotent char``: This is a special type which can alias with anything. Its name comes from the C/C++ 
   type ``char``.
@@ -105,7 +105,7 @@ this purpose.
 
 Some code you may not want to be instrumented by TypeSanitizer.  One may use the
 function attribute ``no_sanitize("type")`` to disable instrumenting type aliasing. 
-Its possible, depending on what happens in non-instrumented code, that instrumented code 
+It is possible, depending on what happens in non-instrumented code, that instrumented code 
 emits false-positives/ false-negatives. This attribute may not be supported by other 
 compilers, so we suggest to use it together with ``__has_feature(type_sanitizer)``.
 
@@ -138,7 +138,7 @@ Limitations
 * Currently, all instrumentation is inlined. This can result in a **15x** 
   (on average) increase in generated file size, and **3x** to **7x** increase 
   in compile time. In some documented cases this can cause the compiler to hang.
-  A fix for this is in the last stages of release.
+  There are plans to improve this in the future.
 * Codebases that use unions and struct-initialized variables can see incorrect 
   results, as TypeSanitizer doesn't yet instrument these reliably.
 

>From 3645fc18e198d0642543b002f1853e983dab1b65 Mon Sep 17 00:00:00 2001
From: gbMattN <matthew.nagy at sony.com>
Date: Mon, 20 Jan 2025 15:17:54 +0000
Subject: [PATCH 03/12] Fixed error in code block

---
 clang/docs/TypeSanitizer.rst | 1 +
 1 file changed, 1 insertion(+)

diff --git a/clang/docs/TypeSanitizer.rst b/clang/docs/TypeSanitizer.rst
index ceb2fca37df904..20d0fc71775237 100644
--- a/clang/docs/TypeSanitizer.rst
+++ b/clang/docs/TypeSanitizer.rst
@@ -57,6 +57,7 @@ The program won't terminate, which will allow you to detect many strict aliasing
 run.
 
 .. code-block:: console
+
     % ./a.out
     ==1375532==ERROR: TypeSanitizer: type-aliasing-violation on address 0x7ffeebf1a72c (pc 0x5b3b1145ff41 bp 0x7ffeebf1a660 sp 0x7ffeebf19e08 tid 1375532)
     READ of size 4 at 0x7ffeebf1a72c with type float accesses an existing object of type int

>From 3b27cf7b653b52d89d669db7b59f96a0ea719d03 Mon Sep 17 00:00:00 2001
From: gbMattN <matthew.nagy at sony.com>
Date: Mon, 20 Jan 2025 15:31:01 +0000
Subject: [PATCH 04/12] Add TySan links to other doc pages

---
 clang/docs/UsersManual.rst | 3 +++
 clang/docs/index.rst       | 1 +
 2 files changed, 4 insertions(+)

diff --git a/clang/docs/UsersManual.rst b/clang/docs/UsersManual.rst
index 260e84910c6f78..a56c9425ebb757 100644
--- a/clang/docs/UsersManual.rst
+++ b/clang/docs/UsersManual.rst
@@ -2103,7 +2103,10 @@ are listed below.
 
       ``-fsanitize=undefined``: :doc:`UndefinedBehaviorSanitizer`,
       a fast and compatible undefined behavior checker.
+   -  .. _opt_fsanitize_type:
 
+      ``-fsanitize=type``: :doc:`TypeSanitizer`, a detector for strict
+      aliasing violations.
    -  ``-fsanitize=dataflow``: :doc:`DataFlowSanitizer`, a general data
       flow analysis.
    -  ``-fsanitize=cfi``: :doc:`control flow integrity <ControlFlowIntegrity>`
diff --git a/clang/docs/index.rst b/clang/docs/index.rst
index cc070059eede5d..26cc08e23a5762 100644
--- a/clang/docs/index.rst
+++ b/clang/docs/index.rst
@@ -35,6 +35,7 @@ Using Clang as a Compiler
    UndefinedBehaviorSanitizer
    DataFlowSanitizer
    LeakSanitizer
+   TypeSanitizer
    RealtimeSanitizer
    SanitizerCoverage
    SanitizerStats

>From 8e3fbe17edbc6a8dd429743a8037b93d51deeb66 Mon Sep 17 00:00:00 2001
From: gbMattN <146744444+gbMattN at users.noreply.github.com>
Date: Mon, 20 Jan 2025 16:45:44 +0000
Subject: [PATCH 05/12] Update clang/docs/TypeSanitizer.rst

Co-authored-by: Florian Hahn <flo at fhahn.com>
---
 clang/docs/TypeSanitizer.rst | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/clang/docs/TypeSanitizer.rst b/clang/docs/TypeSanitizer.rst
index 20d0fc71775237..96855d26186ead 100644
--- a/clang/docs/TypeSanitizer.rst
+++ b/clang/docs/TypeSanitizer.rst
@@ -52,7 +52,7 @@ to print the full trace. To get nicer stack traces in error messages add ``-fno-
     # Compile and link
     % clang++ -g -fsanitize=type example_AliasViolation.cc
 
-If a strict aliasing violation is detected, the program will print an error message to stderr. 
+The program will print an error message to stderr each time a strict aliasing violation is detected. 
 The program won't terminate, which will allow you to detect many strict aliasing violations in one 
 run.
 

>From b47bb47d5187dfa8507238826bac274f996d25c3 Mon Sep 17 00:00:00 2001
From: gbMattN <matthew.nagy at sony.com>
Date: Mon, 20 Jan 2025 17:03:33 +0000
Subject: [PATCH 06/12] Touchups

---
 clang/docs/TypeSanitizer.rst | 21 ++++++++++++---------
 1 file changed, 12 insertions(+), 9 deletions(-)

diff --git a/clang/docs/TypeSanitizer.rst b/clang/docs/TypeSanitizer.rst
index 96855d26186ead..ed68690fafa7ca 100644
--- a/clang/docs/TypeSanitizer.rst
+++ b/clang/docs/TypeSanitizer.rst
@@ -9,12 +9,13 @@ Introduction
 ============
 
 TypeSanitizer is a detector for strict type aliasing violations. It consists of a compiler
-instrumentation module and a run-time library. The tool detects violations such as the use 
-of an illegally cast pointer, or misuse of a union.
+instrumentation module and a run-time library. The tool detects violations where you access 
+memory under a different type than the dynamic type of the object.
 
 The violations TypeSanitizer catches may cause the compiler to emit incorrect code.
 
-Typical slowdown introduced by TypeSanitizer is about **4x** [[CHECK THIS]]. Typical memory overhead introduced by TypeSanitizer is about **9x**. 
+As TypeSanitizer is still experimental, it can currently have a large impact on runtime speed, 
+memory use, and code size.
 
 How to build
 ============
@@ -76,11 +77,11 @@ brief dictionary of these terms.
 
 * ``omnipotent char``: This is a special type which can alias with anything. Its name comes from the C/C++ 
   type ``char``.
-* ``type p[x]``: Sometimes a program could generate distinct TBAA metadata that resolve to the same name. 
-  To make them unique, they have the character 'p' and a number prepended to their name.
+* ``type p[x]``: This signifies pointers to the type. x is the number of indirections to reach the final value.
+  As an example, a pointer to a pointer to an integer would be ``type p2 int``.
 
-These terms are a result of non-user-facing processes, and not always self-explanatory. There is some 
-interest in changing TypeSanitizer in the future to translate these terms before printing them to users.
+TypeSanitizer is still experimental. User-facing error messages should be improved in the future to remove 
+references to LLVM IR specific terms.
 
 Sanitizer features
 ==================
@@ -147,7 +148,9 @@ Current Status
 --------------
 
 TypeSanitizer is brand new, and still in development. There are some known 
-issues, especially in areas where clang doesn't generate valid TBAA metadata. 
+issues, especially in areas where Clang's emitted TBAA data isn't extensive 
+enough for TypeSanitizer's runtime.
 
 We are actively working on enhancing the tool --- stay tuned.  Any help, 
-issues, pull requests, ideas, is more than welcome.
+issues, pull requests, ideas, is more than welcome. You can find the 
+`issue tracker here <https://github.com/llvm/llvm-project/issues?q=is%3Aissue%20state%3Aopen%20TySan%20label%3Acompiler-rt%3Atysan>`_.

>From 9cc3aa3d4f7e08e3b9bc742b1087f1330f1639e2 Mon Sep 17 00:00:00 2001
From: gbMattN <146744444+gbMattN at users.noreply.github.com>
Date: Tue, 21 Jan 2025 16:38:14 +0000
Subject: [PATCH 07/12] Apply suggestions from code review

Co-authored-by: Erich Keane <ekeane at nvidia.com>
---
 clang/docs/TypeSanitizer.rst | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/clang/docs/TypeSanitizer.rst b/clang/docs/TypeSanitizer.rst
index ed68690fafa7ca..19baf6a792f00a 100644
--- a/clang/docs/TypeSanitizer.rst
+++ b/clang/docs/TypeSanitizer.rst
@@ -8,7 +8,7 @@ TypeSanitizer
 Introduction
 ============
 
-TypeSanitizer is a detector for strict type aliasing violations. It consists of a compiler
+The TypeSanitizer is a detector for strict type aliasing violations. It consists of a compiler
 instrumentation module and a run-time library. The tool detects violations where you access 
 memory under a different type than the dynamic type of the object.
 
@@ -35,7 +35,7 @@ Compile and link your program with ``-fsanitize=type`` flag.  The
 TypeSanitizer run-time library should be linked to the final executable, so
 make sure to use ``clang`` (not ``ld``) for the final link step. To
 get a reasonable performance add ``-O1`` or higher.
-TypeSanitizer by default doesn't print the full stack trace on error messages. Use ``TYSAN_OPTIONS=print_stacktrace=1`` 
+TypeSanitizer by default doesn't print the full stack trace in error messages. Use ``TYSAN_OPTIONS=print_stacktrace=1`` 
 to print the full trace. To get nicer stack traces in error messages add ``-fno-omit-frame-pointer`` and 
 ``-g``.  To get perfect stack traces you may need to disable inlining (just use ``-O1``) and tail call elimination 
 (``-fno-optimize-sibling-calls``).

>From e55c025d51e13a7808c7e6090864327be8aafb5a Mon Sep 17 00:00:00 2001
From: gbMattN <matthew.nagy at sony.com>
Date: Tue, 21 Jan 2025 17:00:13 +0000
Subject: [PATCH 08/12] Expanded the section on the point of TySan

---
 clang/docs/TypeSanitizer.rst | 12 +++++++++---
 1 file changed, 9 insertions(+), 3 deletions(-)

diff --git a/clang/docs/TypeSanitizer.rst b/clang/docs/TypeSanitizer.rst
index 19baf6a792f00a..c19356656f9ebd 100644
--- a/clang/docs/TypeSanitizer.rst
+++ b/clang/docs/TypeSanitizer.rst
@@ -9,10 +9,16 @@ Introduction
 ============
 
 The TypeSanitizer is a detector for strict type aliasing violations. It consists of a compiler
-instrumentation module and a run-time library. The tool detects violations where you access 
-memory under a different type than the dynamic type of the object.
+instrumentation module and a run-time library. C/C++ has type-based aliasing rules, and LLVM 
+can exploit these for optimizations given the TBAA metadata Clang emits. In general, a pointer 
+of a given type cannot access an object of a different type, with only a few exceptions. 
 
-The violations TypeSanitizer catches may cause the compiler to emit incorrect code.
+These rules aren't always apparent to users, which leads to code that violates these rules
+(e.g. for type punning). This can lead to optimization passes introducing bugs unless the 
+code is built with ``-fno-strict-aliasing``, sacrificing performance.
+
+TypeSanitizer is built to catch when these strict aliasing rules have been violated, helping 
+users find where such bugs originate in their code despite the code looking valid at first glance.
 
 As TypeSanitizer is still experimental, it can currently have a large impact on runtime speed, 
 memory use, and code size.

>From 9204fc8c3598957cfbf51fccc4501f8f99516adb Mon Sep 17 00:00:00 2001
From: gbMattN <146744444+gbMattN at users.noreply.github.com>
Date: Thu, 23 Jan 2025 10:37:37 +0000
Subject: [PATCH 09/12] Apply suggestions from code review

Co-authored-by: Aaron Ballman <aaron at aaronballman.com>
---
 clang/docs/TypeSanitizer.rst | 8 ++++----
 1 file changed, 4 insertions(+), 4 deletions(-)

diff --git a/clang/docs/TypeSanitizer.rst b/clang/docs/TypeSanitizer.rst
index c19356656f9ebd..33498791f1f5d7 100644
--- a/clang/docs/TypeSanitizer.rst
+++ b/clang/docs/TypeSanitizer.rst
@@ -1,6 +1,6 @@
-================
+=============
 TypeSanitizer
-================
+=============
 
 .. contents::
    :local:
@@ -59,7 +59,7 @@ to print the full trace. To get nicer stack traces in error messages add ``-fno-
     # Compile and link
     % clang++ -g -fsanitize=type example_AliasViolation.cc
 
-The program will print an error message to stderr each time a strict aliasing violation is detected. 
+The program will print an error message to ``stderr`` each time a strict aliasing violation is detected. 
 The program won't terminate, which will allow you to detect many strict aliasing violations in one 
 run.
 
@@ -83,7 +83,7 @@ brief dictionary of these terms.
 
 * ``omnipotent char``: This is a special type which can alias with anything. Its name comes from the C/C++ 
   type ``char``.
-* ``type p[x]``: This signifies pointers to the type. x is the number of indirections to reach the final value.
+* ``type p[x]``: This signifies pointers to the type. ``x`` is the number of indirections to reach the final value.
   As an example, a pointer to a pointer to an integer would be ``type p2 int``.
 
 TypeSanitizer is still experimental. User-facing error messages should be improved in the future to remove 

>From 8ed3f107a1d7698ef404f2421ca107fd05eb9793 Mon Sep 17 00:00:00 2001
From: gbMattN <matthew.nagy at sony.com>
Date: Thu, 23 Jan 2025 12:12:29 +0000
Subject: [PATCH 10/12] Added notes on which aliasing rules TySan follows and
 how it interacts with other sanitizers

---
 clang/docs/TypeSanitizer.rst | 9 ++++++++-
 1 file changed, 8 insertions(+), 1 deletion(-)

diff --git a/clang/docs/TypeSanitizer.rst b/clang/docs/TypeSanitizer.rst
index 33498791f1f5d7..f111e86151aa25 100644
--- a/clang/docs/TypeSanitizer.rst
+++ b/clang/docs/TypeSanitizer.rst
@@ -37,7 +37,7 @@ for the use/testing of TypeSanitizer:
 Usage
 =====
 
-Compile and link your program with ``-fsanitize=type`` flag.  The
+Compile and link your program with ``-fsanitize=type`` flag. The
 TypeSanitizer run-time library should be linked to the final executable, so
 make sure to use ``clang`` (not ``ld``) for the final link step. To
 get a reasonable performance add ``-O1`` or higher.
@@ -149,6 +149,13 @@ Limitations
   There are plans to improve this in the future.
 * Codebases that use unions and struct-initialized variables can see incorrect 
   results, as TypeSanitizer doesn't yet instrument these reliably.
+* Since Clang & LLVM's TBAA system is used to generate the checks used by the 
+  instrumentation, TypeSanitizer follows Clang & LLVM's rules for type aliasing. 
+  There may be situations where that disagrees with the standard. However this 
+  does at least mean that TypeSanitizer will catch any aliasing violations that  
+  would cause bugs when compiling with Clang & LLVM.
+* TypeSanitizer cannot currently be run alongside other sanitizers such as 
+  AddressSanitizer, ThreadSanitizer or UndefinedBehaviorSanitizer.
 
 Current Status
 --------------

>From 6b424a6e7329a723ead2530857f84c3041bc06b1 Mon Sep 17 00:00:00 2001
From: gbMattN <matthew.nagy at sony.com>
Date: Fri, 24 Jan 2025 10:27:56 +0000
Subject: [PATCH 11/12] Add note about compilation speed

---
 clang/docs/TypeSanitizer.rst | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/clang/docs/TypeSanitizer.rst b/clang/docs/TypeSanitizer.rst
index f111e86151aa25..eb104a8a5c7fe0 100644
--- a/clang/docs/TypeSanitizer.rst
+++ b/clang/docs/TypeSanitizer.rst
@@ -21,7 +21,8 @@ TypeSanitizer is built to catch when these strict aliasing rules have been viola
 users find where such bugs originate in their code despite the code looking valid at first glance.
 
 As TypeSanitizer is still experimental, it can currently have a large impact on runtime speed, 
-memory use, and code size.
+memory use, and code size. It also has a large compile-time overhead. Work is being done to 
+reduce these impacts.
 
 How to build
 ============

>From a27e1b7779760a121741c92f50027e0099ec5f04 Mon Sep 17 00:00:00 2001
From: gbMattN <matthew.nagy at sony.com>
Date: Tue, 28 Jan 2025 16:10:52 +0000
Subject: [PATCH 12/12] Added section on the tysan algorithm

---
 clang/docs/TypeSanitizer.rst | 35 +++++++++++++++++++++++++++++++++++
 1 file changed, 35 insertions(+)

diff --git a/clang/docs/TypeSanitizer.rst b/clang/docs/TypeSanitizer.rst
index eb104a8a5c7fe0..8b815d8804fa8f 100644
--- a/clang/docs/TypeSanitizer.rst
+++ b/clang/docs/TypeSanitizer.rst
@@ -24,6 +24,41 @@ As TypeSanitizer is still experimental, it can currently have a large impact on
 memory use, and code size. It also has a large compile-time overhead. Work is being done to 
 reduce these impacts.
 
+The TypeSanitizer Algorithm
+===========================
+For each TBAA type-access descriptor, encoded in LLVM IR using TBAA Metadata, the instrumentation 
+pass generates descriptor tables. Thus there is a unique pointer to each type (and access descriptor).
+These tables are comdat (except for anonymous-namespace types), so the pointer values are unique 
+across the program.
+
+The descriptors refer to other descriptors to form a type aliasing tree, like how LLVM's TBAA data 
+does.
+
+The runtime uses 8 bytes of shadow memory, the size of the pointer to the type descriptor, for 
+every byte of accessed data in the program. The first byte of a type will have its shadow memory 
+be set to the pointer to its type descriptor. A shadow value may also be one of the following:
+
+* 0 is used to represent an unknown type
+* Negative numbers represent an interior byte: A byte inside a type that is not the first one. As an 
+  example, a value of -2 means you are in the third byte of a type.
+
+The instrumentation first checks for an exact match between the type of the current access and the 
+type for that address in the shadow memory. This can quickly be done by checking pointer values. If 
+it matches, it checks the remaining shadow memory of the type to ensure they are the correct negative 
+numbers. If this fails, it calls the "slow path" check. If the exact match fails, we check to see if 
+the value, and the remainder of the shadow bytes, is 0. If they are, we can set the shadow memory to 
+the correct type descriptor pointer for the first byte, and the correct negative numbers for the rest 
+of the type's shadow.
+
+If the type in shadow memory is neither an exact match nor 0, we call the slower runtime check. It 
+uses the full TBAA algorithm, just as the compiler does, to determine when two types are permitted to 
+alias.
+
+The instrumentation pass inserts calls to the memset intrinsic that reset the shadow memory for 
+memory updated by memset, memcpy, and memmove, as well as for allocas/byval and lifetime.start/end, 
+so that it reflects that the type is now unknown. The runtime intercepts memset, memcpy, etc. to 
+perform the same function for the library calls.
+
 How to build
 ============
 


