[compiler-rt] [llvm] [TSan] Add dominance-based redundant instrumentation elimination (PR #169897)

Fri Nov 28 02:42:55 PST 2025

llvmbot wrote:

@llvm/pr-subscribers-llvm-transforms

Author: Alexey Paznikov (apaznikov)

<details>
<summary>Changes</summary>

## Summary
This PR adds a static analysis to ThreadSanitizer that identifies and eliminates redundant memory access instrumentation. Using dominance and post-dominance relationships in the Control Flow Graph (CFG), it proves that certain runtime checks are unnecessary: any data race they would catch is also caught by an earlier or later check on the same execution path.

This work is part of a broader research effort on optimizing dynamic race detectors [1].

## Implementation Details
The logic is encapsulated in the `DominanceBasedElimination` class within `ThreadSanitizer.cpp`. The pass operates intra-procedurally and performs the following steps:

1.  **Safety Analysis:** It builds a cache of "safe" basic blocks and paths. A path is considered safe if it contains no synchronization primitives (atomics, fences) and no function calls that could potentially synchronize threads (checked via function attributes in this patch).
2.  **Dominance Elimination:** If instruction $I_{dom}$ dominates $I_{sub}$, both access the same memory location (verified via `MustAlias`), and the path $I_{dom} \to I_{sub}$ is safe, then $I_{sub}$ is not instrumented.
3.  **Post-Dominance Elimination:** If instruction $I_{post}$ post-dominates $I_{pre}$, accesses the same location, and the path $I_{pre} \to I_{post}$ is safe, then $I_{pre}$ is not instrumented.
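
For intuition, the dominance rule can be modeled on a single basic block, where every earlier instruction trivially dominates every later one. The sketch below is an illustration only, not code from the patch: `Kind`, `Event`, and `needsInstrumentation` are hypothetical names, integer addresses stand in for locations proven equal via `MustAlias`, and a `Sync` event stands in for any instruction that makes a path unsafe. It also assumes that a read can cover a later read and a write can cover a later write.

```cpp
#include <cassert>
#include <cstddef>
#include <unordered_map>
#include <vector>

// Hypothetical event model for one basic block; these are not LLVM types.
enum class Kind { Read, Write, Sync };

struct Event {
  Kind kind;
  int addr; // stand-in for a MustAlias-equivalent location; unused for Sync
};

// Returns, for each event, whether it would still need a runtime check.
// Inside one block, dominance is just program order, so only the
// "same location" and "safe path" conditions remain to be checked.
std::vector<bool> needsInstrumentation(const std::vector<Event> &block) {
  std::vector<bool> keep(block.size(), false);
  // Per address: has a write been seen since the last Sync (true),
  // or only reads (false)?
  std::unordered_map<int, bool> seenWrite;
  for (std::size_t i = 0; i < block.size(); ++i) {
    const Event &e = block[i];
    if (e.kind == Kind::Sync) {
      seenWrite.clear(); // nothing before a Sync may cover accesses after it
      continue;          // Sync events are not plain memory-access checks
    }
    const bool isWrite = (e.kind == Kind::Write);
    auto it = seenWrite.find(e.addr);
    // An earlier write covers reads and writes; an earlier read covers
    // only reads.
    const bool covered = it != seenWrite.end() && (it->second || !isWrite);
    keep[i] = !covered;
    if (it == seenWrite.end())
      seenWrite.emplace(e.addr, isWrite);
    else
      it->second = it->second || isWrite;
  }
  assert(keep.size() == block.size());
  return keep;
}
```

For the sequence write(1), read(1), read(2), write(2), sync, read(1), this model drops only the first read(1). The write(2) stays because only a read precedes it, and the final read(1) stays because a sync separates it from the earlier accesses.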

### Safety & Correctness
*   **Aliasing:** We strictly require `MustAlias`. If alias analysis (AA) returns `MayAlias` or `NoAlias`, the optimization is skipped.
*   **Loops:** For post-dominance, we disable elimination if the path passes through a loop header, so that we never remove a check in code that might enter an infinite loop before reaching the post-dominator. We do the same if the path contains calls, because the callee may itself contain an infinite loop. We also verify that no instruction along the path can cause an irregular exit.
*   **Write/Read semantics:** The logic respects TSan's read/write semantics (e.g., a Write can eliminate a subsequent Read, but a Read cannot eliminate a subsequent Write).
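
The aliasing and read/write conditions above can be summarized as a single predicate. This is a hedged sketch rather than the patch's code: `Alias` mirrors the four result kinds of LLVM's `AliasResult` but is defined locally so the example compiles without LLVM headers, and `earlierCoversLater` is a hypothetical helper name. The read-covers-read and write-covers-write cases are an assumption; the description only states the write-then-read and read-then-write cases explicitly.

```cpp
#include <cassert>

// Local mirror of LLVM's alias result kinds (assumption: defined here only
// to keep the sketch free of LLVM dependencies).
enum class Alias { NoAlias, MayAlias, PartialAlias, MustAlias };

// Hypothetical helper: may the check on the later access be dropped because
// an earlier access dominates it?
bool earlierCoversLater(Alias alias, bool earlierIsWrite, bool laterIsWrite,
                        bool pathIsSafe) {
  if (alias != Alias::MustAlias)
    return false; // anything short of MustAlias is rejected
  if (!pathIsSafe)
    return false; // synchronization or an unsafe call lies on the path
  // A write covers any later access; a read covers only a later read.
  return earlierIsWrite || !laterIsWrite;
}
```

For example, `earlierCoversLater(Alias::MustAlias, /*earlierIsWrite=*/false, /*laterIsWrite=*/true, /*pathIsSafe=*/true)` is `false`, matching the rule that a read cannot eliminate a subsequent write.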

## Impact
*   **Runtime Performance:** Reduces the number of `__tsan_read*`/`__tsan_write*` calls, leading to lower runtime overhead for instrumented binaries.
*   **Report Granularity:** In some cases, the reported race location might shift from the "second" access to the "first" (dominating) access. However, a race that is present is still guaranteed to be reported.
*   **Compile Time:** The analysis uses standard LLVM passes (`DominatorTree`, `PostDominatorTree`, `LoopInfo`) and is lightweight.

## Motivation & Potential Impact
This work is based on our research [1] into optimizing dynamic race detectors. Our experiments identified **Dominance-based Elimination (DE)** as the single most effective optimization strategy compared to other techniques (Escape Analysis, Lock Ownership, etc.).

In our research prototype (which utilizes a more aggressive, inter-procedural version of this analysis), we observed the following speedups solely from DE:

*   **SQLite:** ~1.67x speedup
*   **Redis:** ~1.35x speedup
*   **FFmpeg:** ~1.2x speedup
*   **MySQL:** ~1.13x speedup
*   **Memcached:** ~1.07x speedup

**Chromium:**
We evaluated the pass on **Chromium** using a suite of micro-benchmarks (including Layout, Parser, SVG, and Speedometer tests).
*   **Median Speedup:** ~1.3x across all suites.
*   **Distribution:** Over **80%** of all Chromium micro-benchmarks achieved a speedup of at least **1.2x**.
*   Specific suites like 'Layout' showed median speedups around **1.5x**.

### Note on this PR:
This patch implements a **conservative, intra-procedural** version of the algorithm described in [1] to ensure maximum stability and soundness for the upstream compiler. The absolute speedups for this initial version may be lower than those of the research prototype, because it lacks inter-procedural analysis (IPA) and handles synchronization conservatively. It nevertheless targets the same redundancy patterns and serves as the foundational infrastructure for future improvements.

Even with intra-procedural analysis, we expect significant instrumentation reduction in hot loops and straight-line code with repeated accesses.

## Usage
The optimization is currently opt-in.
**Flag:** `-mllvm -tsan-use-dominance-analysis`
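
A small translation unit to try the flag on might look like the following. The file name `example.cpp` and the function `bump` are hypothetical, and the comments mark candidates only: which checks are actually dropped depends on the optimization level and on what alias analysis can prove.

```cpp
// Hypothetical test input, e.g. saved as example.cpp and compiled with:
//   clang++ -fsanitize=thread -mllvm -tsan-use-dominance-analysis -c example.cpp

int counter = 0; // would be shared between threads in a real program

int bump(bool twice) {
  counter = 1;          // write: dominates every access below
  int seen = counter;   // read of the same global with no call or atomic in
                        // between: a candidate for dominance elimination
  if (twice)
    counter = seen + 1; // write dominated by the first write: also a candidate
  return counter;       // read dominated by the first write on every path
}
```

Compiling the same file with and without the flag and comparing the number of `__tsan_read*`/`__tsan_write*` calls in the output is one way to observe the effect.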

## Attribution & Status
**Implementation:**
This patch was implemented by **Alexey Paznikov**.

**Research & Algorithm Design:**
The underlying algorithm design and performance validation were carried out by the research team: **Alexey Paznikov**, **Andrey Kogutenko**, **Yaroslav Osipov**, **Michael Schwarz**, and **Umang Mathur**.

This work is based on research currently **under review** for publication [1].

[1] "Optimizing Instrumentation for Data Race Detectors" (Under Review, 2025).

---

Patch is 54.31 KiB, truncated to 20.00 KiB below, full version: https://github.com/llvm/llvm-project/pull/169897.diff


5 Files Affected:

- (modified) compiler-rt/test/tsan/CMakeLists.txt (+81-30) 
- (modified) compiler-rt/test/tsan/lit.cfg.py (+8) 
- (modified) compiler-rt/test/tsan/lit.site.cfg.py.in (+3) 
- (modified) llvm/lib/Transforms/Instrumentation/ThreadSanitizer.cpp (+613-29) 
- (added) llvm/test/Instrumentation/ThreadSanitizer/dominance-elimination.ll (+540) 


``````````diff

diff --git a/compiler-rt/test/tsan/CMakeLists.txt b/compiler-rt/test/tsan/CMakeLists.txt
index 163355d68ebc2..e2f297a397625 100644
--- a/compiler-rt/test/tsan/CMakeLists.txt
+++ b/compiler-rt/test/tsan/CMakeLists.txt
@@ -18,6 +18,7 @@ endif()
 set(TSAN_DYNAMIC_TEST_DEPS ${TSAN_TEST_DEPS})
 set(TSAN_TESTSUITES)
 set(TSAN_DYNAMIC_TESTSUITES)
+set(TSAN_ENABLE_DOMINANCE_ANALYSIS "False") # Disable dominance analysis by default
 
 if (NOT DEFINED TSAN_TEST_DEFLAKE_THRESHOLD)
   set(TSAN_TEST_DEFLAKE_THRESHOLD "10")
@@ -28,45 +29,77 @@ if(APPLE)
   darwin_filter_host_archs(TSAN_SUPPORTED_ARCH TSAN_TEST_ARCH)
 endif()
 
-foreach(arch ${TSAN_TEST_ARCH})
-  set(TSAN_TEST_APPLE_PLATFORM "osx")
-  set(TSAN_TEST_MIN_DEPLOYMENT_TARGET_FLAG "${DARWIN_osx_MIN_VER_FLAG}")
+# Unified function for generating TSAN test suites by architectures.
+# Arguments:
+#   OUT_LIST_VAR    - name of output list (for example, TSAN_TESTSUITES or TSAN_DOM_TESTSUITES)
+#   SUFFIX_KIND     - string added to config suffix after "-${arch}" (for example, "" or "-dominance")
+#   CONFIG_KIND     - string added to config name after "Config" (for example, "" or "Dominance")
+#   ENABLE_DOM       - "True"/"False" enable dominance analysis
+function(tsan_generate_arch_suites OUT_LIST_VAR SUFFIX_KIND CONFIG_KIND ENABLE_DOM)
+  foreach(arch ${TSAN_TEST_ARCH})
+    set(TSAN_ENABLE_DOMINANCE_ANALYSIS "${ENABLE_DOM}")
 
-  set(TSAN_TEST_TARGET_ARCH ${arch})
-  string(TOLOWER "-${arch}" TSAN_TEST_CONFIG_SUFFIX)
-  get_test_cc_for_arch(${arch} TSAN_TEST_TARGET_CC TSAN_TEST_TARGET_CFLAGS)
+    set(TSAN_TEST_APPLE_PLATFORM "osx")
+    set(TSAN_TEST_MIN_DEPLOYMENT_TARGET_FLAG "${DARWIN_osx_MIN_VER_FLAG}")
 
-  string(REPLACE ";" " " LIBDISPATCH_CFLAGS_STRING " ${COMPILER_RT_TEST_LIBDISPATCH_CFLAGS}")
-  string(APPEND TSAN_TEST_TARGET_CFLAGS ${LIBDISPATCH_CFLAGS_STRING})
+    set(TSAN_TEST_TARGET_ARCH ${arch})
+    string(TOLOWER "-${arch}${SUFFIX_KIND}" TSAN_TEST_CONFIG_SUFFIX)
+    get_test_cc_for_arch(${arch} TSAN_TEST_TARGET_CC TSAN_TEST_TARGET_CFLAGS)
 
-  if (COMPILER_RT_HAS_MSSE4_2_FLAG)
-    string(APPEND TSAN_TEST_TARGET_CFLAGS " -msse4.2 ")
-  endif()
+    string(REPLACE ";" " " LIBDISPATCH_CFLAGS_STRING " ${COMPILER_RT_TEST_LIBDISPATCH_CFLAGS}")
+    string(APPEND TSAN_TEST_TARGET_CFLAGS ${LIBDISPATCH_CFLAGS_STRING})
 
-  string(TOUPPER ${arch} ARCH_UPPER_CASE)
-  set(CONFIG_NAME ${ARCH_UPPER_CASE}Config)
+    if (COMPILER_RT_HAS_MSSE4_2_FLAG)
+      string(APPEND TSAN_TEST_TARGET_CFLAGS " -msse4.2 ")
+    endif()
 
-  configure_lit_site_cfg(
-    ${CMAKE_CURRENT_SOURCE_DIR}/lit.site.cfg.py.in
-    ${CMAKE_CURRENT_BINARY_DIR}/${CONFIG_NAME}/lit.site.cfg.py
-    MAIN_CONFIG
-    ${CMAKE_CURRENT_SOURCE_DIR}/lit.cfg.py
-    )
-  list(APPEND TSAN_TESTSUITES ${CMAKE_CURRENT_BINARY_DIR}/${CONFIG_NAME})
+    string(TOUPPER ${arch} ARCH_UPPER_CASE)
+    set(CONFIG_NAME ${ARCH_UPPER_CASE}Config${CONFIG_KIND})
 
-  if(COMPILER_RT_TSAN_HAS_STATIC_RUNTIME)
-    string(TOLOWER "-${arch}-${OS_NAME}-dynamic" TSAN_TEST_CONFIG_SUFFIX)
-    set(CONFIG_NAME ${ARCH_UPPER_CASE}${OS_NAME}DynamicConfig)
     configure_lit_site_cfg(
-      ${CMAKE_CURRENT_SOURCE_DIR}/lit.site.cfg.py.in
-      ${CMAKE_CURRENT_BINARY_DIR}/${CONFIG_NAME}/lit.site.cfg.py
-      MAIN_CONFIG
-      ${CMAKE_CURRENT_SOURCE_DIR}/lit.cfg.py
+            ${CMAKE_CURRENT_SOURCE_DIR}/lit.site.cfg.py.in
+            ${CMAKE_CURRENT_BINARY_DIR}/${CONFIG_NAME}/lit.site.cfg.py
+            MAIN_CONFIG
+            ${CMAKE_CURRENT_SOURCE_DIR}/lit.cfg.py
+    )
+    list(APPEND ${OUT_LIST_VAR} ${CMAKE_CURRENT_BINARY_DIR}/${CONFIG_NAME})
+
+    if(COMPILER_RT_TSAN_HAS_STATIC_RUNTIME)
+      # Dynamic runtime for corresponding variant
+      if("${SUFFIX_KIND}" STREQUAL "")
+        string(TOLOWER "-${arch}-${OS_NAME}-dynamic" TSAN_TEST_CONFIG_SUFFIX)
+        set(CONFIG_NAME ${ARCH_UPPER_CASE}${OS_NAME}DynamicConfig${CONFIG_KIND})
+        list(APPEND TSAN_DYNAMIC_TESTSUITES ${CMAKE_CURRENT_BINARY_DIR}/${CONFIG_NAME})
+      else()
+        string(TOLOWER "-${arch}-${OS_NAME}-dynamic${SUFFIX_KIND}" TSAN_TEST_CONFIG_SUFFIX)
+        set(CONFIG_NAME ${ARCH_UPPER_CASE}${OS_NAME}DynamicConfig${CONFIG_KIND})
+        # Track dynamic dominance-analysis suites separately for a dedicated target.
+        list(APPEND TSAN_DOM_DYNAMIC_TESTSUITES ${CMAKE_CURRENT_BINARY_DIR}/${CONFIG_NAME})
+      endif()
+      configure_lit_site_cfg(
+              ${CMAKE_CURRENT_SOURCE_DIR}/lit.site.cfg.py.in
+              ${CMAKE_CURRENT_BINARY_DIR}/${CONFIG_NAME}/lit.site.cfg.py
+              MAIN_CONFIG
+              ${CMAKE_CURRENT_SOURCE_DIR}/lit.cfg.py
       )
-    list(APPEND TSAN_DYNAMIC_TESTSUITES
-      ${CMAKE_CURRENT_BINARY_DIR}/${CONFIG_NAME})
+      list(APPEND ${OUT_LIST_VAR} ${CMAKE_CURRENT_BINARY_DIR}/${CONFIG_NAME})
+    endif()
+  endforeach()
+
+  # Propagate the assembled list to the parent scope
+  set(${OUT_LIST_VAR} "${${OUT_LIST_VAR}}" PARENT_SCOPE)
+  if(DEFINED TSAN_DOM_DYNAMIC_TESTSUITES)
+    set(TSAN_DOM_DYNAMIC_TESTSUITES "${TSAN_DOM_DYNAMIC_TESTSUITES}" PARENT_SCOPE)
   endif()
-endforeach()
+endfunction()
+
+# Default configuration
+set(TSAN_TESTSUITES)
+tsan_generate_arch_suites(TSAN_TESTSUITES "" "" "False")
+
+# Enable dominance analysis (check-tsan-dominance-analysis target)
+set(TSAN_DOM_TESTSUITES)
+tsan_generate_arch_suites(TSAN_DOM_TESTSUITES "-dominance" "Dominance" "True")
 
 # iOS and iOS simulator test suites
 # These are not added into "check-all", in order to run these tests, use
@@ -124,6 +157,10 @@ list(APPEND TSAN_TESTSUITES ${CMAKE_CURRENT_BINARY_DIR}/Unit)
 if(COMPILER_RT_TSAN_HAS_STATIC_RUNTIME)
   list(APPEND TSAN_DYNAMIC_TESTSUITES ${CMAKE_CURRENT_BINARY_DIR}/Unit/dynamic)
 endif()
+list(APPEND TSAN_DOM_TESTSUITES ${CMAKE_CURRENT_BINARY_DIR}/Unit)
+if(COMPILER_RT_TSAN_HAS_STATIC_RUNTIME)
+  list(APPEND TSAN_DOM_DYNAMIC_TESTSUITES ${CMAKE_CURRENT_BINARY_DIR}/Unit/dynamic)
+endif()
 
 add_lit_testsuite(check-tsan "Running ThreadSanitizer tests"
   ${TSAN_TESTSUITES}
@@ -136,3 +173,17 @@ if(COMPILER_RT_TSAN_HAS_STATIC_RUNTIME)
                     EXCLUDE_FROM_CHECK_ALL
                     DEPENDS ${TSAN_DYNAMIC_TEST_DEPS})
 endif()
+
+add_lit_testsuite(check-tsan-dominance-analysis "Running ThreadSanitizer tests (dominance analysis)"
+        ${TSAN_DOM_TESTSUITES}
+        DEPENDS ${TSAN_TEST_DEPS})
+set_target_properties(check-tsan-dominance-analysis PROPERTIES FOLDER "Compiler-RT Tests")
+
+# New target: dynamic + dominance analysis
+if(COMPILER_RT_TSAN_HAS_STATIC_RUNTIME)
+  add_lit_testsuite(check-tsan-dominance-analysis-dynamic "Running ThreadSanitizer tests (dynamic, dominance analysis)"
+          ${TSAN_DOM_DYNAMIC_TESTSUITES}
+          EXCLUDE_FROM_CHECK_ALL
+          DEPENDS ${TSAN_DYNAMIC_TEST_DEPS})
+  set_target_properties(check-tsan-dominance-analysis-dynamic PROPERTIES FOLDER "Compiler-RT Tests")
+endif()
diff --git a/compiler-rt/test/tsan/lit.cfg.py b/compiler-rt/test/tsan/lit.cfg.py
index 8803a7bda9aa5..1dfc8c8557cbb 100644
--- a/compiler-rt/test/tsan/lit.cfg.py
+++ b/compiler-rt/test/tsan/lit.cfg.py
@@ -56,6 +56,14 @@ def get_required_attr(config, attr_name):
     + extra_cflags
     + ["-I%s" % tsan_incdir]
 )
+
+# Setup dominance-based elimination if enabled
+tsan_enable_dominance = getattr(config, "tsan_enable_dominance_analysis", "False") == "True"
+if tsan_enable_dominance:
+    config.name += " (dominance-analysis)"
+    dom_flags = [ "-mllvm", "-tsan-use-dominance-analysis" ]
+    clang_tsan_cflags += dom_flags
+
 clang_tsan_cxxflags = (
     config.cxx_mode_flags + clang_tsan_cflags + ["-std=c++11"] + ["-I%s" % tsan_incdir]
 )
diff --git a/compiler-rt/test/tsan/lit.site.cfg.py.in b/compiler-rt/test/tsan/lit.site.cfg.py.in
index c6d453aaee26f..d1d265e0ec53e 100644
--- a/compiler-rt/test/tsan/lit.site.cfg.py.in
+++ b/compiler-rt/test/tsan/lit.site.cfg.py.in
@@ -9,6 +9,9 @@ config.target_cflags = "@TSAN_TEST_TARGET_CFLAGS@"
 config.target_arch = "@TSAN_TEST_TARGET_ARCH@"
 config.deflake_threshold = "@TSAN_TEST_DEFLAKE_THRESHOLD@"
 
+# Enable dominance analysis.
+config.tsan_enable_dominance_analysis = "@TSAN_ENABLE_DOMINANCE_ANALYSIS@"
+
 # Load common config for all compiler-rt lit tests.
 lit_config.load_config(config, "@COMPILER_RT_BINARY_DIR@/test/lit.common.configured")
 
diff --git a/llvm/lib/Transforms/Instrumentation/ThreadSanitizer.cpp b/llvm/lib/Transforms/Instrumentation/ThreadSanitizer.cpp
index fd0e9f18b61c9..5061c65a1ac0f 100644
--- a/llvm/lib/Transforms/Instrumentation/ThreadSanitizer.cpp
+++ b/llvm/lib/Transforms/Instrumentation/ThreadSanitizer.cpp
@@ -24,10 +24,14 @@
 #include "llvm/ADT/SmallVector.h"
 #include "llvm/ADT/Statistic.h"
 #include "llvm/ADT/StringExtras.h"
+#include "llvm/Analysis/AliasAnalysis.h"
 #include "llvm/Analysis/CaptureTracking.h"
+#include "llvm/Analysis/LoopInfo.h"
+#include "llvm/Analysis/PostDominators.h"
 #include "llvm/Analysis/TargetLibraryInfo.h"
 #include "llvm/Analysis/ValueTracking.h"
 #include "llvm/IR/DataLayout.h"
+#include "llvm/IR/Dominators.h"
 #include "llvm/IR/Function.h"
 #include "llvm/IR/IRBuilder.h"
 #include "llvm/IR/Instructions.h"
@@ -84,6 +88,15 @@ static cl::opt<bool>
     ClOmitNonCaptured("tsan-omit-by-pointer-capturing", cl::init(true),
                       cl::desc("Omit accesses due to pointer capturing"),
                       cl::Hidden);
+static cl::opt<bool>
+    ClUseDominanceAnalysis("tsan-use-dominance-analysis", cl::init(false),
+                           cl::desc("Eliminate duplicating instructions which "
+                                    "(post)dominate given instruction"),
+                           cl::Hidden);
+static cl::opt<bool> ClPostDomAggressive(
+    "tsan-postdom-aggressive", cl::init(false),
+    cl::desc("Allow post-dominance elimination across loops (unsafe)"),
+    cl::Hidden);
 
 STATISTIC(NumInstrumentedReads, "Number of instrumented reads");
 STATISTIC(NumInstrumentedWrites, "Number of instrumented writes");
@@ -96,11 +109,172 @@ STATISTIC(NumOmittedReadsFromConstantGlobals,
           "Number of reads from constant globals");
 STATISTIC(NumOmittedReadsFromVtable, "Number of vtable reads");
 STATISTIC(NumOmittedNonCaptured, "Number of accesses ignored due to capturing");
+STATISTIC(NumOmittedByDominance,
+          "Number of accesses ignored due to dominance");
+STATISTIC(NumOmittedByPostDominance,
+          "Number of accesses ignored due to post-dominance");
 
 const char kTsanModuleCtorName[] = "tsan.module_ctor";
 const char kTsanInitName[] = "__tsan_init";
 
 namespace {
+// Internal Instruction wrapper that contains more information about the
+// Instruction from prior analysis.
+struct InstructionInfo {
+  // Instrumentation emitted for this instruction is for a compounded set of
+  // read and write operations in the same basic block.
+  static constexpr unsigned kCompoundRW = (1U << 0);
+
+  explicit InstructionInfo(Instruction *Inst) : Inst(Inst) {}
+
+  bool isWriteOperation() const {
+    return isa<StoreInst>(Inst) || (Flags & kCompoundRW);
+  }
+
+  Instruction *Inst;
+  unsigned Flags = 0;
+};
+
+/// A helper class to encapsulate the logic for eliminating redundant
+/// instrumentation based on dominance analysis.
+///
+/// This class takes a list of all access instructions that are candidates
+/// for instrumentation. It prunes instructions that are (post-)dominated by
+/// another access to the same memory location, provided that the path between
+/// them is "clear" of any dangerous instructions (like function calls or
+/// synchronization primitives).
+class DominanceBasedElimination {
+public:
+  /// \param AllInstr The vector of instructions to analyze. This vector is
+  ///                 modified in-place.
+  /// \param DT The Dominator Tree for the current function.
+  /// \param PDT The Post-Dominator Tree for the current function.
+  /// \param AA The results of Alias Analysis.
+  DominanceBasedElimination(SmallVectorImpl<InstructionInfo> &AllInstr,
+                            DominatorTree &DT, PostDominatorTree &PDT,
+                            AAResults &AA, LoopInfo &LI)
+      : AllInstr(AllInstr), DT(DT), PDT(PDT), AA(AA), LI(LI) {
+    // Build per-function basic-block safety cache once
+    if (!AllInstr.empty() && AllInstr.front().Inst) {
+      Function *F = AllInstr.front().Inst->getFunction();
+      BSC.ReachableToEnd.reserve(F->size());
+      BSC.ConeSafeCache.reserve(F->size());
+      buildBlockSafetyCache(*F);
+    }
+  }
+
+  /// Runs the analysis and prunes redundant instructions.
+  /// It sequentially applies elimination based on dominance and post-dominance.
+  void run() {
+    eliminate</*IsPostDom=*/false>(); // Dominance-based elimination
+    eliminate</*IsPostDom=*/true>();  // Post-dominance-based elimination
+  }
+
+private:
+  /// Per-function precomputation cache: instruction indices within BB and
+  /// positions of "dangerous" instructions.
+  struct BlockSafetyCache {
+    DenseMap<const Instruction *, unsigned> IndexInBB;
+
+    DenseMap<const BasicBlock *, SmallVector<unsigned, 4>> DangerIdxInBB;
+    DenseMap<const BasicBlock *, bool> HasDangerInBB;
+
+    DenseMap<const BasicBlock *, SmallVector<unsigned, 4>> DangerIdxInBBPostDom;
+    DenseMap<const BasicBlock *, bool> HasDangerInBBPostDom;
+
+    // Reachability cache: a set of blocks that can reach EndBB.
+    DenseMap<const BasicBlock *, SmallPtrSet<const BasicBlock *, 32>>
+        ReachableToEnd;
+    // Cone safety cache: StartBB -> (EndBB -> pathIsSafe): to avoid custom hash
+    DenseMap<const BasicBlock *,
+             DenseMap<const BasicBlock *, std::pair<bool, bool>>>
+        ConeSafeCache;
+  } BSC;
+
+  // Reusable worklists/visited sets to amortize allocations.
+  SmallVector<const BasicBlock *, 32> Worklist;
+  SmallPtrSet<const BasicBlock *, 32> CanReachSet;
+
+  void buildBlockSafetyCache(Function &F);
+
+  /// Check that suffix (after FromIdx) in BB contains no unsafe instruction.
+  bool suffixSafe(const BasicBlock *BB, unsigned FromIdx,
+                  const DenseMap<const BasicBlock *, SmallVector<unsigned, 4>>
+                      &DangerIdxInBB) const;
+
+  /// Check that prefix (before ToIdx) in BB contains no unsafe instruction.
+  bool prefixSafe(const BasicBlock *BB, unsigned ToIdx,
+                  const DenseMap<const BasicBlock *, SmallVector<unsigned, 4>>
+                      &DangerIdxInBB) const;
+
+  /// Check that (FromIdx, ToExclusiveIdx) interval inside a single BB is safe.
+  bool intervalSafeSameBB(
+      const BasicBlock *BB, unsigned FromIdx, unsigned ToExclusiveIdx,
+      const DenseMap<const BasicBlock *, SmallVector<unsigned, 4>>
+          &DangerIdxInBB) const;
+
+  /// Checks if an instruction is "safe" from TSan's perspective.
+  /// Dangerous instructions include function calls, atomics, and fences.
+  ///
+  /// \param Inst The instruction to check.
+  /// \return true if the instruction is safe (not dangerous).
+  static bool isInstrSafe(const Instruction *Inst);
+
+  /// For post-dominance, need to check whether the path contains loops,
+  /// irregular exits or unsafe calls.
+  static bool isInstrSafeForPostDom(const Instruction *I);
+
+  /// Find BBs which can reach EndBB
+  SmallPtrSet<const BasicBlock *, 32> buildCanReachEnd(const BasicBlock *EndBB);
+
+  /// Forward traversal from StartBB, restricted to the cone that reaches EndBB.
+  /// In post-dom mode additionally rejects paths that go through any loop BB.
+  std::pair<bool, bool> traverseReachableAndCheckSafety(
+      const BasicBlock *StartBB, const BasicBlock *EndBB,
+      const SmallPtrSetImpl<const BasicBlock *> &CanReachEnd);
+
+  /// Checks if the path between two instructions is "clear", i.e., it does not
+  /// contain any dangerous instructions that could alter the thread
+  /// synchronization state.
+  /// \param StartInst The starting instruction (dominates for Dom, is dominated
+  /// for PostDom).
+  /// \param EndInst The ending instruction (is dominated for Dom,
+  /// post-dominates for PostDom).
+  /// \param DTBase DominatorTree (for Dom) or PostDominatorTree (for PostDom).
+  /// \return true if the path is clear.
+  template <bool IsPostDom>
+  bool isPathClear(Instruction *StartInst, Instruction *EndInst,
+                   const DominatorTreeBase<BasicBlock, IsPostDom> *DTBase);
+
+  /// A helper function to create a map from Instruction* to its index
+  /// in the AllInstr vector for fast lookups.
+  DenseMap<Instruction *, size_t> createInstrToIndexMap() const;
+
+  /// Attempts to find a dominating instruction that can eliminate the need to
+  /// instrument instruction i
+  /// \param DTBase The dominator (post-dominator) tree being used
+  /// \param InstrToIndexMap Maps instructions to their indices in the AllInstr
+  /// \param ToRemove Vector tracking which instructions can be eliminated
+  /// \returns true if a dominating instruction was found that eliminates i
+  template <bool IsPostDom>
+  bool findAndMarkDominatingInstr(
+      size_t i, const DominatorTreeBase<BasicBlock, IsPostDom> *DTBase,
+      const DenseMap<Instruction *, size_t> &InstrToIndexMap,
+      SmallVectorImpl<bool> &ToRemove);
+
+  /// The core elimination logic. Templated to work with both Dominators
+  /// and Post-Dominators.
+  template <bool IsPostDom> void eliminate();
+
+  /// A reference to the vector of instructions that we modify.
+  SmallVectorImpl<InstructionInfo> &AllInstr;
+
+  /// References to the required analysis results.
+  DominatorTree &DT;
+  PostDominatorTree &PDT;
+  AAResults &AA;
+  LoopInfo &LI;
+};
 
 /// ThreadSanitizer: instrument the code in module to find races.
 ///
@@ -109,7 +283,9 @@ namespace {
 /// ensures the __tsan_init function is in the list of global constructors for
 /// the module.
 struct ThreadSanitizer {
-  ThreadSanitizer() {
+  ThreadSanitizer(const TargetLibraryInfo &TLI, DominatorTree *DT,
+                  PostDominatorTree *PDT, AAResults *AA, LoopInfo *LI)
+      : TLI(TLI), DT(DT), PDT(PDT), AA(AA), LI(LI) {
     // Check options and warn user.
     if (ClInstrumentReadBeforeWrite && ClCompoundReadBeforeWrite) {
       errs()
@@ -118,21 +294,9 @@ struct ThreadSanitizer {
     }
   }
 
-  bool sanitizeFunction(Function &F, const TargetLibraryInfo &TLI);
+  bool sanitizeFunction(Function &F);
 
 private:
-  // Internal Instruction wrapper that contains more information about the
-  // Instruction from prior analysis.
-  struct InstructionInfo {
-    // Instrumentation emitted for this instruction is for a compounded set of
-    // read and write operations in the same basic block.
-    static constexpr unsigned kCompoundRW = (1U << 0);
-
-    explicit InstructionInfo(Instruction *Inst) : Inst(Inst) {}
-
-    Instruction *Inst;
-    unsigned Flags = 0;
-  };
 
   void initialize(Module &M, const TargetLibraryInfo &TLI);
   bool instrumentLoadOrStore(const InstructionInfo &II, const DataLayout &DL);
@@ -145,6 +309,12 @@ struct ThreadSanitizer {
   int getMemoryAccessFuncIndex(Type *OrigTy, Value *Addr, const DataLayout &DL);
   void InsertRuntimeIgnores(Function &F);
 
+  const TargetLibraryInfo &TLI;
+  DominatorTree *DT = nullptr;
+  PostDominatorTree *PDT = nullptr;
+  AAResults *AA = nullptr;
+  LoopInfo *LI = nullptr;
+
   Type *IntptrTy;
   FunctionCallee TsanFuncEntry;
   FunctionCallee TsanFuncExit;
@@ -174,6 +344,413 @@ struct ThreadSanitizer {
   FunctionCallee MemmoveFn, MemcpyFn, MemsetFn;
 };
 
+//-----------------------------------------------------------------------------
+// DominanceBasedElimination Implementation
+//-----------------------------------------------------------------------------
+
+void DominanceBasedElimination::buildBlockSafetyCache(Function &F) {
+  // Reserve to reduce rehashing for a typical case.
+  BSC.DangerIdxInBB.reserve(F.size());
+  BSC.HasDangerInBB.reserve(F.size());
+  BSC.DangerIdxInBBPostDom.reserve(F.size());
+  BSC.HasDangerInBBPostDom.reserve(F.size());
+
+  for (BasicBlock &BB : F) {
+    SmallVector<unsigned, 4> Danger;
+    SmallVector<unsigned, 4> DangerForPostDom;
+    unsigned Idx = 0;
+    for (Instruction &I : BB) {
+      if (!isInstrSafe(&I))
+        Danger.push_back(Idx);
+      if (!isInstrSafeForPostDom(&I))
+        DangerForPostDom.push_back(Idx);
+      BSC.IndexInBB[&I] = Idx++;
+    }
+    BSC.HasDangerInBB[&BB] = !Danger.empty();
+    // Already in order by linear scan.
+    BSC.DangerIdxInBB[&BB] = std::move(Danger);
+
+    BSC.HasDangerInBBPostDom[&BB...
[truncated]

``````````
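
The bodies of `suffixSafe`, `prefixSafe`, and `intervalSafeSameBB` fall in the truncated part of the diff, so the following is only a guess at how such checks can work on a sorted list of unsafe instruction indices like the one stored per block in `DangerIdxInBB`. The names `DangerList`, `suffixIsSafe`, `prefixIsSafe`, and `intervalIsSafe` are hypothetical, and treating both interval endpoints as excluded is an assumption based on the doc comments.

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

// Hypothetical stand-in for one block's entry in DangerIdxInBB: the sorted
// in-block indices of unsafe instructions.
using DangerList = std::vector<unsigned>;

// True if no unsafe instruction has an index strictly greater than `from`.
bool suffixIsSafe(const DangerList &danger, unsigned from) {
  assert(std::is_sorted(danger.begin(), danger.end()));
  return std::upper_bound(danger.begin(), danger.end(), from) == danger.end();
}

// True if no unsafe instruction has an index strictly less than `to`.
bool prefixIsSafe(const DangerList &danger, unsigned to) {
  assert(std::is_sorted(danger.begin(), danger.end()));
  return danger.empty() || danger.front() >= to;
}

// True if no unsafe instruction lies strictly between `from` and `to`.
bool intervalIsSafe(const DangerList &danger, unsigned from, unsigned to) {
  assert(std::is_sorted(danger.begin(), danger.end()));
  // First unsafe index after `from`; the interval is safe if there is none
  // or if it is at or beyond `to`.
  auto first = std::upper_bound(danger.begin(), danger.end(), from);
  return first == danger.end() || *first >= to;
}
```

Because the indices are collected by a linear scan of each block, they are already sorted, which is what makes the binary search valid without an extra sort.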

</details>


https://github.com/llvm/llvm-project/pull/169897