[clang] [compiler-rt] [llvm] [tsan] Introduce Adaptive Delay Scheduling to TSAN (PR #178836)

Chris Cotter via cfe-commits cfe-commits at lists.llvm.org
Sun Feb 15 20:18:31 PST 2026


https://github.com/ccotter updated https://github.com/llvm/llvm-project/pull/178836

>From b2c2c8b2545637ba807fdc910486b8fe98801dc5 Mon Sep 17 00:00:00 2001
From: Chris Cotter <ccotter14 at bloomberg.net>
Date: Mon, 16 Feb 2026 03:57:55 +0000
Subject: [PATCH] [tsan] Introduce Adaptive Delay to TSAN

This commit introduces an "adaptive delay" feature to the
ThreadSanitizer runtime to improve race detection by perturbing thread
schedules. At various synchronization points (atomic operations,
mutexes, and thread lifecycle events), the runtime may inject small
delays (spin loops, yields, or sleeps) to explore different thread
interleavings and expose data races that would otherwise occur only in
rare execution orders.

This change is inspired by prior work, which is discussed in more detail
on
https://discourse.llvm.org/t/rfc-tsan-implementing-a-fuzz-scheduler-for-tsan/80969.
In short, https://reviews.llvm.org/D65383 was an earlier, unmerged
attempt at adding random delays. Feedback on the RFC led to the
version in this commit, which aims to bound the total injected delay.

The adaptive delay feature uses a configurable time budget and tiered
sampling strategy to balance race exposure against performance impact.
It prioritizes high-value synchronization points with clear
happens-before relationships: relaxed atomics receive lightweight spin
delays with low sampling, synchronizing atomics (acquire / release /
seq_cst) receive moderate delays with higher sampling, and mutex and
thread lifecycle operations receive the longest delays with highest
sampling.

The feature is disabled by default and incurs minimal overhead when not
enabled. Nearly all checks are guarded by an inline check on a global
variable that is only set when enable_adaptive_delay=1. Microbenchmarks
with tight loops of atomic operations showed no meaningful performance
difference between an unmodified TSAN runtime and this version when
running with empty TSAN_OPTIONS.
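
When the feature is disabled, the fast path reduces to a single load of a
global flag, essentially like this (simplified from tsan_adaptive_delay.h
in this patch):

  struct AdaptiveDelay {
    ALWAYS_INLINE static void SyncOp() {
      if (!is_adaptive_delay_enabled)  // one global load, then return
        return;
      SyncOpImpl();  // out-of-line slow path
    }
    static bool is_adaptive_delay_enabled;
  };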

An LLM assisted in writing portions of the adaptive delay logic,
including the TimeBudget class, tiering concept, address sampler, and
per-thread quota system. I reviewed the output and made amendments to
reduce duplication and simplify the behavior. I also replaced the LLM's
original double-based calculation logic with the integer-based Percent
class. The LLM also helped write unit test cases for Percent.
---
 clang/docs/ThreadSanitizer.rst                | 120 +++++
 .../sanitizer_common/sanitizer_allocator.h    |   6 -
 .../lib/sanitizer_common/sanitizer_common.h   |   6 +
 compiler-rt/lib/tsan/rtl/CMakeLists.txt       |   1 +
 .../lib/tsan/rtl/tsan_adaptive_delay.cpp      | 431 ++++++++++++++++++
 .../lib/tsan/rtl/tsan_adaptive_delay.h        | 170 +++++++
 compiler-rt/lib/tsan/rtl/tsan_flags.inc       |  27 ++
 .../lib/tsan/rtl/tsan_interceptors_posix.cpp  |  22 +
 .../lib/tsan/rtl/tsan_interface_atomic.cpp    |  12 +
 compiler-rt/lib/tsan/rtl/tsan_rtl.cpp         |   5 +
 compiler-rt/lib/tsan/rtl/tsan_rtl.h           |   3 +
 .../lib/tsan/tests/unit/CMakeLists.txt        |   1 +
 .../lib/tsan/tests/unit/tsan_percent_test.cpp | 152 ++++++
 llvm/docs/ReleaseNotes.md                     |   2 +
 14 files changed, 952 insertions(+), 6 deletions(-)
 create mode 100644 compiler-rt/lib/tsan/rtl/tsan_adaptive_delay.cpp
 create mode 100644 compiler-rt/lib/tsan/rtl/tsan_adaptive_delay.h
 create mode 100644 compiler-rt/lib/tsan/tests/unit/tsan_percent_test.cpp

diff --git a/clang/docs/ThreadSanitizer.rst b/clang/docs/ThreadSanitizer.rst
index 86dc2600626b9..ecbfbb6f170fa 100644
--- a/clang/docs/ThreadSanitizer.rst
+++ b/clang/docs/ThreadSanitizer.rst
@@ -207,6 +207,126 @@ and can be run with ``make check-tsan`` command.
 We are actively working on enhancing the tool --- stay tuned.  Any help,
 especially in the form of minimized standalone tests is more than welcome.
 
+Adaptive Delay
+--------------
+
+Overview
+~~~~~~~~
+
+Adaptive Delay is an optional ThreadSanitizer feature that injects delays at
+synchronization points to explore novel thread interleavings and increase the
+likelihood of exposing data races. By perturbing thread scheduling, adaptive
+delay creates more opportunities for concurrent accesses to shared data,
+improving race detection.
+
+Adaptive delay is particularly useful for:
+
+* Detecting races in rarely-executed thread interleavings or code paths
+* Testing parallel data structures and algorithms
+
+When enabled, adaptive delay maintains a configurable time budget to balance
+race exposure against performance overhead. An injected delay can be:
+
+* a random number of spin-loop iterations
+* a single yield to the OS scheduler
+* a ``usleep`` of random duration
+
+The strategy prioritizes high-value synchronization points:
+
+* Relaxed atomic operations receive cheap delays (spin cycles) with low sampling
+* Synchronizing atomic operations (acquire/release/seq_cst) receive moderate
+  delays with higher sampling
+* Mutex and thread lifecycle operations receive the longest delays with highest
+  sampling
+
+The delays focus on synchronization points with clear happens-before relationships,
+as those are most likely to expose data races.
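+
+As an illustration, here is a sketch of a timing-dependent race (the variable
+names are illustrative only):
+
+.. code-block:: c++
+
+  #include <atomic>
+
+  int data;                        // unprotected shared variable
+  std::atomic<bool> ready{false};
+
+  // Thread 1
+  data = 42;
+  ready.store(true, std::memory_order_relaxed);  // no happens-before
+
+  // Thread 2
+  if (ready.load(std::memory_order_relaxed)) {
+    int x = data;  // races with thread 1's write
+    (void)x;
+  }
+
+If thread 2 rarely observes ``ready == true``, the racy read of ``data`` may
+never execute during testing, and ThreadSanitizer cannot report it. A delay
+injected around the relaxed store or load makes that interleaving more likely,
+giving ThreadSanitizer a chance to observe both accesses and report the race.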
+
+Enabling Adaptive Delay
+~~~~~~~~~~~~~~~~~~~~~~~
+
+Adaptive delay is disabled by default. Enable it by setting the
+``enable_adaptive_delay`` flag:
+
+.. code-block:: console
+
+  $ TSAN_OPTIONS=enable_adaptive_delay=1 ./myapp
+
+Configuration Options
+~~~~~~~~~~~~~~~~~~~~~
+
+.. list-table:: Adaptive Delay Options
+   :name: adaptive-delay-options-table
+   :header-rows: 1
+   :widths: 35 10 15 40
+
+   * - Flag
+     - Type
+     - Default
+     - Description
+   * - ``enable_adaptive_delay``
+     - bool
+     - false
+     - Enable adaptive delay injection to expose data races.
+   * - ``adaptive_delay_aggressiveness``
+     - int
+     - 25
+     - Controls delay injection intensity for race detection. Higher values inject
+       more delays to expose races. Value must be greater than 0. Suggested values:
+       10 (minimal), 50 (moderate), 200 (aggressive). This is a tuning parameter;
+       actual overhead varies by workload and platform.
+   * - ``adaptive_delay_relaxed_sample_rate``
+     - int
+     - 10000
+     - Sample 1 in N relaxed atomic operations for delay injection. Relaxed atomics
+       have minimal synchronization, so sampling helps avoid excessive overhead.
+   * - ``adaptive_delay_sync_atomic_sample_rate``
+     - int
+     - 100
+     - Sample 1 in N acquire/release/seq_cst atomic operations for delay injection.
+       These synchronizing atomics are more likely to expose races, so are sampled
+       more often.
+   * - ``adaptive_delay_mutex_sample_rate``
+     - int
+     - 10
+     - Sample 1 in N mutex/condition variable operations for delay injection. Mutex
+       ops are high-value synchronization points and are sampled frequently.
+   * - ``adaptive_delay_max_atomic``
+     - string
+     - ``"sleep_us=50"``
+     - Maximum delay for atomic operations. Format: ``"spin=N"`` (N spin cycles,
+       1 <= N <= 10,000), ``"yield"`` (one yield to the OS), or ``"sleep_us=N"``
+       (up to N microseconds). The delay is randomly chosen up to the specified
+       maximum N.
+   * - ``adaptive_delay_max_sync``
+     - string
+     - ``"sleep_us=500"``
+     - Maximum delay for synchronization operations (mutex and thread lifecycle
+       operations). Format: same as ``adaptive_delay_max_atomic``. Typically set
+       longer than atomic delays since these operations involve waking blocked threads
+       and may be more likely to expose races.
+
+Examples
+~~~~~~~~
+
+Enable adaptive delay with moderate aggressiveness:
+
+.. code-block:: console
+
+  $ TSAN_OPTIONS=enable_adaptive_delay=1:adaptive_delay_aggressiveness=50 ./myapp
+
+Enable aggressive delay injection:
+
+.. code-block:: console
+
+  $ TSAN_OPTIONS=enable_adaptive_delay=1:adaptive_delay_aggressiveness=200 ./myapp
+
+Increase sampling frequency for mutex operations:
+
+.. code-block:: console
+
+  $ TSAN_OPTIONS=enable_adaptive_delay=1:adaptive_delay_mutex_sample_rate=5 ./myapp
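+
+Use a cheaper delay for synchronization operations (a single yield to the OS
+scheduler instead of a random sleep):
+
+.. code-block:: console
+
+  $ TSAN_OPTIONS=enable_adaptive_delay=1:adaptive_delay_max_sync=yield ./myapp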
+
 More Information
 ----------------
 `<https://github.com/google/sanitizers/wiki/ThreadSanitizerCppManual>`_
diff --git a/compiler-rt/lib/sanitizer_common/sanitizer_allocator.h b/compiler-rt/lib/sanitizer_common/sanitizer_allocator.h
index 0b28f86d14084..6154f7810334b 100644
--- a/compiler-rt/lib/sanitizer_common/sanitizer_allocator.h
+++ b/compiler-rt/lib/sanitizer_common/sanitizer_allocator.h
@@ -47,12 +47,6 @@ void PrintHintAllocatorCannotReturnNull();
 // Callback type for iterating over chunks.
 typedef void (*ForEachChunkCallback)(uptr chunk, void *arg);
 
-inline u32 Rand(u32 *state) {  // ANSI C linear congruential PRNG.
-  return (*state = *state * 1103515245 + 12345) >> 16;
-}
-
-inline u32 RandN(u32 *state, u32 n) { return Rand(state) % n; }  // [0, n)
-
 template<typename T>
 inline void RandomShuffle(T *a, u32 n, u32 *rand_state) {
   if (n <= 1) return;
diff --git a/compiler-rt/lib/sanitizer_common/sanitizer_common.h b/compiler-rt/lib/sanitizer_common/sanitizer_common.h
index 515a7c9cdf60f..564ae301475a9 100644
--- a/compiler-rt/lib/sanitizer_common/sanitizer_common.h
+++ b/compiler-rt/lib/sanitizer_common/sanitizer_common.h
@@ -1100,6 +1100,12 @@ inline u32 GetNumberOfCPUsCached() {
   return NumberOfCPUsCached;
 }
 
+inline u32 Rand(u32* state) {  // ANSI C linear congruential PRNG.
+  return (*state = *state * 1103515245 + 12345) >> 16;
+}
+
+inline u32 RandN(u32* state, u32 n) { return Rand(state) % n; }  // [0, n)
+
 }  // namespace __sanitizer
 
 inline void *operator new(__sanitizer::usize size,
diff --git a/compiler-rt/lib/tsan/rtl/CMakeLists.txt b/compiler-rt/lib/tsan/rtl/CMakeLists.txt
index d7d84706bfd58..6f093500c8f61 100644
--- a/compiler-rt/lib/tsan/rtl/CMakeLists.txt
+++ b/compiler-rt/lib/tsan/rtl/CMakeLists.txt
@@ -22,6 +22,7 @@ append_list_if(COMPILER_RT_HAS_LIBM m TSAN_DYNAMIC_LINK_LIBS)
 append_list_if(COMPILER_RT_HAS_LIBPTHREAD pthread TSAN_DYNAMIC_LINK_LIBS)
 
 set(TSAN_SOURCES
+  tsan_adaptive_delay.cpp
   tsan_debugging.cpp
   tsan_external.cpp
   tsan_fd.cpp
diff --git a/compiler-rt/lib/tsan/rtl/tsan_adaptive_delay.cpp b/compiler-rt/lib/tsan/rtl/tsan_adaptive_delay.cpp
new file mode 100644
index 0000000000000..f20d7a32db758
--- /dev/null
+++ b/compiler-rt/lib/tsan/rtl/tsan_adaptive_delay.cpp
@@ -0,0 +1,431 @@
+//===-- tsan_adaptive_delay.cpp ---------------------------------*- C++ -*-===//
+//
+// Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions.
+// See https://llvm.org/LICENSE.txt for license information.
+// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception
+//
+//===----------------------------------------------------------------------===//
+//
+// This file is a part of ThreadSanitizer (TSan), a race detector.
+//
+//===----------------------------------------------------------------------===//
+
+#include "tsan_adaptive_delay.h"
+
+#include "interception/interception.h"
+#include "sanitizer_common/sanitizer_allocator_internal.h"
+#include "sanitizer_common/sanitizer_common.h"
+#include "sanitizer_common/sanitizer_errno_codes.h"
+#include "tsan_interface.h"
+#include "tsan_rtl.h"
+
+namespace __tsan {
+
+namespace {
+
+// =============================================================================
+// DelaySpec: Represents a delay configuration parsed from flag strings
+// =============================================================================
+//
+// Delay can be specified as:
+//   - "spin=N"     : Spin for up to N cycles (very short delays)
+//   - "yield"      : Call sched_yield() once
+//   - "sleep_us=N" : Sleep for up to N microseconds
+
+enum class DelayType { Spin, Yield, SleepUs };
+
+struct DelaySpec {
+  DelayType type;
+  int value;  // spin cycles or sleep_us value; ignored for yield
+
+  // Both estimates below are used internally as a very rough estimate for
+  // delay overhead calculation, to cap the overall delay to the
+  // adaptive_delay_aggressiveness option. They're not intended to be 100%
+  // accurate on any or all architectures/operating systems, or for use in any
+  // other contexts.
+  //
+  // Estimated nanoseconds per spin cycle (volatile loop iteration).
+  static constexpr u64 kNsPerSpinCycle = 1;
+  // Estimated nanoseconds for a yield (context switch overhead)
+  static constexpr u64 kNsPerYield = 500;
+
+  static DelaySpec Parse(const char* str) {
+    DelaySpec spec;
+    if (internal_strncmp(str, "spin=", 5) == 0) {
+      spec.type = DelayType::Spin;
+      spec.value = internal_atoll(str + 5);
+      if (spec.value <= 0 || spec.value > 10000) {
+        Printf(
+            "FATAL: Invalid TSAN_OPTIONS spin value '%s'; value must be "
+            "between 1 and 10000\n",
+            str);
+        Die();
+      }
+    } else if (internal_strcmp(str, "yield") == 0) {
+      spec.type = DelayType::Yield;
+      spec.value = 0;
+    } else if (internal_strncmp(str, "sleep_us=", 9) == 0) {
+      spec.type = DelayType::SleepUs;
+      spec.value = internal_atoll(str + 9);
+      if (spec.value <= 0) {
+        Printf(
+            "FATAL: Invalid TSAN_OPTIONS sleep_us value '%s'; value must be a "
+            "positive integer\n",
+            str);
+        Die();
+      }
+    } else {
+      Printf("FATAL: Unrecognized delay spec '%s', check TSAN_OPTIONS\n", str);
+      Die();
+    }
+    return spec;
+  }
+
+  const char* TypeName() const {
+    switch (type) {
+      case DelayType::Spin:
+        return "spin";
+      case DelayType::Yield:
+        return "yield";
+      case DelayType::SleepUs:
+        return "sleep_us";
+    }
+    return "unknown";
+  }
+};
+
+}  // namespace
+
+// =============================================================================
+// AdaptiveDelayImpl: Time-budget aware delay injection for race exposure
+// =============================================================================
+//
+// This implementation injects delays to expose data races while maintaining a
+// configurable overhead target. It uses several strategies:
+//
+// 1. Time-Budget Controller: Tracks cumulative delays vs wall-clock time
+//    and adjusts delay probability to maintain target overhead.
+//
+// 2. Tiered Delays: Different delay strategies for different op types:
+//    - Relaxed atomics: Very rare sampling, tiny spin delays
+//    - Sync atomics (acq/rel/seq_cst): Moderate sampling, small usleep
+//    - Mutex/CV ops: Higher sampling, larger delays
+//    - Thread create/join: Always delay (rare but high value)
+//
+// 3. Address-based Sampling: Exponential backoff per address to avoid
+//    repeatedly delaying hot atomics.
+
+struct AdaptiveDelayImpl {
+  ALWAYS_INLINE static AdaptiveDelayState* TLS() {
+    return &cur_thread()->adaptive_delay_state;
+  }
+  ALWAYS_INLINE static unsigned int* GetRandomSeed() {
+    return &TLS()->tls_random_seed_;
+  }
+  ALWAYS_INLINE static void SetRandomSeed(unsigned int seed) {
+    TLS()->tls_random_seed_ = seed;
+  }
+
+  // The public facing option is adaptive_delay_aggressiveness, which is an
+  // opaque value the user tunes to control the amount of delay injected into
+  // the program. Internally, the implementation maps the aggressiveness to a
+  // target percentage of overall program runtime spent delaying. A true wall
+  // clock delay target (e.g., a 25% wall time slowdown) is hard to implement
+  // because 1) the actual wall-time cost of spin loops and yields is
+  // difficult to measure, and 2) usleep(N) often sleeps longer than
+  // requested. Keeping the user-facing parameter opaque avoids promising a
+  // precise percent wall-time slowdown that we cannot deliver.
+  struct TimeBudget {
+    int target_overhead_pct_;
+    Percent target_low_;
+    Percent target_high_;
+
+    void Init(int target_pct) {
+      target_overhead_pct_ = target_pct;
+      target_low_ = Percent::FromPct(
+          target_overhead_pct_ >= 5 ? target_overhead_pct_ - 5 : 0);
+      target_high_ = Percent::FromPct(target_overhead_pct_ + 5);
+    }
+
+    static constexpr u64 BucketDurationNs = 30'000'000'000ULL;
+
+    void RecordDelay(u64 delay_ns) {
+      u64 now = NanoTime();
+      u64 elapsed_ns = now - TLS()->bucket_start_ns_;
+
+      if (elapsed_ns >= BucketDurationNs) {
+        // Roll the buckets: drop the old bucket, the current bucket becomes
+        // the old one, and start a fresh current bucket.
+        TLS()->delay_buckets_ns_[0] = TLS()->delay_buckets_ns_[1];
+        TLS()->delay_buckets_ns_[1] = 0;
+        TLS()->bucket_start_ns_ = now;
+        TLS()->bucket0_window_ns = BucketDurationNs;
+      }
+
+      TLS()->delay_buckets_ns_[1] += delay_ns;
+    }
+
+    Percent GetOverheadPercent() {
+      u64 now = NanoTime();
+      u64 elapsed_ns = now - TLS()->bucket_start_ns_;
+
+      // Need at least 1ms to calculate
+      if (elapsed_ns < 1'000'000ULL)
+        return Percent::FromPct(0);
+
+      if (elapsed_ns > BucketDurationNs * 2) {
+        // Both buckets are stale
+        return Percent::FromPct(0);
+      } else if (elapsed_ns > BucketDurationNs) {
+        // bucket[0] is stale, use only bucket[1] (current bucket)
+        u64 total_delay_ns = TLS()->delay_buckets_ns_[1];
+        return Percent::FromRatio(total_delay_ns, elapsed_ns);
+      } else {
+        u64 total_delay_ns =
+            TLS()->delay_buckets_ns_[0] + TLS()->delay_buckets_ns_[1];
+        u64 window_ns = TLS()->bucket0_window_ns + elapsed_ns;
+        return Percent::FromRatio(total_delay_ns, window_ns);
+      }
+    }
+
+    bool ShouldDelay() {
+      Percent ratio = GetOverheadPercent();
+
+      if (ratio < target_low_)
+        return true;
+      if (ratio > target_high_)
+        return false;
+
+      // Linear interpolation: at target_low -> 100%, at target_high -> 0%
+      Percent prob = (target_high_ - ratio) / (target_high_ - target_low_);
+      return prob.RandomCheck(GetRandomSeed());
+    }
+  };
+
+  // Address Sampler with Exponential Backoff
+  struct AddressSampler {
+    static constexpr u64 TABLE_SIZE = 2048;
+    struct Entry {
+      atomic_uintptr_t addr_;
+      atomic_uint32_t count_;
+    };
+    Entry table_[TABLE_SIZE];
+    static constexpr u32 ExponentialBackoffCap = 64;
+
+    void Init() {
+      for (u64 i = 0; i < TABLE_SIZE; ++i) {
+        atomic_store(&table_[i].addr_, 0, memory_order_relaxed);
+        atomic_store(&table_[i].count_, 0, memory_order_relaxed);
+      }
+    }
+
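+    // splitmix64 finalizer, used here as a cheap hash to map an address to a
+    // table slot.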
+    static ALWAYS_INLINE u64 splitmix64(u64 x) {
+      x = (x ^ (x >> 30)) * 0xBF58476D1CE4E5B9ULL;
+      x = (x ^ (x >> 27)) * 0x94D049BB133111EBULL;
+      x = x ^ (x >> 31);
+      return x;
+    }
+
+    // Uses exponential backoff: delay on 1st, 2nd, 4th, 8th, 16th, ...
+    bool ShouldDelayAddr(uptr addr) {
+      u64 idx = splitmix64(addr >> 3) & (TABLE_SIZE - 1);
+      Entry& e = table_[idx];
+
+      // This function is not thread safe.
+      // If two threads access the same hashed entry in parallel,
+      // worst case, we may end up returning true too often. This is
+      // acceptable...instead of full locking.
+
+      uptr stored_addr = atomic_load(&e.addr_, memory_order_relaxed);
+      if (stored_addr != addr) {
+        // Hash Collision - reset
+        atomic_store(&e.addr_, addr, memory_order_relaxed);
+        atomic_store(&e.count_, 1, memory_order_relaxed);
+        return true;
+      }
+
+      u32 count = atomic_fetch_add(&e.count_, 1, memory_order_relaxed) + 1;
+
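+      // (count & (count - 1)) == 0 is true exactly when count is a power of
+      // two, so we delay on the 1st, 2nd, 4th, 8th, ... occurrence, up to
+      // ExponentialBackoffCap.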
+      if ((count & (count - 1)) == 0 && count <= ExponentialBackoffCap)
+        return true;
+      return false;
+    }
+  };
+
+  TimeBudget budget_;
+  AddressSampler sampler_;
+
+  int relaxed_sample_rate_;
+  int sync_atomic_sample_rate_;
+  int mutex_sample_rate_;
+  DelaySpec atomic_delay_;
+  DelaySpec sync_delay_;
+
+  void Init() {
+    InitTls();
+
+    AdaptiveDelay::is_adaptive_delay_enabled = flags()->enable_adaptive_delay;
+  }
+
+  void InitTls() {
+    TLS()->bucket_start_ns_ = NanoTime();
+    TLS()->delay_buckets_ns_[0] = 0;
+    TLS()->delay_buckets_ns_[1] = 0;
+    TLS()->bucket0_window_ns = 0;
+
+    SetRandomSeed(NanoTime());
+    TLS()->tls_initialized_ = true;
+  }
+
+  bool IsTlsInitialized() const { return TLS()->tls_initialized_; }
+
+  AdaptiveDelayImpl() {
+    relaxed_sample_rate_ = flags()->adaptive_delay_relaxed_sample_rate;
+    sync_atomic_sample_rate_ = flags()->adaptive_delay_sync_atomic_sample_rate;
+    mutex_sample_rate_ = flags()->adaptive_delay_mutex_sample_rate;
+    atomic_delay_ = DelaySpec::Parse(flags()->adaptive_delay_max_atomic);
+    sync_delay_ = DelaySpec::Parse(flags()->adaptive_delay_max_sync);
+
+    int delay_aggressiveness = flags()->adaptive_delay_aggressiveness;
+    if (delay_aggressiveness < 1)
+      delay_aggressiveness = 1;
+
+    budget_.Init(delay_aggressiveness);
+    sampler_.Init();
+
+    VPrintf(1, "INFO: ThreadSanitizer AdaptiveDelay initialized\n");
+    VPrintf(1, "  Delay aggressiveness: %d\n", delay_aggressiveness);
+    VPrintf(1, "  Relaxed atomic sample rate: 1/%d\n", relaxed_sample_rate_);
+    VPrintf(1, "  Sync atomic sample rate: 1/%d\n", sync_atomic_sample_rate_);
+    VPrintf(1, "  Mutex sample rate: 1/%d\n", mutex_sample_rate_);
+    VPrintf(1, "  Atomic delay: %s=%d\n", atomic_delay_.TypeName(),
+            atomic_delay_.value);
+    VPrintf(1, "  Sync delay: %s=%d\n", sync_delay_.TypeName(),
+            sync_delay_.value);
+  }
+
+  void DoSpinDelay(int iters) {
+    volatile int v = 0;
+    for (int i = 0; i < iters; ++i) v = i;
+    (void)v;
+    budget_.RecordDelay(iters * DelaySpec::kNsPerSpinCycle);
+  }
+
+  void DoYieldDelay() {
+    internal_sched_yield();
+    budget_.RecordDelay(DelaySpec::kNsPerYield);
+  }
+
+  void DoSleepUsDelay(int max_us) {
+    // Use two Rand() calls to get full 32-bit range for larger sleep values
+    u32 rnd = ((u32)Rand(GetRandomSeed()) << 16) | Rand(GetRandomSeed());
+    int delay_us = 1 + (rnd % max_us);
+    internal_usleep(delay_us);
+    budget_.RecordDelay(delay_us * 1000ULL);
+  }
+
+  void ExecuteDelay(const DelaySpec& spec) {
+    switch (spec.type) {
+      case DelayType::Spin: {
+        int iters = 1 + (Rand(GetRandomSeed()) % spec.value);
+        DoSpinDelay(iters);
+        break;
+      }
+      case DelayType::Yield:
+        DoYieldDelay();
+        break;
+      case DelayType::SleepUs:
+        DoSleepUsDelay(spec.value);
+        break;
+    }
+  }
+
+  void AtomicRelaxedOpDelay() {
+    if ((Rand(GetRandomSeed()) % relaxed_sample_rate_) != 0)
+      return;
+    if (!budget_.ShouldDelay())
+      return;
+
+    int iters = 10 + (Rand(GetRandomSeed()) % 10);
+    DoSpinDelay(iters);
+  }
+
+  void AtomicSyncOpDelay(uptr* addr) {
+    if ((Rand(GetRandomSeed()) % sync_atomic_sample_rate_) != 0)
+      return;
+    if (!budget_.ShouldDelay())
+      return;
+
+    if (addr && !sampler_.ShouldDelayAddr(*addr))
+      return;
+
+    ExecuteDelay(atomic_delay_);
+  }
+
+  void AtomicOpFence(int mo) {
+    CHECK(IsTlsInitialized());
+
+    if (mo < (int)mo_acquire)
+      AtomicRelaxedOpDelay();
+    else
+      AtomicSyncOpDelay(nullptr);
+  }
+
+  void AtomicOpAddr(uptr addr, int mo) {
+    CHECK(IsTlsInitialized());
+
+    if (mo < (int)mo_acquire)
+      AtomicRelaxedOpDelay();
+    else
+      AtomicSyncOpDelay(&addr);
+  }
+
+  void UnsampledDelay() {
+    CHECK(IsTlsInitialized());
+
+    if (!budget_.ShouldDelay())
+      return;
+
+    ExecuteDelay(sync_delay_);
+  }
+
+  void SyncOp() {
+    CHECK(IsTlsInitialized());
+
+    if ((Rand(GetRandomSeed()) % mutex_sample_rate_) != 0)
+      return;
+    if (!budget_.ShouldDelay())
+      return;
+
+    ExecuteDelay(sync_delay_);
+  }
+
+  void BeforeChildThreadRuns() {
+    InitTls();
+    UnsampledDelay();
+  }
+
+  void AfterThreadCreation() { UnsampledDelay(); }
+};
+
+AdaptiveDelayImpl& GetImpl() {
+  static AdaptiveDelayImpl impl;
+  return impl;
+}
+
+bool AdaptiveDelay::is_adaptive_delay_enabled;
+
+void AdaptiveDelay::InitImpl() { GetImpl().Init(); }
+
+void AdaptiveDelay::SyncOpImpl() { GetImpl().SyncOp(); }
+void AdaptiveDelay::AtomicOpFenceImpl(int mo) { GetImpl().AtomicOpFence(mo); }
+void AdaptiveDelay::AtomicOpAddrImpl(__sanitizer::uptr addr, int mo) {
+  GetImpl().AtomicOpAddr(addr, mo);
+}
+void AdaptiveDelay::AfterThreadCreationImpl() {
+  GetImpl().AfterThreadCreation();
+}
+void AdaptiveDelay::BeforeChildThreadRunsImpl() {
+  GetImpl().BeforeChildThreadRuns();
+}
+
+}  // namespace __tsan
diff --git a/compiler-rt/lib/tsan/rtl/tsan_adaptive_delay.h b/compiler-rt/lib/tsan/rtl/tsan_adaptive_delay.h
new file mode 100644
index 0000000000000..2c181792fe887
--- /dev/null
+++ b/compiler-rt/lib/tsan/rtl/tsan_adaptive_delay.h
@@ -0,0 +1,170 @@
+//===-- tsan_adaptive_delay.h -----------------------------------*- C++ -*-===//
+//
+// Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions.
+// See https://llvm.org/LICENSE.txt for license information.
+// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception
+//
+//===----------------------------------------------------------------------===//
+//
+// This file is a part of ThreadSanitizer (TSan), a race detector.
+//
+//===----------------------------------------------------------------------===//
+
+#ifndef TSAN_ADAPTIVE_DELAY_H
+#define TSAN_ADAPTIVE_DELAY_H
+
+#include "sanitizer_common/sanitizer_common.h"
+#include "sanitizer_common/sanitizer_internal_defs.h"
+
+namespace __tsan {
+
+// AdaptiveDelay injects delays at synchronization points, atomic operations,
+// and thread lifecycle events to increase the likelihood of exposing data
+// races. The delay injection is controlled by an approximate time budget to
+// maintain a configurable overhead target.
+//
+// SyncOp() delays non-atomic synchronization points (those with clear
+// happens-before relationships):
+//  - Acquire operations: locking a mutex delays before the mutex is locked.
+//  - Release operations: unlocking a mutex delays after the mutex is
+//    unlocked.
+// These are more likely to expose interesting (rare) thread interleavings.
+// For example, delaying the thread that just unlocked a mutex gives a newly
+// woken thread a chance to run before the unlocking thread continues.
+struct AdaptiveDelay {
+  ALWAYS_INLINE static void Init() { InitImpl(); }
+
+  ALWAYS_INLINE static void SyncOp() {
+    if (!is_adaptive_delay_enabled)
+      return;
+    SyncOpImpl();
+  }
+
+  ALWAYS_INLINE static void AtomicOpFence(int mo) {
+    if (!is_adaptive_delay_enabled)
+      return;
+    AtomicOpFenceImpl(mo);
+  }
+
+  ALWAYS_INLINE static void AtomicOpAddr(__sanitizer::uptr addr, int mo) {
+    if (!is_adaptive_delay_enabled)
+      return;
+    AtomicOpAddrImpl(addr, mo);
+  }
+
+  ALWAYS_INLINE static void AfterThreadCreation() {
+    if (!is_adaptive_delay_enabled)
+      return;
+    AfterThreadCreationImpl();
+  }
+
+  ALWAYS_INLINE static void BeforeChildThreadRuns() {
+    if (!is_adaptive_delay_enabled)
+      return;
+    BeforeChildThreadRunsImpl();
+  }
+
+ private:
+  static void InitImpl();
+
+  static void SyncOpImpl();
+
+  static void AtomicOpFenceImpl(int mo);
+  static void AtomicOpAddrImpl(__sanitizer::uptr addr, int mo);
+
+  static void AfterThreadCreationImpl();
+  static void BeforeChildThreadRunsImpl();
+
+  static bool is_adaptive_delay_enabled;
+
+  friend struct AdaptiveDelayImpl;
+};
+
+// The runtime defines cur_thread() to retrieve TLS thread state, and it
+// takes care of platform specific implementation details. The AdaptiveDelay
+// implementation stores per-thread data in this struct, which is embedded
+// in cur_thread().
+struct AdaptiveDelayState {
+  // For the adaptive delay implementation
+  // Sliding window delay tracking: 2 buckets of 30 seconds each
+  u64 delay_buckets_ns_[2];  // [0] = older 30s, [1] = newer 30s
+  u64 bucket_start_ns_;      // When current bucket (index 1) started
+  u64 bucket0_window_ns;  // 0 before the first bucket has rolled over, and
+                          // the bucket window duration afterwards. This
+                          // handles the case where the program has run for
+                          // less than one bucket window: the previous bucket's
+                          // duration must not be counted in the overhead
+                          // percent calculation.
+  unsigned int tls_random_seed_;
+  bool tls_initialized_;
+};
+
+// Fixed-point arithmetic type that mimics floating point operations
+class Percent {
+  using u32 = __sanitizer::u32;
+  using u64 = __sanitizer::u64;
+
+  u32 bp_{};  // basis points (10000 basis points == 100%)
+  bool is_valid_{};
+
+  static constexpr u32 kBasisPointsPerUnit = 10000;
+
+  Percent(u32 bp, bool is_valid) : bp_(bp), is_valid_(is_valid) {}
+
+ public:
+  Percent() = default;
+  Percent(const Percent&) = default;
+  Percent& operator=(const Percent&) = default;
+  Percent(Percent&&) = default;
+  Percent& operator=(Percent&&) = default;
+
+  static Percent FromPct(u32 pct) { return Percent{pct * 100, true}; }
+  static Percent FromRatio(u64 numerator, u64 denominator) {
+    if (denominator == 0)
+      return Percent{0, false};
+    // Avoid overflow: scale down if needed
+    if (numerator > UINT64_MAX / kBasisPointsPerUnit) {
+      return Percent{(u32)((numerator / denominator) * kBasisPointsPerUnit),
+                     true};
+    }
+    return Percent{(u32)((numerator * kBasisPointsPerUnit) / denominator),
+                   true};
+  }
+
+  bool IsValid() const { return is_valid_; }
+
+  // Returns true with probability equal to the percentage.
+  bool RandomCheck(u32* seed) const {
+    return (Rand(seed) % kBasisPointsPerUnit) < bp_;
+  }
+
+  int GetPct() const { return bp_ / 100; }
+  int GetBasisPoints() const { return bp_; }
+
+  bool operator==(const Percent& other) const { return bp_ == other.bp_; }
+  bool operator!=(const Percent& other) const { return bp_ != other.bp_; }
+  bool operator<(const Percent& other) const { return bp_ < other.bp_; }
+  bool operator>(const Percent& other) const { return bp_ > other.bp_; }
+  bool operator<=(const Percent& other) const { return bp_ <= other.bp_; }
+  bool operator>=(const Percent& other) const { return bp_ >= other.bp_; }
+
+  Percent operator-(const Percent& other) const {
+    if (!is_valid_ || !other.is_valid_)
+      return Percent{0, false};
+    if (bp_ < other.bp_)
+      return Percent{0, false};
+    return Percent{bp_ - other.bp_, true};
+  }
+
+  Percent operator/(const Percent& other) const {
+    if (!is_valid_ || !other.is_valid_)
+      return Percent{0, false};
+    if (other.bp_ == 0)
+      return Percent{0, false};
+    return Percent{(bp_ * kBasisPointsPerUnit) / other.bp_, true};
+  }
+};
+
+}  // namespace __tsan
+
+#endif  // TSAN_ADAPTIVE_DELAY_H
diff --git a/compiler-rt/lib/tsan/rtl/tsan_flags.inc b/compiler-rt/lib/tsan/rtl/tsan_flags.inc
index 77ab910f08fbc..68d4ba660debb 100644
--- a/compiler-rt/lib/tsan/rtl/tsan_flags.inc
+++ b/compiler-rt/lib/tsan/rtl/tsan_flags.inc
@@ -92,3 +92,30 @@ TSAN_FLAG(LockDuringWriteSetting, lock_during_write, kLockDuringAllWrites,
           "\"disable_for_all_processes\" - don't lock during all writes in "
           "the current process and it's children processes.")
 #endif
+
+TSAN_FLAG(bool, enable_adaptive_delay, false,
+          "Enable adaptive delay injection to expose data races. When "
+          "enabled, delays are strategically injected at synchronization "
+          "points, atomic operations, and thread lifecycle events to increase "
+          "the likelihood of exposing races while maintaining a configurable "
+          "overhead budget.")
+
+TSAN_FLAG(
+    int, adaptive_delay_aggressiveness, 25,
+    "Controls delay injection intensity for race detection. Higher values "
+    "inject more delays to expose races. Suggested values: 10 (minimal delay), "
+    "50 (moderate delay), 200 (aggressive). "
+    "This is a tuning parameter; actual overhead varies by workload and "
+    "platform.")
+TSAN_FLAG(int, adaptive_delay_relaxed_sample_rate, 10000,
+          "Sample 1 in N relaxed atomic operations for delay")
+TSAN_FLAG(int, adaptive_delay_sync_atomic_sample_rate, 100,
+          "Sample 1 in N acquire/release/seq_cst atomic operations for delay")
+TSAN_FLAG(int, adaptive_delay_mutex_sample_rate, 10,
+          "Sample 1 in N mutex/cv operations for delay")
+TSAN_FLAG(const char*, adaptive_delay_max_atomic, "sleep_us=50",
+          "Delay for atomic operations: 'spin=N' (max N spins), 'yield', or "
+          "'sleep_us=N' (max N>0 us sleep)")
+TSAN_FLAG(const char*, adaptive_delay_max_sync, "sleep_us=500",
+          "Delay for sync operations: 'spin=N' (max N spins), 'yield', or "
+          "'sleep_us=N' (max N>0 us sleep)")
diff --git a/compiler-rt/lib/tsan/rtl/tsan_interceptors_posix.cpp b/compiler-rt/lib/tsan/rtl/tsan_interceptors_posix.cpp
index 714220a0109a8..6ae871af11ea9 100644
--- a/compiler-rt/lib/tsan/rtl/tsan_interceptors_posix.cpp
+++ b/compiler-rt/lib/tsan/rtl/tsan_interceptors_posix.cpp
@@ -34,6 +34,7 @@
 #if SANITIZER_APPLE && !SANITIZER_GO
 #  include "tsan_flags.h"
 #endif
+#include "tsan_adaptive_delay.h"
 #include "tsan_interceptors.h"
 #include "tsan_interface.h"
 #include "tsan_mman.h"
@@ -1065,6 +1066,9 @@ extern "C" void *__tsan_thread_start_func(void *arg) {
     ThreadStart(thr, p->tid, GetTid(), ThreadType::Regular);
     p->started.Post();
   }
+
+  AdaptiveDelay::BeforeChildThreadRuns();
+
   void *res = callback(param);
   // Prevent the callback from being tail called,
   // it mixes up stack traces.
@@ -1128,6 +1132,7 @@ TSAN_INTERCEPTOR(int, pthread_create,
   }
   if (attr == &myattr)
     pthread_attr_destroy(&myattr);
+  AdaptiveDelay::AfterThreadCreation();
   return res;
 }
 
@@ -1423,6 +1428,7 @@ TSAN_INTERCEPTOR(int, pthread_mutex_destroy, void *m) {
 TSAN_INTERCEPTOR(int, pthread_mutex_lock, void *m) {
   SCOPED_TSAN_INTERCEPTOR(pthread_mutex_lock, m);
   MutexPreLock(thr, pc, (uptr)m);
+  AdaptiveDelay::SyncOp();
   int res = BLOCK_REAL(pthread_mutex_lock)(m);
   if (res == errno_EOWNERDEAD)
     MutexRepair(thr, pc, (uptr)m);
@@ -1435,6 +1441,7 @@ TSAN_INTERCEPTOR(int, pthread_mutex_lock, void *m) {
 
 TSAN_INTERCEPTOR(int, pthread_mutex_trylock, void *m) {
   SCOPED_TSAN_INTERCEPTOR(pthread_mutex_trylock, m);
+  AdaptiveDelay::SyncOp();
   int res = REAL(pthread_mutex_trylock)(m);
   if (res == errno_EOWNERDEAD)
     MutexRepair(thr, pc, (uptr)m);
@@ -1446,6 +1453,7 @@ TSAN_INTERCEPTOR(int, pthread_mutex_trylock, void *m) {
 #if !SANITIZER_APPLE
 TSAN_INTERCEPTOR(int, pthread_mutex_timedlock, void *m, void *abstime) {
   SCOPED_TSAN_INTERCEPTOR(pthread_mutex_timedlock, m, abstime);
+  AdaptiveDelay::SyncOp();
   int res = REAL(pthread_mutex_timedlock)(m, abstime);
   if (res == 0) {
     MutexPostLock(thr, pc, (uptr)m, MutexFlagTryLock);
@@ -1458,6 +1466,7 @@ TSAN_INTERCEPTOR(int, pthread_mutex_unlock, void *m) {
   SCOPED_TSAN_INTERCEPTOR(pthread_mutex_unlock, m);
   MutexUnlock(thr, pc, (uptr)m);
   int res = REAL(pthread_mutex_unlock)(m);
+  AdaptiveDelay::SyncOp();
   if (res == errno_EINVAL)
     MutexInvalidAccess(thr, pc, (uptr)m);
   return res;
@@ -1468,6 +1477,7 @@ TSAN_INTERCEPTOR(int, pthread_mutex_clocklock, void *m,
                  __sanitizer_clockid_t clock, void *abstime) {
   SCOPED_TSAN_INTERCEPTOR(pthread_mutex_clocklock, m, clock, abstime);
   MutexPreLock(thr, pc, (uptr)m);
+  AdaptiveDelay::SyncOp();
   int res = BLOCK_REAL(pthread_mutex_clocklock)(m, clock, abstime);
   if (res == errno_EOWNERDEAD)
     MutexRepair(thr, pc, (uptr)m);
@@ -1486,6 +1496,7 @@ TSAN_INTERCEPTOR(int, pthread_mutex_clocklock, void *m,
 TSAN_INTERCEPTOR(int, __pthread_mutex_lock, void *m) {
   SCOPED_TSAN_INTERCEPTOR(__pthread_mutex_lock, m);
   MutexPreLock(thr, pc, (uptr)m);
+  AdaptiveDelay::SyncOp();
   int res = BLOCK_REAL(__pthread_mutex_lock)(m);
   if (res == errno_EOWNERDEAD)
     MutexRepair(thr, pc, (uptr)m);
@@ -1500,6 +1511,7 @@ TSAN_INTERCEPTOR(int, __pthread_mutex_unlock, void *m) {
   SCOPED_TSAN_INTERCEPTOR(__pthread_mutex_unlock, m);
   MutexUnlock(thr, pc, (uptr)m);
   int res = REAL(__pthread_mutex_unlock)(m);
+  AdaptiveDelay::SyncOp();
   if (res == errno_EINVAL)
     MutexInvalidAccess(thr, pc, (uptr)m);
   return res;
@@ -1529,6 +1541,7 @@ TSAN_INTERCEPTOR(int, pthread_spin_destroy, void *m) {
 TSAN_INTERCEPTOR(int, pthread_spin_lock, void *m) {
   SCOPED_TSAN_INTERCEPTOR(pthread_spin_lock, m);
   MutexPreLock(thr, pc, (uptr)m);
+  AdaptiveDelay::SyncOp();
   int res = BLOCK_REAL(pthread_spin_lock)(m);
   if (res == 0) {
     MutexPostLock(thr, pc, (uptr)m);
@@ -1538,6 +1551,7 @@ TSAN_INTERCEPTOR(int, pthread_spin_lock, void *m) {
 
 TSAN_INTERCEPTOR(int, pthread_spin_trylock, void *m) {
   SCOPED_TSAN_INTERCEPTOR(pthread_spin_trylock, m);
+  AdaptiveDelay::SyncOp();
   int res = REAL(pthread_spin_trylock)(m);
   if (res == 0) {
     MutexPostLock(thr, pc, (uptr)m, MutexFlagTryLock);
@@ -1549,6 +1563,7 @@ TSAN_INTERCEPTOR(int, pthread_spin_unlock, void *m) {
   SCOPED_TSAN_INTERCEPTOR(pthread_spin_unlock, m);
   MutexUnlock(thr, pc, (uptr)m);
   int res = REAL(pthread_spin_unlock)(m);
+  AdaptiveDelay::SyncOp();
   return res;
 }
 #endif
@@ -1574,6 +1589,7 @@ TSAN_INTERCEPTOR(int, pthread_rwlock_destroy, void *m) {
 TSAN_INTERCEPTOR(int, pthread_rwlock_rdlock, void *m) {
   SCOPED_TSAN_INTERCEPTOR(pthread_rwlock_rdlock, m);
   MutexPreReadLock(thr, pc, (uptr)m);
+  AdaptiveDelay::SyncOp();
   int res = REAL(pthread_rwlock_rdlock)(m);
   if (res == 0) {
     MutexPostReadLock(thr, pc, (uptr)m);
@@ -1583,6 +1599,7 @@ TSAN_INTERCEPTOR(int, pthread_rwlock_rdlock, void *m) {
 
 TSAN_INTERCEPTOR(int, pthread_rwlock_tryrdlock, void *m) {
   SCOPED_TSAN_INTERCEPTOR(pthread_rwlock_tryrdlock, m);
+  AdaptiveDelay::SyncOp();
   int res = REAL(pthread_rwlock_tryrdlock)(m);
   if (res == 0) {
     MutexPostReadLock(thr, pc, (uptr)m, MutexFlagTryLock);
@@ -1593,6 +1610,7 @@ TSAN_INTERCEPTOR(int, pthread_rwlock_tryrdlock, void *m) {
 #if !SANITIZER_APPLE
 TSAN_INTERCEPTOR(int, pthread_rwlock_timedrdlock, void *m, void *abstime) {
   SCOPED_TSAN_INTERCEPTOR(pthread_rwlock_timedrdlock, m, abstime);
+  AdaptiveDelay::SyncOp();
   int res = REAL(pthread_rwlock_timedrdlock)(m, abstime);
   if (res == 0) {
     MutexPostReadLock(thr, pc, (uptr)m);
@@ -1604,6 +1622,7 @@ TSAN_INTERCEPTOR(int, pthread_rwlock_timedrdlock, void *m, void *abstime) {
 TSAN_INTERCEPTOR(int, pthread_rwlock_wrlock, void *m) {
   SCOPED_TSAN_INTERCEPTOR(pthread_rwlock_wrlock, m);
   MutexPreLock(thr, pc, (uptr)m);
+  AdaptiveDelay::SyncOp();
   int res = BLOCK_REAL(pthread_rwlock_wrlock)(m);
   if (res == 0) {
     MutexPostLock(thr, pc, (uptr)m);
@@ -1613,6 +1632,7 @@ TSAN_INTERCEPTOR(int, pthread_rwlock_wrlock, void *m) {
 
 TSAN_INTERCEPTOR(int, pthread_rwlock_trywrlock, void *m) {
   SCOPED_TSAN_INTERCEPTOR(pthread_rwlock_trywrlock, m);
+  AdaptiveDelay::SyncOp();
   int res = REAL(pthread_rwlock_trywrlock)(m);
   if (res == 0) {
     MutexPostLock(thr, pc, (uptr)m, MutexFlagTryLock);
@@ -1623,6 +1643,7 @@ TSAN_INTERCEPTOR(int, pthread_rwlock_trywrlock, void *m) {
 #if !SANITIZER_APPLE
 TSAN_INTERCEPTOR(int, pthread_rwlock_timedwrlock, void *m, void *abstime) {
   SCOPED_TSAN_INTERCEPTOR(pthread_rwlock_timedwrlock, m, abstime);
+  AdaptiveDelay::SyncOp();
   int res = REAL(pthread_rwlock_timedwrlock)(m, abstime);
   if (res == 0) {
     MutexPostLock(thr, pc, (uptr)m, MutexFlagTryLock);
@@ -1635,6 +1656,7 @@ TSAN_INTERCEPTOR(int, pthread_rwlock_unlock, void *m) {
   SCOPED_TSAN_INTERCEPTOR(pthread_rwlock_unlock, m);
   MutexReadOrWriteUnlock(thr, pc, (uptr)m);
   int res = REAL(pthread_rwlock_unlock)(m);
+  AdaptiveDelay::SyncOp();
   return res;
 }
 
diff --git a/compiler-rt/lib/tsan/rtl/tsan_interface_atomic.cpp b/compiler-rt/lib/tsan/rtl/tsan_interface_atomic.cpp
index 527e5a9b4a8d8..5c2461634d2d4 100644
--- a/compiler-rt/lib/tsan/rtl/tsan_interface_atomic.cpp
+++ b/compiler-rt/lib/tsan/rtl/tsan_interface_atomic.cpp
@@ -21,6 +21,7 @@
 #include "sanitizer_common/sanitizer_mutex.h"
 #include "sanitizer_common/sanitizer_placement_new.h"
 #include "sanitizer_common/sanitizer_stacktrace.h"
+#include "tsan_adaptive_delay.h"
 #include "tsan_flags.h"
 #include "tsan_interface.h"
 #include "tsan_rtl.h"
@@ -520,8 +521,19 @@ static morder to_morder(int mo) {
   return res;
 }
 
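+// Delay dispatch for AtomicImpl: overload resolution prefers the second,
+// more specialized overload whenever the atomic operation carries an address
+// argument; fences (memory order only) select the first overload.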
+template <class... Types>
+ALWAYS_INLINE auto AtomicDelayImpl(morder mo, Types... args) {
+  AdaptiveDelay::AtomicOpFence((int)mo);
+}
+
+template <class AddrType, class... Types>
+ALWAYS_INLINE auto AtomicDelayImpl(morder mo, AddrType addr, Types... args) {
+  AdaptiveDelay::AtomicOpAddr((uptr)addr, (int)mo);
+}
+
 template <class Op, class... Types>
 ALWAYS_INLINE auto AtomicImpl(morder mo, Types... args) {
+  AtomicDelayImpl(mo, args...);
   ThreadState *const thr = cur_thread();
   ProcessPendingSignals(thr);
   if (UNLIKELY(thr->ignore_sync || thr->ignore_interceptors))
diff --git a/compiler-rt/lib/tsan/rtl/tsan_rtl.cpp b/compiler-rt/lib/tsan/rtl/tsan_rtl.cpp
index feee566f44829..a9147f09e34cb 100644
--- a/compiler-rt/lib/tsan/rtl/tsan_rtl.cpp
+++ b/compiler-rt/lib/tsan/rtl/tsan_rtl.cpp
@@ -21,6 +21,7 @@
 #include "sanitizer_common/sanitizer_placement_new.h"
 #include "sanitizer_common/sanitizer_stackdepot.h"
 #include "sanitizer_common/sanitizer_symbolizer.h"
+#include "tsan_adaptive_delay.h"
 #include "tsan_defs.h"
 #include "tsan_interface.h"
 #include "tsan_mman.h"
@@ -775,6 +776,10 @@ void Initialize(ThreadState *thr) {
     while (__tsan_resumed == 0) {}
   }
 
+#if !SANITIZER_GO
+  AdaptiveDelay::Init();
+#endif
+
   OnInitialize();
 }
 
diff --git a/compiler-rt/lib/tsan/rtl/tsan_rtl.h b/compiler-rt/lib/tsan/rtl/tsan_rtl.h
index 635654616b781..3d1018accafc4 100644
--- a/compiler-rt/lib/tsan/rtl/tsan_rtl.h
+++ b/compiler-rt/lib/tsan/rtl/tsan_rtl.h
@@ -34,6 +34,7 @@
 #include "sanitizer_common/sanitizer_suppressions.h"
 #include "sanitizer_common/sanitizer_thread_registry.h"
 #include "sanitizer_common/sanitizer_vector.h"
+#include "tsan_adaptive_delay.h"
 #include "tsan_defs.h"
 #include "tsan_flags.h"
 #include "tsan_ignoreset.h"
@@ -240,6 +241,8 @@ struct alignas(SANITIZER_CACHE_LINE_SIZE) ThreadState {
   bool in_internal_write_call;
 #endif
 
+  AdaptiveDelayState adaptive_delay_state;
+
   explicit ThreadState(Tid tid);
 };
 
diff --git a/compiler-rt/lib/tsan/tests/unit/CMakeLists.txt b/compiler-rt/lib/tsan/tests/unit/CMakeLists.txt
index 005457e374c40..d183c3071cff8 100644
--- a/compiler-rt/lib/tsan/tests/unit/CMakeLists.txt
+++ b/compiler-rt/lib/tsan/tests/unit/CMakeLists.txt
@@ -3,6 +3,7 @@ set(TSAN_UNIT_TEST_SOURCES
   tsan_flags_test.cpp
   tsan_ilist_test.cpp
   tsan_mman_test.cpp
+  tsan_percent_test.cpp
   tsan_shadow_test.cpp
   tsan_stack_test.cpp
   tsan_sync_test.cpp
diff --git a/compiler-rt/lib/tsan/tests/unit/tsan_percent_test.cpp b/compiler-rt/lib/tsan/tests/unit/tsan_percent_test.cpp
new file mode 100644
index 0000000000000..6a5367ceb1042
--- /dev/null
+++ b/compiler-rt/lib/tsan/tests/unit/tsan_percent_test.cpp
@@ -0,0 +1,152 @@
+//===-- tsan_percent_test.cpp ---------------------------------------------===//
+//
+// Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions.
+// See https://llvm.org/LICENSE.txt for license information.
+// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception
+//
+//===----------------------------------------------------------------------===//
+//
+// This file is a part of ThreadSanitizer (TSan), a race detector.
+//
+//===----------------------------------------------------------------------===//
+#include "gtest/gtest.h"
+#include "tsan_adaptive_delay.h"
+
+namespace __tsan {
+
+TEST(Percent, DefaultedObject) {
+  Percent defaulted;
+  EXPECT_FALSE(defaulted.IsValid());
+}
+
+TEST(Percent, FromPct) {
+  Percent p0 = Percent::FromPct(0);
+  Percent p50 = Percent::FromPct(50);
+  Percent p100 = Percent::FromPct(100);
+  Percent p150 = Percent::FromPct(150);
+
+  EXPECT_TRUE(p0.IsValid());
+  EXPECT_TRUE(p50.IsValid());
+  EXPECT_TRUE(p100.IsValid());
+  EXPECT_TRUE(p150.IsValid());
+
+  EXPECT_EQ(p0, p0);
+  EXPECT_NE(p0, p50);
+  EXPECT_NE(p50, p100);
+
+  EXPECT_EQ(p0.GetBasisPoints(), 0);
+  EXPECT_EQ(p50.GetBasisPoints(), 5000);
+  EXPECT_EQ(p100.GetBasisPoints(), 10000);
+  EXPECT_EQ(p150.GetBasisPoints(), 15000);
+
+  EXPECT_EQ(p0.GetPct(), 0);
+  EXPECT_EQ(p50.GetPct(), 50);
+  EXPECT_EQ(p100.GetPct(), 100);
+  EXPECT_EQ(p150.GetPct(), 150);
+}
+
+TEST(Percent, FromRatio) {
+  Percent half = Percent::FromRatio(1, 2);
+  Percent expected_half = Percent::FromPct(50);
+  EXPECT_TRUE(half.IsValid());
+  EXPECT_EQ(half, expected_half);
+
+  Percent quarter = Percent::FromRatio(1, 4);
+  Percent expected_quarter = Percent::FromPct(25);
+  EXPECT_EQ(quarter, expected_quarter);
+
+  Percent full = Percent::FromRatio(100, 100);
+  Percent expected_full = Percent::FromPct(100);
+  EXPECT_EQ(full, expected_full);
+
+  Percent div_zero = Percent::FromRatio(50, 0);
+  EXPECT_FALSE(div_zero.IsValid());
+}
+
+TEST(Percent, Comparisons) {
+  Percent low = Percent::FromPct(20);
+  Percent p20 = Percent::FromPct(20);
+  Percent mid = Percent::FromPct(50);
+  Percent high = Percent::FromPct(80);
+
+  EXPECT_TRUE(low == p20);
+  EXPECT_FALSE(low != p20);
+  EXPECT_FALSE(low == mid);
+  EXPECT_TRUE(low != mid);
+
+  EXPECT_TRUE(low < mid);
+  EXPECT_TRUE(mid < high);
+  EXPECT_FALSE(high < low);
+
+  EXPECT_TRUE(high > mid);
+  EXPECT_TRUE(mid > low);
+  EXPECT_FALSE(low > high);
+
+  EXPECT_TRUE(low <= mid);
+  EXPECT_TRUE(low <= low);
+
+  EXPECT_TRUE(high >= mid);
+  EXPECT_TRUE(high >= high);
+}
+
+TEST(Percent, Subtraction) {
+  Percent a = Percent::FromPct(75);
+  Percent b = Percent::FromPct(25);
+  Percent result = a - b;
+
+  Percent expected = Percent::FromPct(50);
+  EXPECT_TRUE(result.IsValid());
+  EXPECT_EQ(result, expected);
+
+  // Underflow
+  Percent low = Percent::FromPct(20);
+  Percent high = Percent::FromPct(80);
+  Percent underflow = low - high;
+  EXPECT_FALSE(underflow.IsValid());
+
+  Percent result_invalid = underflow - low;
+  EXPECT_FALSE(result_invalid.IsValid());
+}
+
+TEST(Percent, Division) {
+  Percent numerator = Percent::FromPct(100);
+  Percent denominator = Percent::FromPct(50);
+  Percent result = numerator / denominator;
+
+  Percent expected = Percent::FromPct(200);
+  EXPECT_TRUE(result.IsValid());
+  EXPECT_EQ(result, expected);
+
+  Percent zero = Percent::FromPct(0);
+  Percent non_zero = Percent::FromPct(50);
+  Percent div_zero_result = non_zero / zero;
+  EXPECT_FALSE(div_zero_result.IsValid());
+
+  Percent invalid = Percent::FromRatio(10, 0);
+  Percent valid = Percent::FromPct(50);
+  Percent result_invalid = valid / invalid;
+  EXPECT_FALSE(result_invalid.IsValid());
+}
+
+TEST(Percent, RandomCheck) {
+  unsigned int seed = 0;
+
+  Percent p0 = Percent::FromPct(0);
+  for (int i = 0; i < 100; ++i) {
+    EXPECT_FALSE(p0.RandomCheck(&seed));
+  }
+
+  Percent p50 = Percent::FromPct(50);
+  for (int i = 0; i < 100; ++i) {
+    p50.RandomCheck(&seed);
+    // No verification since we cannot guarantee the random result.
+    // Just verify the call does not crash.
+  }
+
+  Percent p150 = Percent::FromPct(150);
+  for (int i = 0; i < 100; ++i) {
+    EXPECT_TRUE(p150.RandomCheck(&seed));
+  }
+}
+
+}  // namespace __tsan
diff --git a/llvm/docs/ReleaseNotes.md b/llvm/docs/ReleaseNotes.md
index dffdb4291f87a..128c19296e75e 100644
--- a/llvm/docs/ReleaseNotes.md
+++ b/llvm/docs/ReleaseNotes.md
@@ -208,6 +208,8 @@ Changes to BOLT
 Changes to Sanitizers
 ---------------------
 
+* ThreadSanitizer gained an optional adaptive delay feature
+  (`TSAN_OPTIONS=enable_adaptive_delay=1`) that injects random delays at
+  synchronization points to help expose rare thread interleavings.
+
 Other Changes
 -------------
 


