[compiler-rt] 6a4c395 - [scudo] Add the thread-safety annotations

Chia-hung Duan via llvm-commits llvm-commits at lists.llvm.org
Tue Feb 14 17:20:51 PST 2023


Author: Chia-hung Duan
Date: 2023-02-15T01:19:51Z
New Revision: 6a4c39596d688955f203351bb28f72c7bc1e93d6

URL: https://github.com/llvm/llvm-project/commit/6a4c39596d688955f203351bb28f72c7bc1e93d6
DIFF: https://github.com/llvm/llvm-project/commit/6a4c39596d688955f203351bb28f72c7bc1e93d6.diff

LOG: [scudo] Add the thread-safety annotations

This CL adds the proper thread-safety annotations for most of the
functions and variables. However, given the restrictions of the current
architecture, in some cases we may not be able to use the annotations
easily. The following are two exceptions:

1. enable()/disable(): Many structures in scudo are enabled/disabled by
   acquiring the lock in each instance. This makes those structures act
   like a `lock` themselves. We can't mark those functions with
   ACQUIRE()/RELEASE(), because that would make the entire allocator
   become another `lock`, which in turn would imply that we need to
   *acquire* the `allocator` before each malloc et al. request (see the
   sketch after this list). Therefore, adding a variable that tracks the
   status of those structures may be a better way to cooperate with the
   thread-safety annotations.

2. TSD/TSD shared/TSD exclusive: These three have similar restrictions
   to those mentioned above. In addition, they don't always need to be
   released if it's a thread-local instance. However, thread-safety
   analysis doesn't support conditional branches, which means we can't
   place the proper annotations around the uses of TSDs. We may consider
   making the behavior consistent, which would also simplify the code
   structure.
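
As a minimal sketch of the problem in (1), using a hypothetical
`Component` type that is not part of this change: if enable()/disable()
were annotated, the component itself would have to become a capability:

    // Hypothetical sketch only; scudo does not annotate it this way.
    class CAPABILITY("mutex") Component {
      HybridMutex M;

    public:
      void disable() ACQUIRE() { M.lock(); }   // Component now acts as a lock,
      void enable() RELEASE() { M.unlock(); }  // held between the two calls.
      // To keep the analysis consistent, every allocation entry point would
      // then need annotations stating that the Component (i.e., the whole
      // allocator) is acquired around each malloc et al. request. A status
      // variable checked at runtime avoids turning the allocator into a lock.
    };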

This CL is intended to introduce the annotations with the least amount
of code refactoring, so only trivial thread-safety issues are addressed
here. For example, where a lock was not acquired before accessing a
guarded variable, a ScopedLock is inserted. The remaining issues are
supposed to be addressed in later changes.
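
The shape of such a trivial fix, as seen in the secondary.h retrieve()
path below (sketched here with the surrounding code elided):

    // Before: EntriesCount is GUARDED_BY(Mutex) but was updated unguarded.
    EntriesCount--;

    // After: acquire the lock around the access.
    ScopedLock L(Mutex);
    EntriesCount--;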

Reviewed By: cferris

Differential Revision: https://reviews.llvm.org/D140706

Added: 
    compiler-rt/lib/scudo/standalone/thread_annotations.h

Modified: 
    compiler-rt/lib/scudo/standalone/CMakeLists.txt
    compiler-rt/lib/scudo/standalone/combined.h
    compiler-rt/lib/scudo/standalone/fuchsia.cpp
    compiler-rt/lib/scudo/standalone/linux.cpp
    compiler-rt/lib/scudo/standalone/mutex.h
    compiler-rt/lib/scudo/standalone/primary32.h
    compiler-rt/lib/scudo/standalone/primary64.h
    compiler-rt/lib/scudo/standalone/quarantine.h
    compiler-rt/lib/scudo/standalone/release.h
    compiler-rt/lib/scudo/standalone/secondary.h
    compiler-rt/lib/scudo/standalone/stats.h
    compiler-rt/lib/scudo/standalone/tests/CMakeLists.txt
    compiler-rt/lib/scudo/standalone/tests/combined_test.cpp
    compiler-rt/lib/scudo/standalone/tests/mutex_test.cpp
    compiler-rt/lib/scudo/standalone/tests/tsd_test.cpp
    compiler-rt/lib/scudo/standalone/trusty.cpp
    compiler-rt/lib/scudo/standalone/tsd.h
    compiler-rt/lib/scudo/standalone/tsd_exclusive.h
    compiler-rt/lib/scudo/standalone/tsd_shared.h

Removed: 
    


################################################################################
diff --git a/compiler-rt/lib/scudo/standalone/CMakeLists.txt b/compiler-rt/lib/scudo/standalone/CMakeLists.txt
index 9ac3340877da1..eefcffd4cfc56 100644
--- a/compiler-rt/lib/scudo/standalone/CMakeLists.txt
+++ b/compiler-rt/lib/scudo/standalone/CMakeLists.txt
@@ -30,6 +30,9 @@ else()
   list(APPEND SCUDO_CFLAGS -O3)
 endif()
 
+append_list_if(COMPILER_RT_HAS_WTHREAD_SAFETY_FLAG -Werror=thread-safety
+  SCUDO_CFLAGS)
+
 set(SCUDO_LINK_FLAGS)
 
 list(APPEND SCUDO_LINK_FLAGS -Wl,-z,defs,-z,now,-z,relro)

diff --git a/compiler-rt/lib/scudo/standalone/combined.h b/compiler-rt/lib/scudo/standalone/combined.h
index f41254bef3268..7812f277a0321 100644
--- a/compiler-rt/lib/scudo/standalone/combined.h
+++ b/compiler-rt/lib/scudo/standalone/combined.h
@@ -305,7 +305,7 @@ class Allocator {
 
   NOINLINE void *allocate(uptr Size, Chunk::Origin Origin,
                           uptr Alignment = MinAlignment,
-                          bool ZeroContents = false) {
+                          bool ZeroContents = false) NO_THREAD_SAFETY_ANALYSIS {
     initThreadMaybe();
 
     const Options Options = Primary.Options.load();
@@ -389,9 +389,10 @@ class Allocator {
       if (UnlockRequired)
         TSD->unlock();
     }
-    if (UNLIKELY(ClassId == 0))
+    if (UNLIKELY(ClassId == 0)) {
       Block = Secondary.allocate(Options, Size, Alignment, &SecondaryBlockEnd,
                                  FillContents);
+    }
 
     if (UNLIKELY(!Block)) {
       if (Options.get(OptionBit::MayReturnNull))
@@ -697,7 +698,7 @@ class Allocator {
   // TODO(kostyak): disable() is currently best-effort. There are some small
   //                windows of time when an allocation could still succeed after
   //                this function finishes. We will revisit that later.
-  void disable() {
+  void disable() NO_THREAD_SAFETY_ANALYSIS {
     initThreadMaybe();
 #ifdef GWP_ASAN_HOOKS
     GuardedAlloc.disable();
@@ -709,7 +710,7 @@ class Allocator {
     Secondary.disable();
   }
 
-  void enable() {
+  void enable() NO_THREAD_SAFETY_ANALYSIS {
     initThreadMaybe();
     Secondary.enable();
     Primary.enable();
@@ -1124,7 +1125,8 @@ class Allocator {
   }
 
   void quarantineOrDeallocateChunk(Options Options, void *TaggedPtr,
-                                   Chunk::UnpackedHeader *Header, uptr Size) {
+                                   Chunk::UnpackedHeader *Header,
+                                   uptr Size) NO_THREAD_SAFETY_ANALYSIS {
     void *Ptr = getHeaderTaggedPointer(TaggedPtr);
     Chunk::UnpackedHeader NewHeader = *Header;
     // If the quarantine is disabled, the actual size of a chunk is 0 or larger

diff --git a/compiler-rt/lib/scudo/standalone/fuchsia.cpp b/compiler-rt/lib/scudo/standalone/fuchsia.cpp
index 70e4e714f2cb9..da684e7f1de0c 100644
--- a/compiler-rt/lib/scudo/standalone/fuchsia.cpp
+++ b/compiler-rt/lib/scudo/standalone/fuchsia.cpp
@@ -195,6 +195,8 @@ void HybridMutex::unlock() __TA_NO_THREAD_SAFETY_ANALYSIS {
   sync_mutex_unlock(&M);
 }
 
+void HybridMutex::assertHeldImpl() __TA_NO_THREAD_SAFETY_ANALYSIS {}
+
 u64 getMonotonicTime() { return _zx_clock_get_monotonic(); }
 
 u32 getNumberOfCPUs() { return _zx_system_get_num_cpus(); }

diff --git a/compiler-rt/lib/scudo/standalone/linux.cpp b/compiler-rt/lib/scudo/standalone/linux.cpp
index 9c5755af5750a..33757e292f25b 100644
--- a/compiler-rt/lib/scudo/standalone/linux.cpp
+++ b/compiler-rt/lib/scudo/standalone/linux.cpp
@@ -11,6 +11,7 @@
 #if SCUDO_LINUX
 
 #include "common.h"
+#include "internal_defs.h"
 #include "linux.h"
 #include "mutex.h"
 #include "string_utils.h"
@@ -128,6 +129,10 @@ void HybridMutex::unlock() {
   }
 }
 
+void HybridMutex::assertHeldImpl() {
+  CHECK(atomic_load(&M, memory_order_acquire) != Unlocked);
+}
+
 u64 getMonotonicTime() {
   timespec TS;
   clock_gettime(CLOCK_MONOTONIC, &TS);

diff --git a/compiler-rt/lib/scudo/standalone/mutex.h b/compiler-rt/lib/scudo/standalone/mutex.h
index c8504c0409144..05340de3e12d7 100644
--- a/compiler-rt/lib/scudo/standalone/mutex.h
+++ b/compiler-rt/lib/scudo/standalone/mutex.h
@@ -11,6 +11,7 @@
 
 #include "atomic_helpers.h"
 #include "common.h"
+#include "thread_annotations.h"
 
 #include <string.h>
 
@@ -20,10 +21,10 @@
 
 namespace scudo {
 
-class HybridMutex {
+class CAPABILITY("mutex") HybridMutex {
 public:
-  bool tryLock();
-  NOINLINE void lock() {
+  bool tryLock() TRY_ACQUIRE(true);
+  NOINLINE void lock() ACQUIRE() {
     if (LIKELY(tryLock()))
       return;
       // The compiler may try to fully unroll the loop, ending up in a
@@ -40,9 +41,20 @@ class HybridMutex {
     }
     lockSlow();
   }
-  void unlock();
+  void unlock() RELEASE();
+
+  // TODO(chiahungduan): In general, we may want to assert the owner of the
+  // lock as well. Given the current uses of HybridMutex, it's acceptable
+  // without asserting the owner. Re-evaluate this when we have scenarios
+  // that require a more fine-grained lock granularity.
+  ALWAYS_INLINE void assertHeld() ASSERT_CAPABILITY(this) {
+    if (SCUDO_DEBUG)
+      assertHeldImpl();
+  }
 
 private:
+  void assertHeldImpl();
+
   static constexpr u8 NumberOfTries = 8U;
   static constexpr u8 NumberOfYields = 8U;
 
@@ -52,13 +64,13 @@ class HybridMutex {
   sync_mutex_t M = {};
 #endif
 
-  void lockSlow();
+  void lockSlow() ACQUIRE();
 };
 
-class ScopedLock {
+class SCOPED_CAPABILITY ScopedLock {
 public:
-  explicit ScopedLock(HybridMutex &M) : Mutex(M) { Mutex.lock(); }
-  ~ScopedLock() { Mutex.unlock(); }
+  explicit ScopedLock(HybridMutex &M) ACQUIRE(M) : Mutex(M) { Mutex.lock(); }
+  ~ScopedLock() RELEASE() { Mutex.unlock(); }
 
 private:
   HybridMutex &Mutex;
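
With HybridMutex marked as a CAPABILITY and ScopedLock as a
SCOPED_CAPABILITY, clang can check lock discipline at compile time. A
minimal sketch of the pattern the rest of this change applies
(hypothetical `Example` struct, not from this commit):

    struct Example {
      HybridMutex Mutex;
      uptr Count GUARDED_BY(Mutex) = 0;  // Mutex must be held to touch Count.

      void addImpl(uptr N) REQUIRES(Mutex) { Count += N; }  // caller holds it.

      void add(uptr N) EXCLUDES(Mutex) {  // caller must not already hold it.
        ScopedLock L(Mutex);              // acquired here, released at scope end.
        addImpl(N);
      }
    };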

diff --git a/compiler-rt/lib/scudo/standalone/primary32.h b/compiler-rt/lib/scudo/standalone/primary32.h
index c9e6c6a01e530..544f0ee987976 100644
--- a/compiler-rt/lib/scudo/standalone/primary32.h
+++ b/compiler-rt/lib/scudo/standalone/primary32.h
@@ -18,6 +18,7 @@
 #include "report.h"
 #include "stats.h"
 #include "string_utils.h"
+#include "thread_annotations.h"
 
 namespace scudo {
 
@@ -62,7 +63,7 @@ template <typename Config> class SizeClassAllocator32 {
 
   static bool canAllocate(uptr Size) { return Size <= SizeClassMap::MaxSize; }
 
-  void init(s32 ReleaseToOsInterval) {
+  void init(s32 ReleaseToOsInterval) NO_THREAD_SAFETY_ANALYSIS {
     if (SCUDO_FUCHSIA)
       reportError("SizeClassAllocator32 is not supported on Fuchsia");
 
@@ -86,7 +87,7 @@ template <typename Config> class SizeClassAllocator32 {
     setOption(Option::ReleaseInterval, static_cast<sptr>(ReleaseToOsInterval));
   }
 
-  void unmapTestOnly() {
+  void unmapTestOnly() NO_THREAD_SAFETY_ANALYSIS {
     while (NumberOfStashedRegions > 0)
       unmap(reinterpret_cast<void *>(RegionsStash[--NumberOfStashedRegions]),
             RegionSize);
@@ -121,11 +122,11 @@ template <typename Config> class SizeClassAllocator32 {
     DCHECK_LT(ClassId, NumClasses);
     SizeClassInfo *Sci = getSizeClassInfo(ClassId);
     ScopedLock L(Sci->Mutex);
-    TransferBatch *B = popBatchImpl(C, ClassId);
+    TransferBatch *B = popBatchImpl(C, ClassId, Sci);
     if (UNLIKELY(!B)) {
       if (UNLIKELY(!populateFreeList(C, ClassId, Sci)))
         return nullptr;
-      B = popBatchImpl(C, ClassId);
+      B = popBatchImpl(C, ClassId, Sci);
       // if `populateFreeList` succeeded, we are supposed to get free blocks.
       DCHECK_NE(B, nullptr);
     }
@@ -149,7 +150,7 @@ template <typename Config> class SizeClassAllocator32 {
       // the blocks.
       if (Size == 1 && !populateFreeList(C, ClassId, Sci))
         return;
-      pushBlocksImpl(C, ClassId, Array, Size);
+      pushBlocksImpl(C, ClassId, Sci, Array, Size);
       Sci->Stats.PushedBlocks += Size;
       return;
     }
@@ -173,14 +174,14 @@ template <typename Config> class SizeClassAllocator32 {
     }
 
     ScopedLock L(Sci->Mutex);
-    pushBlocksImpl(C, ClassId, Array, Size, SameGroup);
+    pushBlocksImpl(C, ClassId, Sci, Array, Size, SameGroup);
 
     Sci->Stats.PushedBlocks += Size;
     if (ClassId != SizeClassMap::BatchClassId)
       releaseToOSMaybe(Sci, ClassId);
   }
 
-  void disable() {
+  void disable() NO_THREAD_SAFETY_ANALYSIS {
     // The BatchClassId must be locked last since other classes can use it.
     for (sptr I = static_cast<sptr>(NumClasses) - 1; I >= 0; I--) {
       if (static_cast<uptr>(I) == SizeClassMap::BatchClassId)
@@ -192,7 +193,7 @@ template <typename Config> class SizeClassAllocator32 {
     PossibleRegions.disable();
   }
 
-  void enable() {
+  void enable() NO_THREAD_SAFETY_ANALYSIS {
     PossibleRegions.enable();
     RegionsStashMutex.unlock();
     getSizeClassInfo(SizeClassMap::BatchClassId)->Mutex.unlock();
@@ -203,7 +204,8 @@ template <typename Config> class SizeClassAllocator32 {
     }
   }
 
-  template <typename F> void iterateOverBlocks(F Callback) {
+  template <typename F>
+  void iterateOverBlocks(F Callback) NO_THREAD_SAFETY_ANALYSIS {
     uptr MinRegionIndex = NumRegions, MaxRegionIndex = 0;
     for (uptr I = 0; I < NumClasses; I++) {
       SizeClassInfo *Sci = getSizeClassInfo(I);
@@ -241,7 +243,7 @@ template <typename Config> class SizeClassAllocator32 {
     for (uptr I = 0; I < NumClasses; I++) {
       SizeClassInfo *Sci = getSizeClassInfo(I);
       ScopedLock L(Sci->Mutex);
-      getStats(Str, I, 0);
+      getStats(Str, I, Sci, 0);
     }
   }
 
@@ -301,17 +303,17 @@ template <typename Config> class SizeClassAllocator32 {
 
   struct alignas(SCUDO_CACHE_LINE_SIZE) SizeClassInfo {
     HybridMutex Mutex;
-    SinglyLinkedList<BatchGroup> FreeList;
-    uptr CurrentRegion;
-    uptr CurrentRegionAllocated;
-    SizeClassStats Stats;
+    SinglyLinkedList<BatchGroup> FreeList GUARDED_BY(Mutex);
+    uptr CurrentRegion GUARDED_BY(Mutex);
+    uptr CurrentRegionAllocated GUARDED_BY(Mutex);
+    SizeClassStats Stats GUARDED_BY(Mutex);
     u32 RandState;
-    uptr AllocatedUser;
+    uptr AllocatedUser GUARDED_BY(Mutex);
     // Lowest & highest region index allocated for this size class, to avoid
     // looping through the whole NumRegions.
-    uptr MinRegionIndex;
-    uptr MaxRegionIndex;
-    ReleaseToOsInfo ReleaseInfo;
+    uptr MinRegionIndex GUARDED_BY(Mutex);
+    uptr MaxRegionIndex GUARDED_BY(Mutex);
+    ReleaseToOsInfo ReleaseInfo GUARDED_BY(Mutex);
   };
   static_assert(sizeof(SizeClassInfo) % SCUDO_CACHE_LINE_SIZE == 0, "");
 
@@ -346,7 +348,7 @@ template <typename Config> class SizeClassAllocator32 {
     return Region;
   }
 
-  uptr allocateRegion(SizeClassInfo *Sci, uptr ClassId) {
+  uptr allocateRegion(SizeClassInfo *Sci, uptr ClassId) REQUIRES(Sci->Mutex) {
     DCHECK_LT(ClassId, NumClasses);
     uptr Region = 0;
     {
@@ -399,10 +401,10 @@ template <typename Config> class SizeClassAllocator32 {
   // `SameGroup=true` instead.
   //
   // The region mutex needs to be held while calling this method.
-  void pushBlocksImpl(CacheT *C, uptr ClassId, CompactPtrT *Array, u32 Size,
-                      bool SameGroup = false) {
+  void pushBlocksImpl(CacheT *C, uptr ClassId, SizeClassInfo *Sci,
+                      CompactPtrT *Array, u32 Size, bool SameGroup = false)
+      REQUIRES(Sci->Mutex) {
     DCHECK_GT(Size, 0U);
-    SizeClassInfo *Sci = getSizeClassInfo(ClassId);
 
     auto CreateGroup = [&](uptr GroupId) {
       BatchGroup *BG = nullptr;
@@ -528,8 +530,8 @@ template <typename Config> class SizeClassAllocator32 {
   // group id will be considered first.
   //
   // The region mutex needs to be held while calling this method.
-  TransferBatch *popBatchImpl(CacheT *C, uptr ClassId) {
-    SizeClassInfo *Sci = getSizeClassInfo(ClassId);
+  TransferBatch *popBatchImpl(CacheT *C, uptr ClassId, SizeClassInfo *Sci)
+      REQUIRES(Sci->Mutex) {
     if (Sci->FreeList.empty())
       return nullptr;
 
@@ -557,7 +559,8 @@ template <typename Config> class SizeClassAllocator32 {
     return B;
   }
 
-  NOINLINE bool populateFreeList(CacheT *C, uptr ClassId, SizeClassInfo *Sci) {
+  NOINLINE bool populateFreeList(CacheT *C, uptr ClassId, SizeClassInfo *Sci)
+      REQUIRES(Sci->Mutex) {
     uptr Region;
     uptr Offset;
     // If the size-class currently has a region associated to it, use it. The
@@ -612,7 +615,7 @@ template <typename Config> class SizeClassAllocator32 {
       // it only happens when it crosses the group size boundary. Instead of
       // sorting them, treat them as same group here to avoid sorting the
       // almost-sorted blocks.
-      pushBlocksImpl(C, ClassId, &ShuffleArray[I], N, /*SameGroup=*/true);
+      pushBlocksImpl(C, ClassId, Sci, &ShuffleArray[I], N, /*SameGroup=*/true);
       I += N;
     }
 
@@ -633,8 +636,8 @@ template <typename Config> class SizeClassAllocator32 {
     return true;
   }
 
-  void getStats(ScopedString *Str, uptr ClassId, uptr Rss) {
-    SizeClassInfo *Sci = getSizeClassInfo(ClassId);
+  void getStats(ScopedString *Str, uptr ClassId, SizeClassInfo *Sci, uptr Rss)
+      REQUIRES(Sci->Mutex) {
     if (Sci->AllocatedUser == 0)
       return;
     const uptr InUse = Sci->Stats.PoppedBlocks - Sci->Stats.PushedBlocks;
@@ -647,7 +650,7 @@ template <typename Config> class SizeClassAllocator32 {
   }
 
   NOINLINE uptr releaseToOSMaybe(SizeClassInfo *Sci, uptr ClassId,
-                                 bool Force = false) {
+                                 bool Force = false) REQUIRES(Sci->Mutex) {
     const uptr BlockSize = getSizeByClassId(ClassId);
     const uptr PageSize = getPageSizeCached();
 
@@ -753,14 +756,15 @@ template <typename Config> class SizeClassAllocator32 {
   SizeClassInfo SizeClassInfoArray[NumClasses] = {};
 
   // Track the regions in use, 0 is unused, otherwise store ClassId + 1.
+  // FIXME: There is no dedicated lock for `PossibleRegions`.
   ByteMap PossibleRegions = {};
   atomic_s32 ReleaseToOsIntervalMs = {};
   // Unless several threads request regions simultaneously from different size
   // classes, the stash rarely contains more than 1 entry.
   static constexpr uptr MaxStashedRegions = 4;
   HybridMutex RegionsStashMutex;
-  uptr NumberOfStashedRegions = 0;
-  uptr RegionsStash[MaxStashedRegions] = {};
+  uptr NumberOfStashedRegions GUARDED_BY(RegionsStashMutex) = 0;
+  uptr RegionsStash[MaxStashedRegions] GUARDED_BY(RegionsStashMutex) = {};
 };
 
 } // namespace scudo
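
A recurring refactor above: helpers such as popBatchImpl() and getStats()
now take the SizeClassInfo as a parameter instead of looking it up
internally, so that REQUIRES() can name the mutex the caller actually
holds. A minimal sketch with hypothetical names:

    struct Info {
      HybridMutex Mutex;
      u32 Data GUARDED_BY(Mutex) = 0;
    };

    // The annotation may refer to a member of a parameter...
    void useLocked(Info *I) REQUIRES(I->Mutex) { I->Data++; }

    void use(Info *I) {
      ScopedLock L(I->Mutex);
      useLocked(I);  // ...so the analysis can match it with the held lock.
    }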

diff --git a/compiler-rt/lib/scudo/standalone/primary64.h b/compiler-rt/lib/scudo/standalone/primary64.h
index cf70d7b9fbd9e..1c9c6159f19b5 100644
--- a/compiler-rt/lib/scudo/standalone/primary64.h
+++ b/compiler-rt/lib/scudo/standalone/primary64.h
@@ -18,6 +18,7 @@
 #include "release.h"
 #include "stats.h"
 #include "string_utils.h"
+#include "thread_annotations.h"
 
 namespace scudo {
 
@@ -60,7 +61,7 @@ template <typename Config> class SizeClassAllocator64 {
 
   static bool canAllocate(uptr Size) { return Size <= SizeClassMap::MaxSize; }
 
-  void init(s32 ReleaseToOsInterval) {
+  void init(s32 ReleaseToOsInterval) NO_THREAD_SAFETY_ANALYSIS {
     DCHECK(isAligned(reinterpret_cast<uptr>(this), alignof(ThisT)));
     DCHECK_EQ(PrimaryBase, 0U);
     // Reserve the space required for the Primary.
@@ -86,7 +87,7 @@ template <typename Config> class SizeClassAllocator64 {
     setOption(Option::ReleaseInterval, static_cast<sptr>(ReleaseToOsInterval));
   }
 
-  void unmapTestOnly() {
+  void unmapTestOnly() NO_THREAD_SAFETY_ANALYSIS {
     for (uptr I = 0; I < NumClasses; I++) {
       RegionInfo *Region = getRegionInfo(I);
       *Region = {};
@@ -103,7 +104,7 @@ template <typename Config> class SizeClassAllocator64 {
     bool PrintStats = false;
     {
       ScopedLock L(Region->Mutex);
-      TransferBatch *B = popBatchImpl(C, ClassId);
+      TransferBatch *B = popBatchImpl(C, ClassId, Region);
       if (LIKELY(B)) {
         Region->Stats.PoppedBlocks += B->getCount();
         return B;
@@ -114,7 +115,7 @@ template <typename Config> class SizeClassAllocator64 {
                    !populateFreeList(C, ClassId, Region))) {
         PrintStats = !RegionIsExhausted && Region->Exhausted;
       } else {
-        B = popBatchImpl(C, ClassId);
+        B = popBatchImpl(C, ClassId, Region);
         // if `populateFreeList` succeeded, we are supposed to get free blocks.
         DCHECK_NE(B, nullptr);
         Region->Stats.PoppedBlocks += B->getCount();
@@ -152,7 +153,7 @@ template <typename Config> class SizeClassAllocator64 {
         // be less than two. Therefore, populate the free list before inserting
         // the blocks.
         if (Size >= 2U) {
-          pushBlocksImpl(C, SizeClassMap::BatchClassId, Array, Size);
+          pushBlocksImpl(C, SizeClassMap::BatchClassId, Region, Array, Size);
           Region->Stats.PushedBlocks += Size;
         } else {
           const bool RegionIsExhausted = Region->Exhausted;
@@ -199,14 +200,14 @@ template <typename Config> class SizeClassAllocator64 {
     }
 
     ScopedLock L(Region->Mutex);
-    pushBlocksImpl(C, ClassId, Array, Size, SameGroup);
+    pushBlocksImpl(C, ClassId, Region, Array, Size, SameGroup);
 
     Region->Stats.PushedBlocks += Size;
     if (ClassId != SizeClassMap::BatchClassId)
       releaseToOSMaybe(Region, ClassId);
   }
 
-  void disable() {
+  void disable() NO_THREAD_SAFETY_ANALYSIS {
     // The BatchClassId must be locked last since other classes can use it.
     for (sptr I = static_cast<sptr>(NumClasses) - 1; I >= 0; I--) {
       if (static_cast<uptr>(I) == SizeClassMap::BatchClassId)
@@ -216,7 +217,7 @@ template <typename Config> class SizeClassAllocator64 {
     getRegionInfo(SizeClassMap::BatchClassId)->Mutex.lock();
   }
 
-  void enable() {
+  void enable() NO_THREAD_SAFETY_ANALYSIS {
     getRegionInfo(SizeClassMap::BatchClassId)->Mutex.unlock();
     for (uptr I = 0; I < NumClasses; I++) {
       if (I == SizeClassMap::BatchClassId)
@@ -225,11 +226,12 @@ template <typename Config> class SizeClassAllocator64 {
     }
   }
 
-  template <typename F> void iterateOverBlocks(F Callback) {
+  template <typename F>
+  void iterateOverBlocks(F Callback) NO_THREAD_SAFETY_ANALYSIS {
     for (uptr I = 0; I < NumClasses; I++) {
       if (I == SizeClassMap::BatchClassId)
         continue;
-      const RegionInfo *Region = getRegionInfo(I);
+      RegionInfo *Region = getRegionInfo(I);
       const uptr BlockSize = getSizeByClassId(I);
       const uptr From = Region->RegionBeg;
       const uptr To = From + Region->AllocatedUser;
@@ -259,7 +261,7 @@ template <typename Config> class SizeClassAllocator64 {
     for (uptr I = 0; I < NumClasses; I++) {
       RegionInfo *Region = getRegionInfo(I);
       ScopedLock L(Region->Mutex);
-      getStats(Str, I, 0);
+      getStats(Str, I, Region, 0);
     }
   }
 
@@ -311,15 +313,23 @@ template <typename Config> class SizeClassAllocator64 {
         decompactPtrInternal(getCompactPtrBaseByClassId(ClassId), CompactPtr));
   }
 
-  static BlockInfo findNearestBlock(const char *RegionInfoData, uptr Ptr) {
+  static BlockInfo findNearestBlock(const char *RegionInfoData,
+                                    uptr Ptr) NO_THREAD_SAFETY_ANALYSIS {
     const RegionInfo *RegionInfoArray =
         reinterpret_cast<const RegionInfo *>(RegionInfoData);
+
     uptr ClassId;
     uptr MinDistance = -1UL;
     for (uptr I = 0; I != NumClasses; ++I) {
       if (I == SizeClassMap::BatchClassId)
         continue;
       uptr Begin = RegionInfoArray[I].RegionBeg;
+      // TODO(chiahungduan): In fact, we need to lock RegionInfo::Mutex.
+      // However, RegionInfoData is passed with a const qualifier, and locking
+      // the mutex requires modifying RegionInfoData, which means we would
+      // need to remove the const qualifier. This may lead to another undefined
+      // behavior (the first one is accessing `AllocatedUser` without locking).
+      // It's better to pass `RegionInfoData` as `void *`; then we can lock the
+      // mutex properly.
       uptr End = Begin + RegionInfoArray[I].AllocatedUser;
       if (Begin > End || End - Begin < SizeClassMap::getSizeByClassId(I))
         continue;
@@ -380,15 +390,18 @@ template <typename Config> class SizeClassAllocator64 {
 
   struct UnpaddedRegionInfo {
     HybridMutex Mutex;
-    SinglyLinkedList<BatchGroup> FreeList;
+    SinglyLinkedList<BatchGroup> FreeList GUARDED_BY(Mutex);
+    // This is initialized before thread creation.
     uptr RegionBeg = 0;
-    RegionStats Stats = {};
-    u32 RandState = 0;
-    uptr MappedUser = 0;    // Bytes mapped for user memory.
-    uptr AllocatedUser = 0; // Bytes allocated for user memory.
-    MapPlatformData Data = {};
-    ReleaseToOsInfo ReleaseInfo = {};
-    bool Exhausted = false;
+    RegionStats Stats GUARDED_BY(Mutex) = {};
+    u32 RandState GUARDED_BY(Mutex) = 0;
+    // Bytes mapped for user memory.
+    uptr MappedUser GUARDED_BY(Mutex) = 0;
+    // Bytes allocated for user memory.
+    uptr AllocatedUser GUARDED_BY(Mutex) = 0;
+    MapPlatformData Data GUARDED_BY(Mutex) = {};
+    ReleaseToOsInfo ReleaseInfo GUARDED_BY(Mutex) = {};
+    bool Exhausted GUARDED_BY(Mutex) = false;
   };
   struct RegionInfo : UnpaddedRegionInfo {
     char Padding[SCUDO_CACHE_LINE_SIZE -
@@ -451,10 +464,10 @@ template <typename Config> class SizeClassAllocator64 {
   // `SameGroup=true` instead.
   //
   // The region mutex needs to be held while calling this method.
-  void pushBlocksImpl(CacheT *C, uptr ClassId, CompactPtrT *Array, u32 Size,
-                      bool SameGroup = false) {
+  void pushBlocksImpl(CacheT *C, uptr ClassId, RegionInfo *Region,
+                      CompactPtrT *Array, u32 Size, bool SameGroup = false)
+      REQUIRES(Region->Mutex) {
     DCHECK_GT(Size, 0U);
-    RegionInfo *Region = getRegionInfo(ClassId);
 
     auto CreateGroup = [&](uptr GroupId) {
       BatchGroup *BG = nullptr;
@@ -580,8 +593,8 @@ template <typename Config> class SizeClassAllocator64 {
   // group id will be considered first.
   //
   // The region mutex needs to be held while calling this method.
-  TransferBatch *popBatchImpl(CacheT *C, uptr ClassId) {
-    RegionInfo *Region = getRegionInfo(ClassId);
+  TransferBatch *popBatchImpl(CacheT *C, uptr ClassId, RegionInfo *Region)
+      REQUIRES(Region->Mutex) {
     if (Region->FreeList.empty())
       return nullptr;
 
@@ -610,7 +623,8 @@ template <typename Config> class SizeClassAllocator64 {
     return B;
   }
 
-  NOINLINE bool populateFreeList(CacheT *C, uptr ClassId, RegionInfo *Region) {
+  NOINLINE bool populateFreeList(CacheT *C, uptr ClassId, RegionInfo *Region)
+      REQUIRES(Region->Mutex) {
     const uptr Size = getSizeByClassId(ClassId);
     const u16 MaxCount = TransferBatch::getMaxCached(Size);
 
@@ -665,7 +679,8 @@ template <typename Config> class SizeClassAllocator64 {
       // it only happens when it crosses the group size boundary. Instead of
       // sorting them, treat them as same group here to avoid sorting the
       // almost-sorted blocks.
-      pushBlocksImpl(C, ClassId, &ShuffleArray[I], N, /*SameGroup=*/true);
+      pushBlocksImpl(C, ClassId, Region, &ShuffleArray[I], N,
+                     /*SameGroup=*/true);
       I += N;
     }
 
@@ -676,8 +691,8 @@ template <typename Config> class SizeClassAllocator64 {
     return true;
   }
 
-  void getStats(ScopedString *Str, uptr ClassId, uptr Rss) {
-    RegionInfo *Region = getRegionInfo(ClassId);
+  void getStats(ScopedString *Str, uptr ClassId, RegionInfo *Region, uptr Rss)
+      REQUIRES(Region->Mutex) {
     if (Region->MappedUser == 0)
       return;
     const uptr InUse = Region->Stats.PoppedBlocks - Region->Stats.PushedBlocks;
@@ -694,7 +709,7 @@ template <typename Config> class SizeClassAllocator64 {
   }
 
   NOINLINE uptr releaseToOSMaybe(RegionInfo *Region, uptr ClassId,
-                                 bool Force = false) {
+                                 bool Force = false) REQUIRES(Region->Mutex) {
     const uptr BlockSize = getSizeByClassId(ClassId);
     const uptr PageSize = getPageSizeCached();
 

diff --git a/compiler-rt/lib/scudo/standalone/quarantine.h b/compiler-rt/lib/scudo/standalone/quarantine.h
index 2d240d279c3ff..e65a733ced7b8 100644
--- a/compiler-rt/lib/scudo/standalone/quarantine.h
+++ b/compiler-rt/lib/scudo/standalone/quarantine.h
@@ -12,6 +12,7 @@
 #include "list.h"
 #include "mutex.h"
 #include "string_utils.h"
+#include "thread_annotations.h"
 
 namespace scudo {
 
@@ -172,7 +173,7 @@ template <typename Callback, typename Node> class GlobalQuarantine {
   typedef QuarantineCache<Callback> CacheT;
   using ThisT = GlobalQuarantine<Callback, Node>;
 
-  void init(uptr Size, uptr CacheSize) {
+  void init(uptr Size, uptr CacheSize) NO_THREAD_SAFETY_ANALYSIS {
     DCHECK(isAligned(reinterpret_cast<uptr>(this), alignof(ThisT)));
     DCHECK_EQ(atomic_load_relaxed(&MaxSize), 0U);
     DCHECK_EQ(atomic_load_relaxed(&MinSize), 0U);
@@ -197,16 +198,19 @@ template <typename Callback, typename Node> class GlobalQuarantine {
       drain(C, Cb);
   }
 
-  void NOINLINE drain(CacheT *C, Callback Cb) {
+  void NOINLINE drain(CacheT *C, Callback Cb) EXCLUDES(CacheMutex) {
+    bool NeedRecycle = false;
     {
       ScopedLock L(CacheMutex);
       Cache.transfer(C);
+      NeedRecycle = Cache.getSize() > getMaxSize();
     }
-    if (Cache.getSize() > getMaxSize() && RecycleMutex.tryLock())
+
+    if (NeedRecycle && RecycleMutex.tryLock())
       recycle(atomic_load_relaxed(&MinSize), Cb);
   }
 
-  void NOINLINE drainAndRecycle(CacheT *C, Callback Cb) {
+  void NOINLINE drainAndRecycle(CacheT *C, Callback Cb) EXCLUDES(CacheMutex) {
     {
       ScopedLock L(CacheMutex);
       Cache.transfer(C);
@@ -215,7 +219,7 @@ template <typename Callback, typename Node> class GlobalQuarantine {
     recycle(0, Cb);
   }
 
-  void getStats(ScopedString *Str) {
+  void getStats(ScopedString *Str) EXCLUDES(CacheMutex) {
     ScopedLock L(CacheMutex);
     // It assumes that the world is stopped, just as the allocator's printStats.
     Cache.getStats(Str);
@@ -223,13 +227,13 @@ template <typename Callback, typename Node> class GlobalQuarantine {
                 getMaxSize() >> 10, getCacheSize() >> 10);
   }
 
-  void disable() {
+  void disable() NO_THREAD_SAFETY_ANALYSIS {
     // RecycleMutex must be locked 1st since we grab CacheMutex within recycle.
     RecycleMutex.lock();
     CacheMutex.lock();
   }
 
-  void enable() {
+  void enable() NO_THREAD_SAFETY_ANALYSIS {
     CacheMutex.unlock();
     RecycleMutex.unlock();
   }
@@ -237,13 +241,14 @@ template <typename Callback, typename Node> class GlobalQuarantine {
 private:
   // Read-only data.
   alignas(SCUDO_CACHE_LINE_SIZE) HybridMutex CacheMutex;
-  CacheT Cache;
+  CacheT Cache GUARDED_BY(CacheMutex);
   alignas(SCUDO_CACHE_LINE_SIZE) HybridMutex RecycleMutex;
   atomic_uptr MinSize = {};
   atomic_uptr MaxSize = {};
   alignas(SCUDO_CACHE_LINE_SIZE) atomic_uptr MaxCacheSize = {};
 
-  void NOINLINE recycle(uptr MinSize, Callback Cb) {
+  void NOINLINE recycle(uptr MinSize, Callback Cb) RELEASE(RecycleMutex)
+      EXCLUDES(CacheMutex) {
     CacheT Tmp;
     Tmp.init();
     {

diff --git a/compiler-rt/lib/scudo/standalone/release.h b/compiler-rt/lib/scudo/standalone/release.h
index 6de3b15534d0c..1e831a660092b 100644
--- a/compiler-rt/lib/scudo/standalone/release.h
+++ b/compiler-rt/lib/scudo/standalone/release.h
@@ -12,6 +12,7 @@
 #include "common.h"
 #include "list.h"
 #include "mutex.h"
+#include "thread_annotations.h"
 
 namespace scudo {
 
@@ -76,7 +77,12 @@ class RegionPageMap {
     Buffer = nullptr;
   }
 
-  void reset(uptr NumberOfRegion, uptr CountersPerRegion, uptr MaxValue) {
+  // The lock of `StaticBuffer` is acquired conditionally and there's no easy
+  // way to specify the thread-safety attribute properly in the current code
+  // structure. Besides, it's the only place we may want to check thread
+  // safety. Therefore, it's fine to bypass the thread-safety analysis for now.
+  void reset(uptr NumberOfRegion, uptr CountersPerRegion,
+             uptr MaxValue) NO_THREAD_SAFETY_ANALYSIS {
     DCHECK_GT(NumberOfRegion, 0);
     DCHECK_GT(CountersPerRegion, 0);
     DCHECK_GT(MaxValue, 0);
@@ -181,7 +187,7 @@ class RegionPageMap {
   [[no_unique_address]] MapPlatformData MapData = {};
 
   static HybridMutex Mutex;
-  static uptr StaticBuffer[StaticBufferCount];
+  static uptr StaticBuffer[StaticBufferCount] GUARDED_BY(Mutex);
 };
 
 template <class ReleaseRecorderT> class FreePagesRangeTracker {

diff --git a/compiler-rt/lib/scudo/standalone/secondary.h b/compiler-rt/lib/scudo/standalone/secondary.h
index 1c4d0effe603a..e98e020941674 100644
--- a/compiler-rt/lib/scudo/standalone/secondary.h
+++ b/compiler-rt/lib/scudo/standalone/secondary.h
@@ -17,6 +17,7 @@
 #include "options.h"
 #include "stats.h"
 #include "string_utils.h"
+#include "thread_annotations.h"
 
 namespace scudo {
 
@@ -133,7 +134,7 @@ template <typename Config> class MapAllocatorCache {
                     Config::SecondaryCacheEntriesArraySize,
                 "");
 
-  void init(s32 ReleaseToOsInterval) {
+  void init(s32 ReleaseToOsInterval) NO_THREAD_SAFETY_ANALYSIS {
     DCHECK_EQ(EntriesCount, 0U);
     setOption(Option::MaxCacheEntriesCount,
               static_cast<sptr>(Config::SecondaryCacheDefaultMaxEntriesCount));
@@ -142,7 +143,7 @@ template <typename Config> class MapAllocatorCache {
     setOption(Option::ReleaseInterval, static_cast<sptr>(ReleaseToOsInterval));
   }
 
-  void store(Options Options, LargeBlock::Header *H) {
+  void store(Options Options, LargeBlock::Header *H) EXCLUDES(Mutex) {
     if (!canCache(H->CommitSize))
       return unmap(H);
 
@@ -227,7 +228,7 @@ template <typename Config> class MapAllocatorCache {
   }
 
   bool retrieve(Options Options, uptr Size, uptr Alignment,
-                LargeBlock::Header **H, bool *Zeroed) {
+                LargeBlock::Header **H, bool *Zeroed) EXCLUDES(Mutex) {
     const uptr PageSize = getPageSizeCached();
     const u32 MaxCount = atomic_load_relaxed(&MaxEntriesCount);
     bool Found = false;
@@ -249,8 +250,9 @@ template <typename Config> class MapAllocatorCache {
         if (HeaderPos > CommitBase + CommitSize)
           continue;
         if (HeaderPos < CommitBase ||
-            AllocPos > CommitBase + PageSize * MaxUnusedCachePages)
+            AllocPos > CommitBase + PageSize * MaxUnusedCachePages) {
           continue;
+        }
         Found = true;
         Entry = Entries[I];
         Entries[I].CommitBase = 0;
@@ -279,6 +281,8 @@ template <typename Config> class MapAllocatorCache {
       (*H)->MapBase = Entry.MapBase;
       (*H)->MapSize = Entry.MapSize;
       (*H)->Data = Entry.Data;
+
+      ScopedLock L(Mutex);
       EntriesCount--;
     }
     return Found;
@@ -315,7 +319,7 @@ template <typename Config> class MapAllocatorCache {
 
   void releaseToOS() { releaseOlderThan(UINT64_MAX); }
 
-  void disableMemoryTagging() {
+  void disableMemoryTagging() EXCLUDES(Mutex) {
     ScopedLock L(Mutex);
     for (u32 I = 0; I != Config::SecondaryCacheQuarantineSize; ++I) {
       if (Quarantine[I].CommitBase) {
@@ -332,9 +336,9 @@ template <typename Config> class MapAllocatorCache {
     QuarantinePos = -1U;
   }
 
-  void disable() { Mutex.lock(); }
+  void disable() NO_THREAD_SAFETY_ANALYSIS { Mutex.lock(); }
 
-  void enable() { Mutex.unlock(); }
+  void enable() NO_THREAD_SAFETY_ANALYSIS { Mutex.unlock(); }
 
   void unmapTestOnly() { empty(); }
 
@@ -375,7 +379,7 @@ template <typename Config> class MapAllocatorCache {
     u64 Time;
   };
 
-  void releaseIfOlderThan(CachedBlock &Entry, u64 Time) {
+  void releaseIfOlderThan(CachedBlock &Entry, u64 Time) REQUIRES(Mutex) {
     if (!Entry.CommitBase || !Entry.Time)
       return;
     if (Entry.Time > Time) {
@@ -387,7 +391,7 @@ template <typename Config> class MapAllocatorCache {
     Entry.Time = 0;
   }
 
-  void releaseOlderThan(u64 Time) {
+  void releaseOlderThan(u64 Time) EXCLUDES(Mutex) {
     ScopedLock L(Mutex);
     if (!EntriesCount || OldestTime == 0 || OldestTime > Time)
       return;
@@ -399,22 +403,24 @@ template <typename Config> class MapAllocatorCache {
   }
 
   HybridMutex Mutex;
-  u32 EntriesCount = 0;
-  u32 QuarantinePos = 0;
+  u32 EntriesCount GUARDED_BY(Mutex) = 0;
+  u32 QuarantinePos GUARDED_BY(Mutex) = 0;
   atomic_u32 MaxEntriesCount = {};
   atomic_uptr MaxEntrySize = {};
-  u64 OldestTime = 0;
-  u32 IsFullEvents = 0;
+  u64 OldestTime GUARDED_BY(Mutex) = 0;
+  u32 IsFullEvents GUARDED_BY(Mutex) = 0;
   atomic_s32 ReleaseToOsIntervalMs = {};
 
-  CachedBlock Entries[Config::SecondaryCacheEntriesArraySize] = {};
+  CachedBlock
+      Entries[Config::SecondaryCacheEntriesArraySize] GUARDED_BY(Mutex) = {};
   NonZeroLengthArray<CachedBlock, Config::SecondaryCacheQuarantineSize>
-      Quarantine = {};
+      Quarantine GUARDED_BY(Mutex) = {};
 };
 
 template <typename Config> class MapAllocator {
 public:
-  void init(GlobalStats *S, s32 ReleaseToOsInterval = -1) {
+  void init(GlobalStats *S,
+            s32 ReleaseToOsInterval = -1) NO_THREAD_SAFETY_ANALYSIS {
     DCHECK_EQ(AllocatedBytes, 0U);
     DCHECK_EQ(FreedBytes, 0U);
     Cache.init(ReleaseToOsInterval);
@@ -440,17 +446,18 @@ template <typename Config> class MapAllocator {
 
   void getStats(ScopedString *Str);
 
-  void disable() {
+  void disable() NO_THREAD_SAFETY_ANALYSIS {
     Mutex.lock();
     Cache.disable();
   }
 
-  void enable() {
+  void enable() NO_THREAD_SAFETY_ANALYSIS {
     Cache.enable();
     Mutex.unlock();
   }
 
-  template <typename F> void iterateOverBlocks(F Callback) const {
+  template <typename F>
+  void iterateOverBlocks(F Callback) const NO_THREAD_SAFETY_ANALYSIS {
     for (const auto &H : InUseBlocks) {
       uptr Ptr = reinterpret_cast<uptr>(&H) + LargeBlock::getHeaderSize();
       if (allocatorSupportsMemoryTagging<Config>())
@@ -472,14 +479,14 @@ template <typename Config> class MapAllocator {
 private:
   typename Config::SecondaryCache Cache;
 
-  HybridMutex Mutex;
-  DoublyLinkedList<LargeBlock::Header> InUseBlocks;
-  uptr AllocatedBytes = 0;
-  uptr FreedBytes = 0;
-  uptr LargestSize = 0;
-  u32 NumberOfAllocs = 0;
-  u32 NumberOfFrees = 0;
-  LocalStats Stats;
+  mutable HybridMutex Mutex;
+  DoublyLinkedList<LargeBlock::Header> InUseBlocks GUARDED_BY(Mutex);
+  uptr AllocatedBytes GUARDED_BY(Mutex) = 0;
+  uptr FreedBytes GUARDED_BY(Mutex) = 0;
+  uptr LargestSize GUARDED_BY(Mutex) = 0;
+  u32 NumberOfAllocs GUARDED_BY(Mutex) = 0;
+  u32 NumberOfFrees GUARDED_BY(Mutex) = 0;
+  LocalStats Stats GUARDED_BY(Mutex);
 };
 
 // As with the Primary, the size passed to this function includes any desired
@@ -600,7 +607,8 @@ void *MapAllocator<Config>::allocate(Options Options, uptr Size, uptr Alignment,
 }
 
 template <typename Config>
-void MapAllocator<Config>::deallocate(Options Options, void *Ptr) {
+void MapAllocator<Config>::deallocate(Options Options, void *Ptr)
+    EXCLUDES(Mutex) {
   LargeBlock::Header *H = LargeBlock::getHeader<Config>(Ptr);
   const uptr CommitSize = H->CommitSize;
   {
@@ -615,7 +623,7 @@ void MapAllocator<Config>::deallocate(Options Options, void *Ptr) {
 }
 
 template <typename Config>
-void MapAllocator<Config>::getStats(ScopedString *Str) {
+void MapAllocator<Config>::getStats(ScopedString *Str) EXCLUDES(Mutex) {
   ScopedLock L(Mutex);
   Str->append("Stats: MapAllocator: allocated %u times (%zuK), freed %u times "
               "(%zuK), remains %u (%zuK) max %zuM\n",

diff --git a/compiler-rt/lib/scudo/standalone/stats.h b/compiler-rt/lib/scudo/standalone/stats.h
index be5bf2d3720a0..658b75863adee 100644
--- a/compiler-rt/lib/scudo/standalone/stats.h
+++ b/compiler-rt/lib/scudo/standalone/stats.h
@@ -12,6 +12,7 @@
 #include "atomic_helpers.h"
 #include "list.h"
 #include "mutex.h"
+#include "thread_annotations.h"
 
 #include <string.h>
 
@@ -60,19 +61,19 @@ class GlobalStats : public LocalStats {
 public:
   void init() { LocalStats::init(); }
 
-  void link(LocalStats *S) {
+  void link(LocalStats *S) EXCLUDES(Mutex) {
     ScopedLock L(Mutex);
     StatsList.push_back(S);
   }
 
-  void unlink(LocalStats *S) {
+  void unlink(LocalStats *S) EXCLUDES(Mutex) {
     ScopedLock L(Mutex);
     StatsList.remove(S);
     for (uptr I = 0; I < StatCount; I++)
       add(static_cast<StatType>(I), S->get(static_cast<StatType>(I)));
   }
 
-  void get(uptr *S) const {
+  void get(uptr *S) const EXCLUDES(Mutex) {
     ScopedLock L(Mutex);
     for (uptr I = 0; I < StatCount; I++)
       S[I] = LocalStats::get(static_cast<StatType>(I));
@@ -85,15 +86,15 @@ class GlobalStats : public LocalStats {
       S[I] = static_cast<sptr>(S[I]) >= 0 ? S[I] : 0;
   }
 
-  void lock() { Mutex.lock(); }
-  void unlock() { Mutex.unlock(); }
+  void lock() ACQUIRE(Mutex) { Mutex.lock(); }
+  void unlock() RELEASE(Mutex) { Mutex.unlock(); }
 
-  void disable() { lock(); }
-  void enable() { unlock(); }
+  void disable() ACQUIRE(Mutex) { lock(); }
+  void enable() RELEASE(Mutex) { unlock(); }
 
 private:
   mutable HybridMutex Mutex;
-  DoublyLinkedList<LocalStats> StatsList;
+  DoublyLinkedList<LocalStats> StatsList GUARDED_BY(Mutex);
 };
 
 } // namespace scudo

diff --git a/compiler-rt/lib/scudo/standalone/tests/CMakeLists.txt b/compiler-rt/lib/scudo/standalone/tests/CMakeLists.txt
index 474c75173d9f3..50468d9c6ddc3 100644
--- a/compiler-rt/lib/scudo/standalone/tests/CMakeLists.txt
+++ b/compiler-rt/lib/scudo/standalone/tests/CMakeLists.txt
@@ -31,6 +31,9 @@ if (COMPILER_RT_HAS_GWP_ASAN)
        -mno-omit-leaf-frame-pointer)
 endif()
 
+append_list_if(COMPILER_RT_HAS_WTHREAD_SAFETY_FLAG -Werror=thread-safety
+  SCUDO_UNITTEST_CFLAGS)
+
 set(SCUDO_TEST_ARCH ${SCUDO_STANDALONE_SUPPORTED_ARCH})
 
 # gtests requires c++

diff --git a/compiler-rt/lib/scudo/standalone/tests/combined_test.cpp b/compiler-rt/lib/scudo/standalone/tests/combined_test.cpp
index 16032fc545494..63cb5711c1930 100644
--- a/compiler-rt/lib/scudo/standalone/tests/combined_test.cpp
+++ b/compiler-rt/lib/scudo/standalone/tests/combined_test.cpp
@@ -435,7 +435,7 @@ SCUDO_TYPED_TEST(ScudoCombinedTest, Stats) {
   EXPECT_NE(Stats.find("Stats: Quarantine"), std::string::npos);
 }
 
-SCUDO_TYPED_TEST(ScudoCombinedTest, CacheDrain) {
+SCUDO_TYPED_TEST(ScudoCombinedTest, CacheDrain) NO_THREAD_SAFETY_ANALYSIS {
   auto *Allocator = this->Allocator.get();
 
   std::vector<void *> V;

diff --git a/compiler-rt/lib/scudo/standalone/tests/mutex_test.cpp b/compiler-rt/lib/scudo/standalone/tests/mutex_test.cpp
index d3242a3f57d46..c3efeab8272d0 100644
--- a/compiler-rt/lib/scudo/standalone/tests/mutex_test.cpp
+++ b/compiler-rt/lib/scudo/standalone/tests/mutex_test.cpp
@@ -99,3 +99,10 @@ TEST(ScudoMutexTest, MutexTry) {
   for (scudo::u32 I = 0; I < NumberOfThreads; I++)
     pthread_join(Threads[I], 0);
 }
+
+TEST(ScudoMutexTest, MutexAssertHeld) {
+  scudo::HybridMutex M;
+  M.lock();
+  M.assertHeld();
+  M.unlock();
+}

diff --git a/compiler-rt/lib/scudo/standalone/tests/tsd_test.cpp b/compiler-rt/lib/scudo/standalone/tests/tsd_test.cpp
index 6c0c86d4e78b6..44b6bfdf05d6b 100644
--- a/compiler-rt/lib/scudo/standalone/tests/tsd_test.cpp
+++ b/compiler-rt/lib/scudo/standalone/tests/tsd_test.cpp
@@ -82,7 +82,7 @@ TEST(ScudoTSDTest, TSDRegistryInit) {
   EXPECT_FALSE(Allocator->isInitialized());
 
   auto Registry = Allocator->getTSDRegistry();
-  Registry->init(Allocator.get());
+  Registry->initOnceMaybe(Allocator.get());
   EXPECT_TRUE(Allocator->isInitialized());
 }
 

diff --git a/compiler-rt/lib/scudo/standalone/thread_annotations.h b/compiler-rt/lib/scudo/standalone/thread_annotations.h
new file mode 100644
index 0000000000000..68a1087c2034a
--- /dev/null
+++ b/compiler-rt/lib/scudo/standalone/thread_annotations.h
@@ -0,0 +1,70 @@
+//===-- thread_annotations.h ------------------------------------*- C++ -*-===//
+//
+// Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions.
+// See https://llvm.org/LICENSE.txt for license information.
+// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception
+//
+//===----------------------------------------------------------------------===//
+
+#ifndef SCUDO_THREAD_ANNOTATIONS_
+#define SCUDO_THREAD_ANNOTATIONS_
+
+// Enable thread safety attributes only with clang.
+// The attributes can be safely ignored when compiling with other compilers.
+#if defined(__clang__)
+#define THREAD_ANNOTATION_ATTRIBUTE_(x) __attribute__((x))
+#else
+#define THREAD_ANNOTATION_ATTRIBUTE_(x) // no-op
+#endif
+
+#define CAPABILITY(x) THREAD_ANNOTATION_ATTRIBUTE_(capability(x))
+
+#define SCOPED_CAPABILITY THREAD_ANNOTATION_ATTRIBUTE_(scoped_lockable)
+
+#define GUARDED_BY(x) THREAD_ANNOTATION_ATTRIBUTE_(guarded_by(x))
+
+#define PT_GUARDED_BY(x) THREAD_ANNOTATION_ATTRIBUTE_(pt_guarded_by(x))
+
+#define ACQUIRED_BEFORE(...)                                                   \
+  THREAD_ANNOTATION_ATTRIBUTE_(acquired_before(__VA_ARGS__))
+
+#define ACQUIRED_AFTER(...)                                                    \
+  THREAD_ANNOTATION_ATTRIBUTE_(acquired_after(__VA_ARGS__))
+
+#define REQUIRES(...)                                                          \
+  THREAD_ANNOTATION_ATTRIBUTE_(requires_capability(__VA_ARGS__))
+
+#define REQUIRES_SHARED(...)                                                   \
+  THREAD_ANNOTATION_ATTRIBUTE_(requires_shared_capability(__VA_ARGS__))
+
+#define ACQUIRE(...)                                                           \
+  THREAD_ANNOTATION_ATTRIBUTE_(acquire_capability(__VA_ARGS__))
+
+#define ACQUIRE_SHARED(...)                                                    \
+  THREAD_ANNOTATION_ATTRIBUTE_(acquire_shared_capability(__VA_ARGS__))
+
+#define RELEASE(...)                                                           \
+  THREAD_ANNOTATION_ATTRIBUTE_(release_capability(__VA_ARGS__))
+
+#define RELEASE_SHARED(...)                                                    \
+  THREAD_ANNOTATION_ATTRIBUTE_(release_shared_capability(__VA_ARGS__))
+
+#define TRY_ACQUIRE(...)                                                       \
+  THREAD_ANNOTATION_ATTRIBUTE_(try_acquire_capability(__VA_ARGS__))
+
+#define TRY_ACQUIRE_SHARED(...)                                                \
+  THREAD_ANNOTATION_ATTRIBUTE_(try_acquire_shared_capability(__VA_ARGS__))
+
+#define EXCLUDES(...) THREAD_ANNOTATION_ATTRIBUTE_(locks_excluded(__VA_ARGS__))
+
+#define ASSERT_CAPABILITY(x) THREAD_ANNOTATION_ATTRIBUTE_(assert_capability(x))
+
+#define ASSERT_SHARED_CAPABILITY(x)                                            \
+  THREAD_ANNOTATION_ATTRIBUTE_(assert_shared_capability(x))
+
+#define RETURN_CAPABILITY(x) THREAD_ANNOTATION_ATTRIBUTE_(lock_returned(x))
+
+#define NO_THREAD_SAFETY_ANALYSIS                                              \
+  THREAD_ANNOTATION_ATTRIBUTE_(no_thread_safety_analysis)
+
+#endif // SCUDO_THREAD_ANNOTATIONS_
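
Since THREAD_ANNOTATION_ATTRIBUTE_ expands to nothing outside clang, the
annotated sources stay buildable with other compilers, while the
-Werror=thread-safety flag added to SCUDO_CFLAGS above turns violations
into hard errors under clang. Roughly the kind of diagnostic involved
(hypothetical snippet):

    HybridMutex M;
    u32 Counter GUARDED_BY(M) = 0;

    void bump() { Counter++; }
    // error: writing variable 'Counter' requires holding mutex 'M'
    // exclusively [-Werror,-Wthread-safety-analysis]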

diff --git a/compiler-rt/lib/scudo/standalone/trusty.cpp b/compiler-rt/lib/scudo/standalone/trusty.cpp
index 81d6bc585f09f..702d4a9a6b63f 100644
--- a/compiler-rt/lib/scudo/standalone/trusty.cpp
+++ b/compiler-rt/lib/scudo/standalone/trusty.cpp
@@ -76,6 +76,8 @@ void HybridMutex::lockSlow() {}
 
 void HybridMutex::unlock() {}
 
+void HybridMutex::assertHeldImpl() {}
+
 u64 getMonotonicTime() {
   timespec TS;
   clock_gettime(CLOCK_MONOTONIC, &TS);

diff --git a/compiler-rt/lib/scudo/standalone/tsd.h b/compiler-rt/lib/scudo/standalone/tsd.h
index b400a3b56da9c..daf2a098f0e5d 100644
--- a/compiler-rt/lib/scudo/standalone/tsd.h
+++ b/compiler-rt/lib/scudo/standalone/tsd.h
@@ -12,6 +12,7 @@
 #include "atomic_helpers.h"
 #include "common.h"
 #include "mutex.h"
+#include "thread_annotations.h"
 
 #include <limits.h> // for PTHREAD_DESTRUCTOR_ITERATIONS
 #include <pthread.h>
@@ -24,21 +25,20 @@
 namespace scudo {
 
 template <class Allocator> struct alignas(SCUDO_CACHE_LINE_SIZE) TSD {
+  // TODO: Add thread-safety annotation on `Cache` and `QuarantineCache`.
   typename Allocator::CacheT Cache;
   typename Allocator::QuarantineCacheT QuarantineCache;
   using ThisT = TSD<Allocator>;
   u8 DestructorIterations = 0;
 
-  void init(Allocator *Instance) {
+  void init(Allocator *Instance) NO_THREAD_SAFETY_ANALYSIS {
     DCHECK_EQ(DestructorIterations, 0U);
     DCHECK(isAligned(reinterpret_cast<uptr>(this), alignof(ThisT)));
     Instance->initCache(&Cache);
     DestructorIterations = PTHREAD_DESTRUCTOR_ITERATIONS;
   }
 
-  void commitBack(Allocator *Instance) { Instance->commitBack(this); }
-
-  inline bool tryLock() {
+  inline bool tryLock() NO_THREAD_SAFETY_ANALYSIS {
     if (Mutex.tryLock()) {
       atomic_store_relaxed(&Precedence, 0);
       return true;
@@ -49,13 +49,17 @@ template <class Allocator> struct alignas(SCUDO_CACHE_LINE_SIZE) TSD {
           static_cast<uptr>(getMonotonicTime() >> FIRST_32_SECOND_64(16, 0)));
     return false;
   }
-  inline void lock() {
+  inline void lock() NO_THREAD_SAFETY_ANALYSIS {
     atomic_store_relaxed(&Precedence, 0);
     Mutex.lock();
   }
-  inline void unlock() { Mutex.unlock(); }
+  inline void unlock() NO_THREAD_SAFETY_ANALYSIS { Mutex.unlock(); }
   inline uptr getPrecedence() { return atomic_load_relaxed(&Precedence); }
 
+  void commitBack(Allocator *Instance) ASSERT_CAPABILITY(Mutex) {
+    Instance->commitBack(this);
+  }
+
 private:
   HybridMutex Mutex;
   atomic_uptr Precedence = {};

diff --git a/compiler-rt/lib/scudo/standalone/tsd_exclusive.h b/compiler-rt/lib/scudo/standalone/tsd_exclusive.h
index d49427b2005b1..62da8aeb537ab 100644
--- a/compiler-rt/lib/scudo/standalone/tsd_exclusive.h
+++ b/compiler-rt/lib/scudo/standalone/tsd_exclusive.h
@@ -25,7 +25,7 @@ struct ThreadState {
 template <class Allocator> void teardownThread(void *Ptr);
 
 template <class Allocator> struct TSDRegistryExT {
-  void init(Allocator *Instance) {
+  void init(Allocator *Instance) REQUIRES(Mutex) {
     DCHECK(!Initialized);
     Instance->init();
     CHECK_EQ(pthread_key_create(&PThreadKey, teardownThread<Allocator>), 0);
@@ -33,14 +33,14 @@ template <class Allocator> struct TSDRegistryExT {
     Initialized = true;
   }
 
-  void initOnceMaybe(Allocator *Instance) {
+  void initOnceMaybe(Allocator *Instance) EXCLUDES(Mutex) {
     ScopedLock L(Mutex);
     if (LIKELY(Initialized))
       return;
     init(Instance); // Sets Initialized.
   }
 
-  void unmapTestOnly(Allocator *Instance) {
+  void unmapTestOnly(Allocator *Instance) EXCLUDES(Mutex) {
     DCHECK(Instance);
     if (reinterpret_cast<Allocator *>(pthread_getspecific(PThreadKey))) {
       DCHECK_EQ(reinterpret_cast<Allocator *>(pthread_getspecific(PThreadKey)),
@@ -53,6 +53,7 @@ template <class Allocator> struct TSDRegistryExT {
     FallbackTSD.commitBack(Instance);
     FallbackTSD = {};
     State = {};
+    ScopedLock L(Mutex);
     Initialized = false;
   }
 
@@ -62,7 +63,13 @@ template <class Allocator> struct TSDRegistryExT {
     initThread(Instance, MinimalInit);
   }
 
-  ALWAYS_INLINE TSD<Allocator> *getTSDAndLock(bool *UnlockRequired) {
+  // TODO(chiahungduan): Consider removing the argument `UnlockRequired` by
+  // embedding the logic into TSD or by always locking the TSD. That would
+  // enable us to properly mark the thread annotations here and to add proper
+  // runtime assertions in the member functions of TSD, e.g., asserting that
+  // the lock is acquired before calling TSD::commitBack().
+  ALWAYS_INLINE TSD<Allocator> *
+  getTSDAndLock(bool *UnlockRequired) NO_THREAD_SAFETY_ANALYSIS {
     if (LIKELY(State.InitState == ThreadState::Initialized &&
                !atomic_load(&Disabled, memory_order_acquire))) {
       *UnlockRequired = false;
@@ -75,13 +82,13 @@ template <class Allocator> struct TSDRegistryExT {
 
   // To disable the exclusive TSD registry, we effectively lock the fallback TSD
   // and force all threads to attempt to use it instead of their local one.
-  void disable() {
+  void disable() NO_THREAD_SAFETY_ANALYSIS {
     Mutex.lock();
     FallbackTSD.lock();
     atomic_store(&Disabled, 1U, memory_order_release);
   }
 
-  void enable() {
+  void enable() NO_THREAD_SAFETY_ANALYSIS {
     atomic_store(&Disabled, 0U, memory_order_release);
     FallbackTSD.unlock();
     Mutex.unlock();
@@ -113,7 +120,7 @@ template <class Allocator> struct TSDRegistryExT {
   }
 
   pthread_key_t PThreadKey = {};
-  bool Initialized = false;
+  bool Initialized GUARDED_BY(Mutex) = false;
   atomic_u8 Disabled = {};
   TSD<Allocator> FallbackTSD;
   HybridMutex Mutex;
@@ -128,7 +135,8 @@ thread_local TSD<Allocator> TSDRegistryExT<Allocator>::ThreadTSD;
 template <class Allocator>
 thread_local ThreadState TSDRegistryExT<Allocator>::State;
 
-template <class Allocator> void teardownThread(void *Ptr) {
+template <class Allocator>
+void teardownThread(void *Ptr) NO_THREAD_SAFETY_ANALYSIS {
   typedef TSDRegistryExT<Allocator> TSDRegistryT;
   Allocator *Instance = reinterpret_cast<Allocator *>(Ptr);
   // The glibc POSIX thread-local-storage deallocation routine calls user

diff --git a/compiler-rt/lib/scudo/standalone/tsd_shared.h b/compiler-rt/lib/scudo/standalone/tsd_shared.h
index 1c2a880416b98..64b3bd844b0e3 100644
--- a/compiler-rt/lib/scudo/standalone/tsd_shared.h
+++ b/compiler-rt/lib/scudo/standalone/tsd_shared.h
@@ -24,7 +24,7 @@ namespace scudo {
 
 template <class Allocator, u32 TSDsArraySize, u32 DefaultTSDCount>
 struct TSDRegistrySharedT {
-  void init(Allocator *Instance) {
+  void init(Allocator *Instance) REQUIRES(Mutex) {
     DCHECK(!Initialized);
     Instance->init();
     for (u32 I = 0; I < TSDsArraySize; I++)
@@ -35,19 +35,20 @@ struct TSDRegistrySharedT {
     Initialized = true;
   }
 
-  void initOnceMaybe(Allocator *Instance) {
+  void initOnceMaybe(Allocator *Instance) EXCLUDES(Mutex) {
     ScopedLock L(Mutex);
     if (LIKELY(Initialized))
       return;
     init(Instance); // Sets Initialized.
   }
 
-  void unmapTestOnly(Allocator *Instance) {
+  void unmapTestOnly(Allocator *Instance) EXCLUDES(Mutex) {
     for (u32 I = 0; I < TSDsArraySize; I++) {
       TSDs[I].commitBack(Instance);
       TSDs[I] = {};
     }
     setCurrentTSD(nullptr);
+    ScopedLock L(Mutex);
     Initialized = false;
   }
 
@@ -58,7 +59,10 @@ struct TSDRegistrySharedT {
     initThread(Instance);
   }
 
-  ALWAYS_INLINE TSD<Allocator> *getTSDAndLock(bool *UnlockRequired) {
+  // TSDs is an array of locks, and annotating an array as a thread-safety
+  // capability is not supported.
+  ALWAYS_INLINE TSD<Allocator> *
+  getTSDAndLock(bool *UnlockRequired) NO_THREAD_SAFETY_ANALYSIS {
     TSD<Allocator> *TSD = getCurrentTSD();
     DCHECK(TSD);
     *UnlockRequired = true;
@@ -75,13 +79,13 @@ struct TSDRegistrySharedT {
     return getTSDAndLockSlow(TSD);
   }
 
-  void disable() {
+  void disable() NO_THREAD_SAFETY_ANALYSIS {
     Mutex.lock();
     for (u32 I = 0; I < TSDsArraySize; I++)
       TSDs[I].lock();
   }
 
-  void enable() {
+  void enable() NO_THREAD_SAFETY_ANALYSIS {
     for (s32 I = static_cast<s32>(TSDsArraySize - 1); I >= 0; I--)
       TSDs[I].unlock();
     Mutex.unlock();
@@ -119,7 +123,7 @@ struct TSDRegistrySharedT {
     return reinterpret_cast<TSD<Allocator> *>(*getTlsPtr() & ~1ULL);
   }
 
-  bool setNumberOfTSDs(u32 N) {
+  bool setNumberOfTSDs(u32 N) EXCLUDES(MutexTSDs) {
     ScopedLock L(MutexTSDs);
     if (N < NumberOfTSDs)
       return false;
@@ -150,7 +154,7 @@ struct TSDRegistrySharedT {
     *getTlsPtr() |= B;
   }
 
-  NOINLINE void initThread(Allocator *Instance) {
+  NOINLINE void initThread(Allocator *Instance) NO_THREAD_SAFETY_ANALYSIS {
     initOnceMaybe(Instance);
     // Initial context assignment is done in a plain round-robin fashion.
     const u32 Index = atomic_fetch_add(&CurrentIndex, 1U, memory_order_relaxed);
@@ -158,7 +162,10 @@ struct TSDRegistrySharedT {
     Instance->callPostInitCallback();
   }
 
-  NOINLINE TSD<Allocator> *getTSDAndLockSlow(TSD<Allocator> *CurrentTSD) {
+  // TSDs is an array of locks, and annotating an array as a thread-safety
+  // capability is not supported.
+  NOINLINE TSD<Allocator> *getTSDAndLockSlow(TSD<Allocator> *CurrentTSD)
+      EXCLUDES(MutexTSDs) {
     // Use the Precedence of the current TSD as our random seed. Since we are
     // in the slow path, it means that tryLock failed, and as a result it's
     // very likely that said Precedence is non-zero.
@@ -202,10 +209,10 @@ struct TSDRegistrySharedT {
   }
 
   atomic_u32 CurrentIndex = {};
-  u32 NumberOfTSDs = 0;
-  u32 NumberOfCoPrimes = 0;
-  u32 CoPrimes[TSDsArraySize] = {};
-  bool Initialized = false;
+  u32 NumberOfTSDs GUARDED_BY(MutexTSDs) = 0;
+  u32 NumberOfCoPrimes GUARDED_BY(MutexTSDs) = 0;
+  u32 CoPrimes[TSDsArraySize] GUARDED_BY(MutexTSDs) = {};
+  bool Initialized GUARDED_BY(Mutex) = false;
   HybridMutex Mutex;
   HybridMutex MutexTSDs;
   TSD<Allocator> TSDs[TSDsArraySize];


        

