[PATCH] D14288: [tsan] Alternative ThreadState storage for OS X

Tue Nov 3 08:10:05 PST 2015

kubabrecka created this revision.
kubabrecka added reviewers: kcc, samsonov, glider, dvyukov.
kubabrecka added subscribers: llvm-commits, zaks.anna, ismailp, jasonk.

On OS X, there are several issues with using `__thread` to store the ThreadState objects that TSan relies on in all interceptors and memory instrumentation:

* During early process startup, interceptors are called (from dyld, Libc, etc.) when TLV is simply not available and any access to it will crash.
* During early new thread initialization, interceptors are called, but the TLV for the current thread is not yet initialized.  It will be lazily loaded on the first access, but the initialization actually needs to call one of the intercepted functions (pthread_mutex_lock), creating a circular dependency.
* When a thread is finished, during its teardown, the TLV is destroyed (deallocated), but interceptors are still called on that thread, which will cause the TLV to get resurrected (by lazy initialization).

There are several possible workarounds.  One is to use `pthread_key_create` and `pthread_getspecific`, but that still has the thread finalization issue.  This patch presents a different solution (originally proposed by Kostya):  Since `pthread_self()` is always available and reliable and returns a valid pointer to memory, we use the shadow memory of this pointer as a "poor man's TLV".  No user code should ever read or write this internal libpthread structure, so its shadow is safe to use for this purpose.  We can simply lazily allocate the ThreadState object and store the pointer there.
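For comparison, the `pthread_key_create` workaround mentioned above could look roughly like the sketch below. This is not code from the patch: `ThreadState` is a tiny stand-in for the real struct, and `KeyedCurThread`, `DestroyState` and `CreateKey` are hypothetical names. It only illustrates where the finalization issue comes from.

```cpp
#include <pthread.h>
#include <cassert>
#include <cstdlib>

// Hypothetical stand-in for TSan's ThreadState; the real type is much larger.
struct ThreadState {
  int dummy;
};

static pthread_key_t g_state_key;
static pthread_once_t g_key_once = PTHREAD_ONCE_INIT;

// Destructor that libpthread runs during thread teardown. Once it has run,
// the slot is empty again, so a later call to KeyedCurThread() on the same
// thread would allocate a fresh state (the "resurrection" problem).
static void DestroyState(void *p) { std::free(p); }

static void CreateKey() { pthread_key_create(&g_state_key, DestroyState); }

// Lazily allocates the per-thread state on first use.
ThreadState *KeyedCurThread() {
  pthread_once(&g_key_once, CreateKey);
  void *p = pthread_getspecific(g_state_key);
  if (p == nullptr) {
    p = std::calloc(1, sizeof(ThreadState));
    assert(p != nullptr);
    pthread_setspecific(g_state_key, p);
  }
  return static_cast<ThreadState *>(p);
}
```

The destructor is the weak point: any interceptor that runs on the thread after `DestroyState` would see an empty slot and allocate a new state.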

To make this work, we need to store the main thread's ThreadState separately, because it needs to be available even before the shadow memory is initialized.  Note that the current patch never deallocates the ThreadState objects and simply leaks them, which I'll fix in a subsequent patch.
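The idea can be tried outside of TSan with the simplified sketch below. It rests on several assumptions that are not part of the patch: a mutex-protected side table (`ToyShadowSlot`) replaces the real shadow memory and `MemToShadow`, `IsMainThread` replaces the OS X-only `pthread_main_np`, and `ThreadState` is a tiny placeholder. All of these names are hypothetical.

```cpp
#include <pthread.h>
#include <cassert>
#include <cstdint>
#include <cstdlib>
#include <mutex>
#include <unordered_map>

// Hypothetical stand-in for TSan's ThreadState.
struct ThreadState {
  int dummy;
};

// Stand-in for pthread_main_np(): remember which thread ran static
// initialization and compare against it.
static const pthread_t g_main_thread = pthread_self();
static bool IsMainThread() {
  return pthread_equal(pthread_self(), g_main_thread) != 0;
}

// Toy replacement for shadow memory: one pointer-sized slot per key.
// Real TSan derives the slot address arithmetically and takes no lock.
static ThreadState **ToyShadowSlot(std::uintptr_t key) {
  static std::mutex mu;
  static std::unordered_map<std::uintptr_t, ThreadState *> table;
  std::lock_guard<std::mutex> lock(mu);
  return &table[key];  // new slots start out as nullptr
}

ThreadState *SketchCurThread() {
  ThreadState **fake_tls;
  if (IsMainThread()) {
    // The main thread's slot must be usable before any shadow exists.
    static ThreadState *main_thread_state = nullptr;
    fake_tls = &main_thread_state;
  } else {
    fake_tls = ToyShadowSlot((std::uintptr_t)pthread_self());
  }
  if (*fake_tls == nullptr) {
    // Lazily allocate a zeroed state; like the patch, never freed.
    *fake_tls = static_cast<ThreadState *>(std::calloc(1, sizeof(ThreadState)));
    assert(*fake_tls != nullptr);
  }
  return *fake_tls;
}
```

Because the states are never freed in this sketch, a new thread that happens to reuse an exited thread's `pthread_self()` value would pick up the old state.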

There are some performance implications here, but I'd like to point out that the hot path contains only calls to `pthread_main_np`, `pthread_self` and `MemToShadow`.  At least on OS X, `pthread_self` is only a single memory access (via the `%gs` segment) plus a return, and `pthread_main_np` has an extra memory access plus 2 arithmetic operations.  So it seems that this implementation shouldn't hurt too much.

(This is part of an effort to port TSan to OS X, and it's one of the very first steps. Don't expect TSan on OS X to actually work or pass tests at this point.)


http://reviews.llvm.org/D14288

Files:
  lib/tsan/rtl/tsan_platform_mac.cc
  lib/tsan/rtl/tsan_rtl.cc
  lib/tsan/rtl/tsan_rtl.h

Index: lib/tsan/rtl/tsan_rtl.h
===================================================================

--- lib/tsan/rtl/tsan_rtl.h
+++ lib/tsan/rtl/tsan_rtl.h
@@ -409,13 +409,16 @@
                        uptr tls_addr, uptr tls_size);
 };
 
-#ifndef SANITIZER_GO
+#if !defined(SANITIZER_GO) && !SANITIZER_MAC
 __attribute__((tls_model("initial-exec")))
 extern THREADLOCAL char cur_thread_placeholder[];
 INLINE ThreadState *cur_thread() {
   return reinterpret_cast<ThreadState *>(&cur_thread_placeholder);
 }
 #endif
+#if SANITIZER_MAC
+ThreadState *cur_thread();
+#endif
 
 class ThreadContext : public ThreadContextBase {
  public:
Index: lib/tsan/rtl/tsan_rtl.cc
===================================================================
--- lib/tsan/rtl/tsan_rtl.cc
+++ lib/tsan/rtl/tsan_rtl.cc
@@ -44,7 +44,7 @@
 
 namespace __tsan {
 
-#ifndef SANITIZER_GO
+#if !defined(SANITIZER_GO) && !SANITIZER_MAC
 THREADLOCAL char cur_thread_placeholder[sizeof(ThreadState)] ALIGNED(64);
 #endif
 static char ctx_placeholder[sizeof(Context)] ALIGNED(64);
Index: lib/tsan/rtl/tsan_platform_mac.cc
===================================================================
--- lib/tsan/rtl/tsan_platform_mac.cc
+++ lib/tsan/rtl/tsan_platform_mac.cc
@@ -40,6 +40,33 @@
 
 namespace __tsan {
 
+// On OS X, accessing TLVs via __thread or manually by using pthread_key_* is
+// problematic, because there are several places where interceptors are called
+// when TLVs are not accessible (early process startup, thread cleanup, ...).
+// The following provides a "poor man's TLV" implementation, where we use the
+// shadow memory of the pointer returned by pthread_self() to store a pointer to
+// the ThreadState object. The main thread's ThreadState pointer is stored
+// separately in a static variable, because we need to access it even before the
+// shadow memory is set up.
+// TODO(kuba.brecka): This currently leaks the ThreadState objects as we never
+// deallocate them.
+ThreadState *cur_thread() {
+  ThreadState **fake_tls;
+  if (pthread_main_np()) {
+    static ThreadState *main_thread_state = nullptr;
+    fake_tls = &main_thread_state;
+  } else {
+    uptr thread_identity = (uptr)pthread_self();
+    fake_tls = (ThreadState **)MemToShadow(thread_identity);
+  }
+
+  if (*fake_tls == nullptr) {
+    *fake_tls = (ThreadState *)InternalAlloc(sizeof(ThreadState), nullptr);
+    internal_memset(*fake_tls, 0, sizeof(ThreadState));
+  }
+  return *fake_tls;
+}
+
 uptr GetShadowMemoryConsumption() {
   return 0;
 }

