[PATCH] D54521: tsan: Add pthread_tryjoin_np and pthread_timedjoin_np interceptors

Wed Nov 21 01:17:59 PST 2018

dvyukov added inline comments.

================
Comment at: lib/tsan/rtl/tsan_interceptors.cc:1057
+  else
+    ThreadNotJoined(thr, pc, tid, (uptr)th);
+  return res;
----------------
yuri wrote:
> dvyukov wrote:
> > Why is this necessary?
> > We've already set it in CreateThread.
> > Also this looks racy and can CHECK-fail. Consider that this pthread_tryjoin fails but another thread concurrently also executes pthread_tryjoin and it succeeds and thread is destroyed.
> This is needed because ThreadTid()/FindThreadByUid() clears user_id. This behaviour can be traced back to commit https://github.com/llvm-mirror/compiler-rt/commit/411b2c9d6787b64939fc15fdeec65e9d65ba1a51.
> 
> Concurrent call of pthread_tryjoin() or any other join functions is racy because you can end up with already joined thread and access freed memory. Problem of current thread sanitizer code that in case of such racy application it fails CHECK instead of reporting clear error.
> 
> I personally do no like such implementation. Instead of clearing user_id, it would be better to set flag "thread being joined" and store stack context of calling thread. If later someone attempts to join the same thread – report an error. In such case ThreadNotJoined() would clear this flag.
I see. Yes, it's a bit messy because when pthread_join returns the pthread_t can be already reused for another thread.

You are right that concurrent pthread_tryjoin's are racy and bad (unless program know that they both won't succeed, but that would be pretty strange code).

> Instead of clearing user_id, it would be better to set flag...

Isn't this also racy if pthread_tryjoin succeeds? When pthread_tryjoin succeeds user_id (pthread_t) becomes stale and can be reused. So another thread can already be joining a different thread, but it finds the same user_id and the "thread being joined" flag set and decides that it does a racy pthread_join and reports a bug, but it reality it is joining a completely unrelated thread that happened to have the same user_id.

================
Comment at: test/tsan/Linux/thread_timedjoin.c:15
+  pthread_create(&t, 0, Thread, 0);
+  barrier_wait(&barrier);
+  struct timespec ts;
----------------
yuri wrote:
> dvyukov wrote:
> > Why do we need a barrier in this test? It looks like it's deterministic without the barrier.
> I just copied it from pthread_deatch test. Actually brarier can be useful in this test if it is moved farther . I will update tests.
With a barrier one could test that we join the thread on at least second iteration (coverage for ThreadNotJoined), or that pthread_timedjoin_np fails and then a race on shared var with the thread is detected (failed pthread_timedjoin_np should not synchronize threads).

Repository:
  rCRT Compiler Runtime

https://reviews.llvm.org/D54521