[PATCH] D56238: [TSan] Support Objective-C @synchronized with tagged pointers

Julian Lettner via Phabricator via llvm-commits llvm-commits at lists.llvm.org
Mon Jan 7 11:06:03 PST 2019


yln marked 6 inline comments as done.
yln added inline comments.


================
Comment at: lib/tsan/rtl/tsan_interceptors_mac.cc:302
+// have an associated memory allocation. The Obj-C runtime uses tagged pointers
+// to transparently optimize small objects.
+static bool IsTaggedObjCPointer(id obj) {
----------------
dvyukov wrote:
> yln wrote:
> > dvyukov wrote:
> > > I still wonder what does synchronized itself do to lock them?
> > > Since the optimization is transparent, it suggest that these things still have reference identity rather than values identity. But I fail to see how we respect this reference identity. Consider, we have two different objects that are small and converted to a tagger pointer with the same value (say, integer 1). Now we will use the same address for these 2 objects because they have the same value, so we think they are the same. Since we merge them we can get false deadlock reports and all kinds of bad stuff. But then I am confused how synchronized distinguish them.
> > Ah, I think I now understand your question. Sorry for not communicating more clearly.
> > 
> > The Obj-C runtime maintains a mapping: <pointer value -> lock>. So two pointers with the same value (pointer  bits) will map to the same lock, regardless whether or not they are backed by an allocation. So our implementation already replicates the behavior of the Obj-C runtime.
> > 
> > We considered reporting a blanket warning any time @synchronized is used with a tagged pointer, since it might have surprising behavior from the programmer's point of view. However, Devin made a convincing point that tagged pointers are only one special case of a larger issue here.
> > In general, it is bad practice to lock on objects that you don't own/allocate/control the lifetime of. Think of a cache for small numbers, or interned string literals. If you use @synchronized with, e.g., two numbers of the same value, with the expectation that they are different objects, then it doesn't matter if they are tagged pointers or two pointers pointing to the same object. In both cases your assumption is broken. So this would need a more general warning to make sure programmers explicitly manage locks.
> > 
> > Summary:
> > @synchronized locks on pointer values. We already replicate this behavior (I hope ;)
> > 
> I see. Thanks for explanation.
> Is it that a user can't create new tagged pointers?
> But regardless, if this is what synchronized do, then I guess we can do the same. In some sense deadlock reports caused by merging will be true.
> Is it that a user can't create new tagged pointers?
Some APIs return tagged pointers and certain patterns create tagged pointers. So user code can create new tagged pointers, but should not depend on it (no guarantees) and ideally the programmer shouldn't even be aware of it (transparent optimization). We can say that @synchronized is a place where this optimization leaks its implementation details.

> But regardless, if this is what synchronized do, then I guess we can do the same. In some sense deadlock reports caused by merging will be true.
Yes, that's our intention here.



================
Comment at: lib/tsan/rtl/tsan_interceptors_mac.cc:319
+  if (h.created()) {
+    *h = (uptr) InternalAlloc(/*size=*/1);
+  }
----------------
dvyukov wrote:
> yln wrote:
> > dvyukov wrote:
> > > It's safer/better to use a user allocation. Internal allocations may not be covered by shadow in future (not sure it has not in all configurations). And mutexes have limited functionality in non-app mem.
> > With `user_alloc` TSan reports a race in the `objc-synchronize-tagged.mm` test. I do not completely understand what the cause is: the address/allocation is supposed to be racy by nature since it represents a mutex, which itself is used to establish synchronization. Do you have any insights here?
> > Plain old `malloc` seems to work. 
> malloc and user_alloc should generally be the same. But I wonder if malloc interceptor detects that it's called a non-intrumented library and enables ignores so that it does not model access to the range.
> This can be checked by running with TSAN_OPTIONS=ignore_noninstrumented_modules=0
> 
> Yes, I think a false report is possible if we use user memory without ignores.
> Consider a thread creates a mutex, then passes it to 2 threads and these 2 threads lock the mutex concurrently. In this case all proper synchronization is in place and the code is correct.
> However, what tsan sees is that the mutex is created by one of the 2 threads that calls lock first. So it looks like one thread creates a mutex and another thread immidiately tries to lock it without any synchronization.
> 
> Let's try to surround user_alloc with ThreadIgnoreBegin/End. Does it help?
Your analysis is correct. With `TSAN_OPTIONS=ignore_noninstrumented_modules=0` also `malloc` reports a warning and `ThreadIgnoreBegin/End` fixes the failing test. Good that we had that test. :)

One last question: looking at the code I don't think so, but does the size of the dummy allocation matter, i.e., should it be something like `sizeof(pthread_mutex_t)`?


Repository:
  rCRT Compiler Runtime

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D56238/new/

https://reviews.llvm.org/D56238





More information about the llvm-commits mailing list