[PATCH] D54889: Fiber support for thread sanitizer

Mon Dec 10 08:35:06 PST 2018

741g added a comment.

In D54889#1325450 <https://reviews.llvm.org/D54889#1325450>, @dvyukov wrote:

> FTR, Lingfeng did prototype change to qemu that uses these annotations:
>  https://android-review.googlesource.com/c/platform/external/qemu/+/844675

Thanks for the mention Dmitry. yuri, that change points to the needed changes to support TSAN with QEMU using the patch set before the latest upload today. We've tried it and it seems to work with the full emulator, but there are a lot of false positive messages and we think that it's possible to not require annotations in QEMU if interceptors for ucontext API were added here, and a refactoring was done to make sure jmpbufs did not get changed on all fiber switches.

The problem is that QEMU has this way of implementing coroutines:

Calling thread, qemu_coroutine_new:

- getcontext, makecontext with coroutine_trampoline func
- sigsetjmp
- swapcontext (in same thread)

In the new context, in coroutine_trampoline:

- sigsetjmp
- siglongjmp back to qemu_coroutine_new (not swapcontext)
- qemu_coroutine_new finishes

Later on, in the QEMU code when switching to that coroutine, in qemu_coroutine_switch:

- sigsetjmp
- siglongjmp to coroutine_trampoline in the other branch after the previous sigsetjmp there, run co->entry
- qemu_coroutine_switch itself is called to switch back to the "caller" coroutine (since they can be nested I suppose)

The current way suggests to insert fiber creation/switching aroudn getcontext and swapcontext. However, since swapcontext is used in such conjunction with longjmp, this creates an issue where one also needs to add fiber switch calls to longjmp where previously, we didn't have to do that before.

That's because the current way of switching fibers identifies fibers with the complete state of a cur_thread, so fiber switch + swapcontext may also unintendedly exchange the jmpbufs as well. At least for this use case, it looks like we need to separate jmp bufs from fiber switches, and generally separate fibers from physical threads, because they differ at least in the set of jmpbufs that must be preserved.

1. Physical thread switch: jmpbufs are not preserved
2. Fiber switch on the same physical thread. jmpbufs are preserved.

We also found that there were a lot of false positive warnings generated by QEMU due to switching fibers on the same physical thread + updating common state. TSAN should be aware of fiber switches that are restricted to one physical thread and consider the operations implicitly synchronized. This is another reason to separate their state.

To summarize:

The set of jumpbufs should stay invariant when changing "fibers" via setjmp/longjmp.

The set of jmpbufs is not necessarily invariant under swapcontext, because swapcontext can be done across multiple physical threads, while setjmp/longjmp must remain on one thread.

However, qemu runs setjmp -> swapcontext -> longjmp all in one thread; the physical thread should be tracked for those calls and it should allow access to the jmpbuf set.

For example, if we swapcontext in another physical thread we cannot use the same jmpbufs, but if we swapcontext in one thread it should use the current jmpbufs. This should allow that logic to work for QEMU at least, and it feels like it should work for all possible other uses (?)

On fiber API versus intercepting swapcontext/makecontext: Maybe we can have both.

Repository:
  rCRT Compiler Runtime

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D54889/new/

https://reviews.llvm.org/D54889