[PATCH] D54889: Fiber support for thread sanitizer
Dmitry Vyukov via Phabricator via llvm-commits
llvm-commits at lists.llvm.org
Mon Dec 3 08:17:43 PST 2018
dvyukov added a comment.
Some high-level comments:
1. Do we want to synchronize fibers on switch? If fibers use TLS data, or cooperative scheduling (e.g. mutex unlock directly schedules the next critical section owner), or pass some data from prev fiber to next fiber, in all these cases not synchronizing fibers on switch will lead to false positives. As far as I remember in my old experiments I concluded that it's almost impossible to get rid of false positives it tasks/fibers are not synchronized on switch.
2. This introduces slowdown into the hottest runtime functions (branch and indirection in cur_thread).
I think check_analyze.sh test will fail. Kinda hard to say, because it's broken already at the moment, but this perturbs runtime functions enough:
Before:
write1 tot 367; size 1425; rsp 8; push 1; pop 5; call 2; load 21; store 13; sh 40; mov 94; lea 4; cmp 39
write2 tot 373; size 1489; rsp 8; push 1; pop 5; call 2; load 21; store 13; sh 40; mov 95; lea 4; cmp 43
write4 tot 374; size 1457; rsp 8; push 1; pop 5; call 2; load 21; store 13; sh 40; mov 95; lea 4; cmp 43
write8 tot 364; size 1429; rsp 8; push 1; pop 5; call 2; load 21; store 13; sh 40; mov 95; lea 4; cmp 39
read1 tot 409; size 1674; rsp 8; push 1; pop 4; call 2; load 21; store 13; sh 45; mov 97; lea 4; cmp 39
read2 tot 412; size 1694; rsp 8; push 1; pop 4; call 2; load 21; store 13; sh 45; mov 97; lea 4; cmp 43
read4 tot 412; size 1694; rsp 8; push 1; pop 4; call 2; load 21; store 13; sh 45; mov 97; lea 4; cmp 43
read8 tot 404; size 1660; rsp 8; push 1; pop 4; call 2; load 21; store 13; sh 45; mov 97; lea 4; cmp 39
func_entry tot 42; size 165; rsp 0; push 0; pop 0; call 1; load 4; store 2; sh 3; mov 12; lea 1; cmp 1
func_exit tot 37; size 149; rsp 0; push 0; pop 0; call 1; load 2; store 1; sh 3; mov 9; lea 1; cmp 1
After:
write1 tot 367; size 1439; rsp 4; push 1; pop 5; call 2; load 21; store 14; sh 40; mov 93; lea 4; cmp 40
write2 tot 375; size 1493; rsp 4; push 1; pop 5; call 2; load 21; store 14; sh 40; mov 94; lea 4; cmp 44
write4 tot 375; size 1471; rsp 4; push 1; pop 5; call 2; load 21; store 14; sh 40; mov 94; lea 4; cmp 44
write8 tot 365; size 1435; rsp 4; push 1; pop 5; call 2; load 21; store 14; sh 40; mov 94; lea 4; cmp 40
read1 tot 412; size 1682; rsp 4; push 1; pop 4; call 2; load 21; store 14; sh 45; mov 96; lea 4; cmp 40
read2 tot 414; size 1698; rsp 4; push 1; pop 4; call 2; load 21; store 14; sh 45; mov 96; lea 4; cmp 44
read4 tot 414; size 1698; rsp 4; push 1; pop 4; call 2; load 21; store 14; sh 45; mov 96; lea 4; cmp 44
read8 tot 404; size 1642; rsp 4; push 1; pop 4; call 2; load 21; store 14; sh 45; mov 96; lea 4; cmp 40
func_entry tot 48; size 197; rsp 0; push 0; pop 0; call 1; load 4; store 3; sh 3; mov 14; lea 1; cmp 2
func_exit tot 45; size 181; rsp 0; push 0; pop 0; call 1; load 2; store 2; sh 3; mov 11; lea 1; cmp 2
3. I think instead of __tsan_get_current_fiber we could use pointer 0. It prevents things like switching to context of another thread, which won't end well. And just simplifies API surface.
4. What if a fiber calls pthread_exit?
5. Can we combine __tsan_set_fiber_name with __tsan_create_fiber?
6. I think it's better to add flags argument to __tsan_create_fiber/__tsan_switch_to_fiber to not end up with __tsan_create_fiber2 and __tsan_create_fiber3.
7. Do we want to support nested fibers/tasks? Does GCD have nested tasks? I think TBB/OpenMP does. Won't begin/end fiber/task be a more generic API for this? E.g. fiber switching can be expressed as stop running the prev fiber (which switches back to the physical thread context), then start running new fiber.
8. How does this play with setjmp/longjmp? In particular with fiber switching based on setjmp/longjmp? jmp_bufs is part of ThreadState, won't we always try to do longjmp from a wrong ThreadState in such case?
9. Looking at 1, 2, 3, 7 and 8, it seems that we may want to switch some smaller part (subset) of ThreadState. E.g. introduce TaskContext that contains only the part that we need to switch on fiber switches and then have:
ThreadState -> TaskContext (of physical thread, always linked) -> TaskContext (current fiber context)
or:
ThreadState -> TaskContext (of physical thread, always linked) -> TaskContext (of a task) -> TaskContext (of a nested task)
E.g. clock can stay in ThreadState (if we want to synchronize them) which will solve the performance problem, jmp_bufs can stay which will solve part of setjmp problem. This may resolve the hacky_call problem as well.
Repository:
rCRT Compiler Runtime
CHANGES SINCE LAST ACTION
https://reviews.llvm.org/D54889/new/
https://reviews.llvm.org/D54889
More information about the llvm-commits
mailing list