[PATCH] D54889: Fiber support for thread sanitizer

Mon Dec 17 07:23:14 PST 2018

yuri added a comment.

In D54889#1326910 <https://reviews.llvm.org/D54889#1326910>, @dvyukov wrote:

> In D54889#1325232 <https://reviews.llvm.org/D54889#1325232>, @yuri wrote:
>
> > > 2. This introduces slowdown into the hottest runtime functions (branch and indirection in cur_thread). I think check_analyze.sh test will fail. Kinda hard to say, because it's broken already at the moment, but this perturbs runtime functions enough:
> >
> > I have checked parts of the patch separately and found:
> >
> > - Part of patch related to HACKY_CALL does not affect code of critical functions and benchmark results
> > - Part of patch related to proc() does not affect code of critical functions and benchmark results
> > - The only part that affects critical functions is implementation cur_thread(). I tried to minimize its effect in new version of a patch.
>
>
> Yes, cur_thread() is the problem. It's used in all the hottest functions.
>  As far as I see the current code still contains an indirection in cur_thread(). We need to think of a way to not affect hot paths.

New version should be as fast as original code.

>>> 9. Looking at 1, 2, 3, 7 and 8, it seems that we may want to switch some smaller part (subset) of ThreadState. E.g. introduce TaskContext that contains only the part that we need to switch on fiber switches and then have: ThreadState -> TaskContext (of physical thread, always linked) -> TaskContext (current fiber context) or: ThreadState -> TaskContext (of physical thread, always linked) -> TaskContext (of a task) -> TaskContext (of a nested task) E.g. clock can stay in ThreadState (if we want to synchronize them) which will solve the performance problem, jmp_bufs can stay which will solve part of setjmp problem. This may resolve the hacky_call problem as well.

IMHO it is hardly possible function entry/exit use shadow stack, memory access functions use clock. Both structure are too big to be copied on each fiber switch.

>> I prefer not to do it:
>> 
>> - There is already similar separation ThreadState/Process. If something is not needed in ThreadState, it can be moved to Process.
> 
> You mean Processor?
>  Processor is completely different thing, it does not have any relation to user-visible state, it's merely a helper and implementation detail that makes internal machinery more efficient. ThreadState is a user-visible entity. Nothing can be moved from ThreadState to Processor (except for stuff that is meant to be in Processor, but wasn't moved with the introduction of Processor, but that's a different story, it does not relate to what we discuss here).
> 
>> - In case of single-threaded scheduler the main thing to switch is shadow stack. In case of multithreaded scheduler nearly everything should be switched. I do not think it is good idea to have two different implementations for these cases.
> 
> Not if we always synchronize on fiber switch.
>  If we synchronize then clock does not need to be switched. And clock is exactly the thing that we need embed in TLS to not slow down hot paths.
> 
>> - I tried to be on-par with Go sanitizer, which uses ThreadState for each goroutine and single Process per physical thread
> 
> Go is different. Processor is not user visible, ThreadState is.
>  Go model is exactly the same as we have for C/C++ without this change: both have only 1 user-visible entity: thread. With this change C/C++ has 2 user-visible entities: thread and fiber.

Can't agree with this. Only fiber is visible to user. Of cause, it is the same as thread until user switches it.

Repository:
  rCRT Compiler Runtime

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D54889/new/

https://reviews.llvm.org/D54889