[PATCH] D86471: [MSAN] Add fiber switching APIs

Justin Cady via Phabricator via llvm-commits llvm-commits at lists.llvm.org
Tue Aug 25 08:22:50 PDT 2020


justincady marked 2 inline comments as done.
justincady added inline comments.


================
Comment at: compiler-rt/lib/msan/msan_thread.cpp:114
+void MsanThread::StartSwitchFiber(uptr bottom, uptr size) {
+  if (atomic_load(&stack_switching_, memory_order_relaxed)) {
+    Report("ERROR: starting fiber switch while in fiber switch\n");
----------------
vitalybuka wrote:
> if we don't expect multiple threads access this field, then we don't need atomic
> if we do expect that, this atomic does not solve data race.
> 
> From what I see stack_top() stack_bottom()  only used from BufferedStackTrace::UnwindImpl for current thread.
> 
> So please remove remove stack_switching_ and simplify GetStackBounds()  (next_stack_top_ == next_stack_bottom_ should be enought? )
> 
Yes, you are correct. We do not expect multiple threads here currently. But I don't think this is attempting to prevent a data race (I used this pattern after seeing it in the ASAN implementation). My intent, and I believe the original intent in ASAN, is to prevent incorrect program logic and make sure error reports are always correct.

Regarding incorrect program logic, consider this sequence from a single OS thread:

1. __msan_start_switch_fiber() is invoked
2. __msan_start_switch_fiber() is invoked again

(Or similarly any sequence of events in which start and finish are not paired properly.)

In those cases, the atomic is the breadcrumb that lets MSAN terminate the program immediately and inform the user there was a bad sequence of calls (at the top of StartSwitchFiber and FinishSwitchFiber).

Regarding correct error reports, consider this sequence from a single OS thread:

1. __msan_start_switch_fiber() is invoked
2. Something bad happens before __msan_finish_switch_fiber() is invoked
3. An error report with stack trace is generated

Note that Step 2 is basically never expected to happen. But, if it does, Steps 2 and 3 could occur either before of after the actual fiber switch. The atomic is again used as a breadcrumb to make sure that the error report shows the correct stack, be it pre- or post- fiber switch (in GetStackBounds). If stack_switching_ is 1, we need to create a local stack variable and check if it is in the "old" stack or "new" stack, returning the appropriate stack addresses based on that check.

I think the confusing part is the comment in GetStackBounds that I also brought over from asan_thread.cpp, which incorrectly indicates a data race is being protected against.

But your comment also points out that even if all of the above is true, we don't necessarily need the breadcrumb to be an atomic. It could just be a normal bool, since we don't expect access by multiple threads.

Of course, if I am misunderstanding something please let me know. How would you like to proceed?


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D86471/new/

https://reviews.llvm.org/D86471



More information about the llvm-commits mailing list