[PATCH] D86471: [MSAN] Add fiber switching APIs

Vitaly Buka via Phabricator via llvm-commits llvm-commits at lists.llvm.org
Tue Aug 25 18:00:31 PDT 2020


vitalybuka added inline comments.


================
Comment at: compiler-rt/lib/msan/msan_thread.cpp:45-47
+  atomic_store(&stack_switching_, 0, memory_order_release);
+  next_stack_top_ = 0;
+  next_stack_bottom_ = 0;
----------------
This should go into ::Create()
However Create uses MmapOrDie and implicitly reset everything to 0.



================
Comment at: compiler-rt/lib/msan/msan_thread.cpp:89-91
+    if (stack_bottom_ >= stack_top_)
+      return {0, 0};
+    return {stack_bottom_, stack_top_};
----------------
 > is not possible
 == only for 0,0


================
Comment at: compiler-rt/lib/msan/msan_thread.cpp:94
+  char local;
+  const uptr cur_stack = (uptr)&local;
+  // Note: need to check next stack first, because FinishSwitchFiber
----------------



================
Comment at: compiler-rt/lib/msan/msan_thread.cpp:114
+void MsanThread::StartSwitchFiber(uptr bottom, uptr size) {
+  if (atomic_load(&stack_switching_, memory_order_relaxed)) {
+    Report("ERROR: starting fiber switch while in fiber switch\n");
----------------
vitalybuka wrote:
> justincady wrote:
> > vitalybuka wrote:
> > > if we don't expect multiple threads access this field, then we don't need atomic
> > > if we do expect that, this atomic does not solve data race.
> > > 
> > > From what I see stack_top() stack_bottom()  only used from BufferedStackTrace::UnwindImpl for current thread.
> > > 
> > > So please remove remove stack_switching_ and simplify GetStackBounds()  (next_stack_top_ == next_stack_bottom_ should be enought? )
> > > 
> > Yes, you are correct. We do not expect multiple threads here currently. But I don't think this is attempting to prevent a data race (I used this pattern after seeing it in the ASAN implementation). My intent, and I believe the original intent in ASAN, is to prevent incorrect program logic and make sure error reports are always correct.
> > 
> > Regarding incorrect program logic, consider this sequence from a single OS thread:
> > 
> > 1. __msan_start_switch_fiber() is invoked
> > 2. __msan_start_switch_fiber() is invoked again
> > 
> > (Or similarly any sequence of events in which start and finish are not paired properly.)
> > 
> > In those cases, the atomic is the breadcrumb that lets MSAN terminate the program immediately and inform the user there was a bad sequence of calls (at the top of StartSwitchFiber and FinishSwitchFiber).
> > 
> > Regarding correct error reports, consider this sequence from a single OS thread:
> > 
> > 1. __msan_start_switch_fiber() is invoked
> > 2. Something bad happens before __msan_finish_switch_fiber() is invoked
> > 3. An error report with stack trace is generated
> > 
> > Note that Step 2 is basically never expected to happen. But, if it does, Steps 2 and 3 could occur either before of after the actual fiber switch. The atomic is again used as a breadcrumb to make sure that the error report shows the correct stack, be it pre- or post- fiber switch (in GetStackBounds). If stack_switching_ is 1, we need to create a local stack variable and check if it is in the "old" stack or "new" stack, returning the appropriate stack addresses based on that check.
> > 
> > I think the confusing part is the comment in GetStackBounds that I also brought over from asan_thread.cpp, which incorrectly indicates a data race is being protected against.
> > 
> > But your comment also points out that even if all of the above is true, we don't necessarily need the breadcrumb to be an atomic. It could just be a normal bool, since we don't expect access by multiple threads.
> > 
> > Of course, if I am misunderstanding something please let me know. How would you like to proceed?
> maybe
Lsan in Asan can inspect stacks from other threads, so I see why there is atomic. 


================
Comment at: compiler-rt/lib/msan/msan_thread.cpp:114-117
+  if (atomic_load(&stack_switching_, memory_order_relaxed)) {
+    Report("ERROR: starting fiber switch while in fiber switch\n");
+    Die();
+  }
----------------
justincady wrote:
> vitalybuka wrote:
> > if we don't expect multiple threads access this field, then we don't need atomic
> > if we do expect that, this atomic does not solve data race.
> > 
> > From what I see stack_top() stack_bottom()  only used from BufferedStackTrace::UnwindImpl for current thread.
> > 
> > So please remove remove stack_switching_ and simplify GetStackBounds()  (next_stack_top_ == next_stack_bottom_ should be enought? )
> > 
> Yes, you are correct. We do not expect multiple threads here currently. But I don't think this is attempting to prevent a data race (I used this pattern after seeing it in the ASAN implementation). My intent, and I believe the original intent in ASAN, is to prevent incorrect program logic and make sure error reports are always correct.
> 
> Regarding incorrect program logic, consider this sequence from a single OS thread:
> 
> 1. __msan_start_switch_fiber() is invoked
> 2. __msan_start_switch_fiber() is invoked again
> 
> (Or similarly any sequence of events in which start and finish are not paired properly.)
> 
> In those cases, the atomic is the breadcrumb that lets MSAN terminate the program immediately and inform the user there was a bad sequence of calls (at the top of StartSwitchFiber and FinishSwitchFiber).
> 
> Regarding correct error reports, consider this sequence from a single OS thread:
> 
> 1. __msan_start_switch_fiber() is invoked
> 2. Something bad happens before __msan_finish_switch_fiber() is invoked
> 3. An error report with stack trace is generated
> 
> Note that Step 2 is basically never expected to happen. But, if it does, Steps 2 and 3 could occur either before of after the actual fiber switch. The atomic is again used as a breadcrumb to make sure that the error report shows the correct stack, be it pre- or post- fiber switch (in GetStackBounds). If stack_switching_ is 1, we need to create a local stack variable and check if it is in the "old" stack or "new" stack, returning the appropriate stack addresses based on that check.
> 
> I think the confusing part is the comment in GetStackBounds that I also brought over from asan_thread.cpp, which incorrectly indicates a data race is being protected against.
> 
> But your comment also points out that even if all of the above is true, we don't necessarily need the breadcrumb to be an atomic. It could just be a normal bool, since we don't expect access by multiple threads.
> 
> Of course, if I am misunderstanding something please let me know. How would you like to proceed?
maybe


================
Comment at: compiler-rt/lib/msan/msan_thread.cpp:124-127
+  if (!atomic_load(&stack_switching_, memory_order_relaxed)) {
+    Report("ERROR: finishing a fiber switch that has not started\n");
+    Die();
+  }
----------------
maybe


================
Comment at: compiler-rt/lib/msan/msan_thread.cpp:128-133
+  if (bottom_old) {
+    *bottom_old = stack_bottom_;
+  }
+  if (size_old) {
+    *size_old = stack_top_ - stack_bottom_;
+  }
----------------
same for the rest


================
Comment at: compiler-rt/lib/msan/msan_thread.h:62
+
+  atomic_uint8_t stack_switching_;
+
----------------



================
Comment at: compiler-rt/test/msan/Linux/swapcontext_annotation_reset.cpp:41-44
+  // The parameters should _not_ be poisoned; this is the first call to foo.
+  foo(x, y, -1);
+  // The parameters should be poisoned; the prior call to foo left them so.
+  foo(x, y, 0);
----------------
I don't understand why the second one is poisoned


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D86471/new/

https://reviews.llvm.org/D86471



More information about the llvm-commits mailing list