[Lldb-commits] [PATCH] D158034: [lldb] Fix data race in ThreadList

Tue Aug 22 02:53:01 PDT 2023

labath added a comment.

After this patch, python_api/run_locker/TestRunLocker.py becomes flaky. https://lab.llvm.org/buildbot/#/builders/68/builds/58456 is the first such failure, but there have been about a dozen failures since then. The backtraces on the buildbot page are fairly useless, but I was able to capture this backtrace locally:

  include/c++/v1/vector:1537: assertion __n < size() failed: vector[] index out of bounds
  Stack dump without symbol names (ensure you have llvm-symbolizer in your PATH or set the environment var `LLVM_SYMBOLIZER_PATH` to point to it):
  0  0x000055dd180146be llvm::sys::PrintStackTrace(llvm::raw_ostream&, int) + 46
  1  0x000055dd18014d6c
  2  0x00007fea9b3ce1c0
  3  0x00007fea9b236347 raise + 167
  4  0x00007fea9b237797 abort + 247
  5  0x000055dd18862597
  6  0x000055dd124c4563 lldb_private::process_gdb_remote::ProcessGDBRemote::SetThreadPc(std::__u::shared_ptr<lldb_private::Thread> const&, unsigned long) + 275
  7  0x000055dd124c42fc lldb_private::process_gdb_remote::ProcessGDBRemote::DoUpdateThreadList(lldb_private::ThreadList&, lldb_private::ThreadList&) + 748
  8  0x000055dd129c8687 lldb_private::Process::UpdateThreadListIfNeeded() + 247
  9  0x000055dd12a39797 lldb_private::ThreadList::FindThreadByID(unsigned long, bool) + 55
  10 0x000055dd12a396db lldb_private::ThreadList::GetSelectedThread() + 43
  11 0x000055dd129a8dea lldb_private::ExecutionContext::ExecutionContext(lldb_private::Target*, bool) + 234
  12 0x00007fea99840bf9 lldb::SBTarget::EvaluateExpression(char const*, lldb::SBExpressionOptions const&) + 281
  13 0x00007fea99840a7b lldb::SBTarget::EvaluateExpression(char const*) + 187
  14 0x00007fea9995eb77
  15 0x000055dd185907cd
  16 0x000055dd18545814 _PyObject_Call + 292
  17 0x000055dd1865fecb _PyEval_EvalFrameDefault + 11515
  18 0x000055dd1865cfa8 _PyEval_Vector + 184
  19 0x000055dd186666c3
  20 0x000055dd18666906
  21 0x000055dd1865e56f _PyEval_EvalFrameDefault + 5023
  22 0x000055dd1865cfa8 _PyEval_Vector + 184
  23 0x000055dd186666c3
  24 0x000055dd18666906
  25 0x000055dd1865e56f _PyEval_EvalFrameDefault + 5023
  26 0x000055dd1865cfa8 _PyEval_Vector + 184
  27 0x000055dd1865fecb _PyEval_EvalFrameDefault + 11515
  28 0x000055dd1865cfa8 _PyEval_Vector + 184
  <a lot of additional python frames>

Given that the failure is in the `DoUpdateThreadList` function, I'm guessing that this patch removed or reduced some existing synchronization which prevented this code from executing concurrently with something else (I don't know what). However, since updating the thread list while the process is running doesn't make sense, it's possible that the right fix is to bail out of `EvaluateExpression` sooner (@jingham ?).

If you want to reproduce this, note that the bug shows up relatively infrequently (~1% of runs for me, though the rate seems to be a bit higher on the buildbot), so you may need to run the test many times before you catch it. The failure mode is not always the same (e.g. sometimes the test just hangs), but I expect the root cause to be the same. I've also confirmed that the failure rate of the test goes down to zero after reverting this patch.
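
Since the failure rate is so low, it may help to automate the reruns. The following is only a rough sketch, not something taken from the test suite: `count_failures` is a hypothetical helper, and the command you pass to it is whatever you normally use to run this single test in your build tree. It counts non-zero exits and hangs separately, because the failure sometimes shows up as a hang rather than a crash.

```python
import subprocess


def count_failures(cmd, runs, timeout_s):
    """Run `cmd` (a list of arguments) `runs` times.

    Returns a (failures, hangs) tuple, where `failures` is the number of
    runs that exited with a non-zero status and `hangs` is the number of
    runs that did not finish within `timeout_s` seconds.

    Hypothetical helper for illustration; not part of lldb or its tests.
    """
    failures = 0
    hangs = 0
    for _ in range(runs):
        try:
            result = subprocess.run(
                cmd,
                stdout=subprocess.DEVNULL,
                stderr=subprocess.DEVNULL,
                timeout=timeout_s,
            )
        except subprocess.TimeoutExpired:
            # subprocess.run kills the child before raising, so a hung
            # run does not leak a process into the next iteration.
            hangs += 1
            continue
        if result.returncode != 0:
            failures += 1
    return failures, hangs
```

At a ~1% failure rate, a few hundred runs are likely needed before a failure appears. Pick a timeout comfortably above the normal runtime of the test, so that a slow run is not mistaken for a hang.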

Can you investigate (and possibly revert this patch in the meantime)?

Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D158034/new/

https://reviews.llvm.org/D158034