<html>
    <head>
      <base href="https://bugs.llvm.org/">
    </head>
    <body><table border="1" cellspacing="0" cellpadding="8">
        <tr>
          <th>Bug ID</th>
          <td><a class="bz_bug_link 
          bz_status_NEW "
   title="NEW - Hanging lldb-vscode test: TestVSCode_setBreakpoints"
   href="https://bugs.llvm.org/show_bug.cgi?id=42271">42271</a>
          </td>
        </tr>

        <tr>
          <th>Summary</th>
          <td>Hanging lldb-vscode test: TestVSCode_setBreakpoints
          </td>
        </tr>

        <tr>
          <th>Product</th>
          <td>lldb
          </td>
        </tr>

        <tr>
          <th>Version</th>
          <td>unspecified
          </td>
        </tr>

        <tr>
          <th>Hardware</th>
          <td>PC
          </td>
        </tr>

        <tr>
          <th>OS</th>
          <td>Linux
          </td>
        </tr>

        <tr>
          <th>Status</th>
          <td>NEW
          </td>
        </tr>

        <tr>
          <th>Keywords</th>
          <td>code-quality
          </td>
        </tr>

        <tr>
          <th>Severity</th>
          <td>normal
          </td>
        </tr>

        <tr>
          <th>Priority</th>
          <td>P
          </td>
        </tr>

        <tr>
          <th>Component</th>
          <td>All Bugs
          </td>
        </tr>

        <tr>
          <th>Assignee</th>
          <td>lldb-dev@lists.llvm.org
          </td>
        </tr>

        <tr>
          <th>Reporter</th>
          <td>kkleine@redhat.com
          </td>
        </tr>

        <tr>
          <th>CC</th>
          <td>clayborg@gmail.com, jan.kratochvil@redhat.com, jdevlieghere@apple.com, llvm-bugs@lists.llvm.org
          </td>
        </tr></table>
      <p>
        <div>
        <pre>Created <span class=""><a href="attachment.cgi?id=22096" name="attach_22096" title="Patch to have test timeout after 1 second instead of hanging forever.">attachment 22096</a> <a href="attachment.cgi?id=22096&action=edit" title="Patch to have test timeout after 1 second instead of hanging forever.">[details]</a></span>
Patch to have test timeout after 1 second instead of hanging forever.

Summary
=======

On one of our build bots and in local builds, a test for the lldb-vscode
binary hangs sporadically: TestVSCode_setBreakpoints.py. When I run it 100
times in a loop, it eventually hangs after an arbitrary number of iterations.

How to reproduce
================

In order to produce some load on my developer machine it is enough to run
"ninja check-lldb" in a separate terminal. Then in another terminal you can run
the hanging test manually in a loop:

cd ~/llvm/lldb/test/
for i in {1..100}; do
  echo "=============== TEST $i ================";
  LD_LIBRARY_PATH=$LD_LIBRARY_PATH:~/llvm-builds/relwithdebinfo-ninja-clang-gold-ccache-distcc/lib \
  python dotest.py \
    -v \
    --executable ~/llvm-builds/relwithdebinfo-ninja-clang-gold-ccache-distcc/bin/lldb \
    -p TestVSCode_setBreakpoints.py \
    ../../lldb/packages/Python/lldbsuite/test/tools/lldb-vscode/breakpoint;
  if [ $? -ne 0 ]; then
    echo "ERROR IN TEST $i";
    break;
  fi
done

Locally I have my LLVM monorepo checked out in ~/llvm and my build directory is
~/llvm-builds/relwithdebinfo-ninja-clang-gold-ccache-distcc . It doesn't matter
how you've compiled lldb.

Once you've reproduced the hang (it is sporadic, but the chance is pretty
high), you might want to apply the attached patch and try to reproduce the
problem again. You'll notice that it is gone. I'm not sure this really is the
proper patch to address the problem, which is why I haven't created a
Phabricator revision yet.


First analysis
==============

I've augmented the Python test scripts with print() calls to show what's going
on. That's how I found the following simplified call graph (the hanging part is
at the bottom):


class VSCodeTestCaseBase: def verify_breakpoint_hit(self, breakpoint_ids):

  stopped_events = self.vscode.wait_for_stopped()

class DebugCommunication: def wait_for_stopped(self, timeout=None):

  stopped_event = self.wait_for_event(filter=['stopped', 'exited'],
timeout=timeout)

  def wait_for_event(self, filter=None, timeout=None):

    self.recv_packet(filter_type='event', filter_event=filter, timeout=timeout)

  def recv_packet(self, filter_type=None, filter_event=None, timeout=None):

    self.recv_condition.wait(timeout)

  threading.Condition.wait(timeout=None)


This is where our test hangs. One thing to note here is that at the call
site no timeout is passed along, so it evaluates to None in
threading.Condition.wait(timeout). The documentation for this function
(<a href="https://docs.python.org/3/library/threading.html#threading.Condition.wait">https://docs.python.org/3/library/threading.html#threading.Condition.wait</a>)
says "Wait until notified or until a timeout occurs." With no timeout, does it
wait forever because it is never notified?
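
To illustrate the difference a timeout makes, here is a minimal sketch,
assuming Python 3. It is not the lldb-vscode test code; wait_for_flag, state
and producer are hypothetical names made up for this example. It uses
threading.Condition.wait_for, which returns False when the timeout expires
instead of blocking forever:

```python
import threading


def wait_for_flag(condition, state, timeout=None):
    """Block until state["ready"] is set; return False if the timeout expires.

    Hypothetical helper for illustration only. With timeout=None this
    would block forever if nobody ever sets the flag and notifies.
    """
    with condition:
        # wait_for() re-checks the predicate, so a notification that was
        # sent before we started waiting is not lost.
        return condition.wait_for(lambda: state["ready"], timeout=timeout)


condition = threading.Condition()
state = {"ready": False}

# Nobody notifies yet: with a finite timeout the call returns instead of
# hanging.
timed_out_result = wait_for_flag(condition, state, timeout=0.1)


def producer():
    # Set the flag and wake up any waiters.
    with condition:
        state["ready"] = True
        condition.notify_all()


worker = threading.Thread(target=producer)
worker.start()
notified_result = wait_for_flag(condition, state, timeout=1.0)
worker.join()

print("without notification:", timed_out_result)
print("with notification:", notified_result)
```

A caller can then turn the False result into a test failure rather than an
endless hang.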

My question is: why is no timeout passed along, which would avoid these hangs
altogether? Just do a simple grep for "wait_for_stopped", "wait_for_event", or
"recv_packet" to find out how many times those functions are called without a
proper timeout.</pre>
        </div>
      </p>


    </body>
</html>