<html>
    <head>
      <base href="https://bugs.llvm.org/">
    </head>
    <body><table border="1" cellspacing="0" cellpadding="8">
        <tr>
          <th>Bug ID</th>
          <td><a class="bz_bug_link 
          bz_status_NEW "
   title="NEW - [compiler-rt] safestacks 'pthread-cleanup.c' test is racy"
   href="https://bugs.llvm.org/show_bug.cgi?id=39001">39001</a>
          </td>
        </tr>

        <tr>
          <th>Summary</th>
          <td>[compiler-rt] safestacks 'pthread-cleanup.c' test is racy
          </td>
        </tr>

        <tr>
          <th>Product</th>
          <td>new-bugs
          </td>
        </tr>

        <tr>
          <th>Version</th>
          <td>trunk
          </td>
        </tr>

        <tr>
          <th>Hardware</th>
          <td>PC
          </td>
        </tr>

        <tr>
          <th>OS</th>
          <td>Linux
          </td>
        </tr>

        <tr>
          <th>Status</th>
          <td>NEW
          </td>
        </tr>

        <tr>
          <th>Severity</th>
          <td>normal
          </td>
        </tr>

        <tr>
          <th>Priority</th>
          <td>P
          </td>
        </tr>

        <tr>
          <th>Component</th>
          <td>new bugs
          </td>
        </tr>

        <tr>
          <th>Assignee</th>
          <td>unassignedbugs@nondot.org
          </td>
        </tr>

        <tr>
          <th>Reporter</th>
          <td>jeremy.morse.llvm@gmail.com
          </td>
        </tr>

        <tr>
          <th>CC</th>
          <td>llvm-bugs@lists.llvm.org, vitalybuka@google.com, vlad@tsyrklevich.net
          </td>
        </tr></table>
      <p>
        <div>
        <pre>Created <span class=""><a href="attachment.cgi?id=20891" name="attach_20891" title="Strace of pthread-cleanup.c successfully terminating (which is an error)">attachment 20891</a> <a href="attachment.cgi?id=20891&action=edit" title="Strace of pthread-cleanup.c successfully terminating (which is an error)">[details]</a></span>
Strace of pthread-cleanup.c successfully terminating (which is an error)

In rare circumstances, the safestack test 'pthread-cleanup.c' can fail (i.e.,
it doesn't crash) when the system is heavily loaded. We've seen intermittent
failures on Sony's internal CI for a while, and I've managed to replicate it by
running:
 * The test binary under strace, concurrent with
 * `llvm-lit -j 200` applied to the LLVM test suite.

The failure mode is an (apparently very rare) race where, in the code at [0],
pthread_join has reported that the test's first thread has terminated, but the
underlying Linux thread has not been cleared yet. This manifests as a
successful call to tgkill, so the 'unsafe stack' is not unmapped and freed,
and the later crash that the program expects never happens.

An strace of this happening is attached, running from just before the start of
main() to the program's successful exit.

In this circumstance, to my uneducated eye it looks like the thread_stack_ll
struct of the still-live thread is kept linked into the to-free list, so IMHO
the problem is that the test relies on forward progress in the operating system
that isn't guaranteed. (I've no good ideas for how to get around that and
improve the test, alas.)
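
To illustrate the failure mode, here is a simplified Python model of the
cleanup pass. It is only a sketch, not the actual safestack code: `sweep`,
`probe` and the thread IDs are made up for illustration, and `probe` merely
stands in for the tgkill liveness check, with 0 meaning "thread still known to
the kernel" and ESRCH meaning "thread gone".

```python
import errno


def sweep(pending, probe):
    """Run one cleanup pass; return (freed, kept) lists of thread IDs."""
    freed, kept = [], []
    for tid in pending:
        # Only a thread the kernel no longer knows about has its stack freed.
        if probe(tid) == errno.ESRCH:
            freed.append(tid)
        else:
            # Joined in userspace but still visible to the kernel:
            # the entry stays linked and its stack stays mapped.
            kept.append(tid)
    return freed, kept


# Hypothetical state: thread 101 has been joined but not yet cleared,
# thread 102 is fully gone.
lagging = {101: 0, 102: errno.ESRCH}
freed, kept = sweep([101, 102], lagging.get)
print(freed, kept)
```

In this model the stack of thread 101 survives the pass, which corresponds to
the case where the test's later access does not crash.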

[0]
<a href="https://github.com/llvm-mirror/compiler-rt/blob/d5d5b22249814bb4a2193509ed7ab33687507f98/lib/safestack/safestack.cc#L184">https://github.com/llvm-mirror/compiler-rt/blob/d5d5b22249814bb4a2193509ed7ab33687507f98/lib/safestack/safestack.cc#L184</a></pre>
        </div>
      </p>


    </body>
</html>