<html>

    <head>

      <base href="https://llvm.org/bugs/" />

    </head>

    <body><table border="1" cellspacing="0" cellpadding="8">

        <tr>

          <th>Bug ID</th>

          <td><a class="bz_bug_link 

          bz_status_NEW "

   title="NEW --- - Address Sanitizer deadlocks when used by SCHED_FIFO threads on x86 (not 64) when affined to a single CPU"

   href="https://llvm.org/bugs/show_bug.cgi?id=27986">27986</a>

          </td>

        </tr>

        <tr>

          <th>Summary</th>

          <td>Address Sanitizer deadlocks when used by SCHED_FIFO threads on x86 (not 64) when affined to a single CPU

          </td>

        </tr>

        <tr>

          <th>Product</th>

          <td>compiler-rt

          </td>

        </tr>

        <tr>

          <th>Version</th>

          <td>3.8

          </td>

        </tr>

        <tr>

          <th>Hardware</th>

          <td>PC

          </td>

        </tr>

        <tr>

          <th>OS</th>

          <td>Linux

          </td>

        </tr>

        <tr>

          <th>Status</th>

          <td>NEW

          </td>

        </tr>

        <tr>

          <th>Severity</th>

          <td>normal

          </td>

        </tr>

        <tr>

          <th>Priority</th>

          <td>P

          </td>

        </tr>

        <tr>

          <th>Component</th>

          <td>compiler-rt

          </td>

        </tr>

        <tr>

          <th>Assignee</th>

          <td>unassignedbugs@nondot.org

          </td>

        </tr>

        <tr>

          <th>Reporter</th>

          <td>nat1192@gmail.com

          </td>

        </tr>

        <tr>

          <th>CC</th>

          <td>llvm-bugs@lists.llvm.org

          </td>

        </tr>

        <tr>

          <th>Classification</th>

          <td>Unclassified

          </td>

        </tr></table>

      <p>

        <div>

        <pre>Created <span class=""><a href="attachment.cgi?id=16457" name="attach_16457" title="Simple example to reproduce the issue">attachment 16457</a></span>

Simple example to reproduce the issue

Using AddressSanitizer can cause a program to deadlock on allocations when
the following conditions are met:

1. Run the application on a 32-bit x86 Linux platform. (I don't know whether
   a multilib build on x86-64 compiled with -m32 would also reproduce this.)
2. Have the threads in the application use the SCHED_FIFO scheduling policy.
3. Give the threads different priorities.
4. Pin all the threads in the application to the same CPU.
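
For reference, conditions 2 to 4 can be set up with the pthread API roughly as
sketched below. This is not the attached asan_fifo.cpp. The helper names
MakeFifoAttr and PinSelfToCpu are hypothetical, and pthread_setaffinity_np is
a GNU extension. Preparing the attributes needs no special privileges, but
creating a thread from them requires root.

```cpp
// Quoted includes fall back to the system include path.
#include "pthread.h"
#include "sched.h"

// Hypothetical helper: prepares attributes for a SCHED_FIFO thread with the
// given static priority (1 to 99 on Linux). Returns 0 or an errno value.
int MakeFifoAttr(pthread_attr_t* attr, int priority) {
  int rc = pthread_attr_init(attr);
  if (rc != 0) return rc;
  // Without PTHREAD_EXPLICIT_SCHED the new thread inherits the creator's
  // scheduling policy and priority, and the settings below are ignored.
  rc = pthread_attr_setinheritsched(attr, PTHREAD_EXPLICIT_SCHED);
  if (rc == 0) rc = pthread_attr_setschedpolicy(attr, SCHED_FIFO);
  if (rc == 0) {
    sched_param sp{};
    sp.sched_priority = priority;
    rc = pthread_attr_setschedparam(attr, &sp);
  }
  if (rc != 0) pthread_attr_destroy(attr);
  return rc;
}

// Hypothetical helper: pins the calling thread to one CPU so that all
// threads compete for the same core. Returns 0 or an errno value.
int PinSelfToCpu(int cpu) {
  cpu_set_t set;
  CPU_ZERO(&set);
  CPU_SET(cpu, &set);
  return pthread_setaffinity_np(pthread_self(), sizeof(set), &set);
}
```

To create each thread, prepare an attribute object with a different priority
and pass it to pthread_create(). Each thread then calls PinSelfToCpu() with the
same CPU number before doing any work.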

This appears to happen because SizeClassAllocator32 uses a spin lock to guard
some of its internal data. Spin locks behave badly with SCHED_FIFO threads,
especially when those threads cannot migrate to another CPU: a waiter that
never sleeps keeps a lower-priority lock holder on the same CPU from running.

Take this hypothetical example:

1. Thread 1 has high priority; thread 2 has low priority.
2. Thread 1 goes to sleep.
3. Thread 2 decides to allocate, so it takes the spin lock.
4. The kernel preempts thread 2 in order to run some SCHED_OTHER processes.
   Thread 2 still holds the spin lock, because it was interrupted before it
   finished its critical section.
5. While the other process is running, thread 1's timed sleep expires, so it
   becomes runnable and is scheduled.
6. Thread 1 is now running and decides to allocate. It tries to take the spin
   lock, but thread 2 still owns it.
7. After spinning for a while, thread 1 calls sched_yield() (that is how the
   sanitizers' spin lock is implemented). However, thread 1 still has higher
   priority than thread 2, so the kernel immediately schedules it to run again.
8. "Deadlock" has occurred: thread 1 keeps spinning on the lock, and thread 2
   can never run because it has lower priority than thread 1.
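
The steps above can be checked with a toy model of a single CPU that always
runs the highest-priority runnable thread. This is a hypothetical, simplified
simulation, not kernel code. It starts at step 6, with the low-priority thread
holding the lock and the high-priority thread awake and spinning. It then
counts how many ticks each thread gets.

```cpp
// Toy model of one CPU running two SCHED_FIFO threads (illustration only).
struct SimThread {
  int priority;   // Larger value means higher SCHED_FIFO priority.
  bool runnable;  // A thread that spins or yields stays runnable.
  int ticks_run;  // Ticks during which this thread had the CPU.
};

// Starts with `low` holding the lock (it needs `work_left` more ticks to
// release it) and `high` awake and wanting the lock. Each tick the CPU goes
// to the highest-priority runnable thread; on a tie `low` runs, standing in
// for sched_yield() handing the CPU to an equal-priority thread.
// Returns true if `high` acquires the lock within `max_ticks` ticks.
bool HighEventuallyGetsLock(SimThread& high, SimThread& low, int work_left,
                            int max_ticks) {
  bool lock_held_by_low = true;
  for (int tick = 0; tick != max_ticks; ++tick) {
    bool run_high =
        high.runnable && (!low.runnable || high.priority > low.priority);
    if (run_high) {
      ++high.ticks_run;
      if (!lock_held_by_low) return true;  // Lock is free: take it.
      // Otherwise keep spinning; `high` stays runnable.
    } else {
      ++low.ticks_run;
      if (--work_left == 0) {
        lock_held_by_low = false;  // Critical section done: unlock.
        low.runnable = false;      // `low` then goes back to sleep.
      }
    }
  }
  return false;
}
```

With priorities 99 and 1, the high-priority thread receives every tick and the
lock holder never runs. With equal priorities, the holder finishes after its
remaining ticks and the waiter then acquires the lock.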

This is easier to see in a stack trace of the attached example application.
Once the program stops printing the "Alive" messages, interrupt it in GDB and
you should see these two threads (or something similar):

Thread 6 (Thread 0xab4feb40 (LWP 1520)):
#0  0xb7fdad91 in __kernel_vsyscall ()
#1  0xb7cdf217 in syscall () from /usr/lib/libc.so.6
#2  0x08118679 in __sanitizer::internal_sched_yield() ()
#3  0x0806466b in __sanitizer::StaticSpinMutex::LockSlow() ()
#4  0x08064834 in __sanitizer::SizeClassAllocator32<0ul, 4294967296ull, 16ul,
__sanitizer::SizeClassMap<17ul, 64ul, 14ul>, 20ul,
__sanitizer::FlatByteMap<4096ull>,
__asan::AsanMapUnmapCallback>::AllocateBatch(__sanitizer::AllocatorStats*,
__sanitizer::SizeClassAllocatorLocalCache<__sanitizer::SizeClassAllocator32<0ul,
4294967296ull, 16ul, __sanitizer::SizeClassMap<17ul, 64ul, 14ul>, 20ul,
__sanitizer::FlatByteMap<4096ull>, __asan::AsanMapUnmapCallback> >*, unsigned
long) ()
#5  0x08064c32 in
__sanitizer::SizeClassAllocatorLocalCache<__sanitizer::SizeClassAllocator32<0ul,
4294967296ull, 16ul, __sanitizer::SizeClassMap<17ul, 64ul, 14ul>, 20ul,
__sanitizer::FlatByteMap<4096ull>, __asan::AsanMapUnmapCallback>
>::Refill(__sanitizer::SizeClassAllocator32<0ul, 4294967296ull, 16ul,
__sanitizer::SizeClassMap<17ul, 64ul, 14ul>, 20ul,
__sanitizer::FlatByteMap<4096ull>, __asan::AsanMapUnmapCallback>*, unsigned
long) ()
#6  0x08067760 in __asan::Allocator::Allocate(unsigned long, unsigned long,
__sanitizer::BufferedStackTrace*, __asan::AllocType, bool) ()
#7  0x0806378b in __asan::asan_memalign(unsigned long, unsigned long,
__sanitizer::BufferedStackTrace*, __asan::AllocType) ()
#8  0x0812f543 in operator new(unsigned int) ()
#9  0x08132205 in dumb_thread (arg=0xbffffa60) at asan_fifo.cpp:26
#10 0x0806e7bf in asan_thread_start(void*) ()
#11 0xb7de42f1 in start_thread () from /usr/lib/libpthread.so.0
#12 0xb7ce37ce in clone () from /usr/lib/libc.so.6

Thread 5 (Thread 0xabcffb40 (LWP 1519)):
#0  0x08064863 in __sanitizer::SizeClassAllocator32<0ul, 4294967296ull, 16ul,
__sanitizer::SizeClassMap<17ul, 64ul, 14ul>, 20ul,
__sanitizer::FlatByteMap<4096ull>,
__asan::AsanMapUnmapCallback>::AllocateBatch(__sanitizer::AllocatorStats*,
__sanitizer::SizeClassAllocatorLocalCache<__sanitizer::SizeClassAllocator32<0ul,
4294967296ull, 16ul, __sanitizer::SizeClassMap<17ul, 64ul, 14ul>, 20ul,
__sanitizer::FlatByteMap<4096ull>, __asan::AsanMapUnmapCallback> >*, unsigned
long) ()
#1  0x08064c32 in
__sanitizer::SizeClassAllocatorLocalCache<__sanitizer::SizeClassAllocator32<0ul,
4294967296ull, 16ul, __sanitizer::SizeClassMap<17ul, 64ul, 14ul>, 20ul,
__sanitizer::FlatByteMap<4096ull>, __asan::AsanMapUnmapCallback>
>::Refill(__sanitizer::SizeClassAllocator32<0ul, 4294967296ull, 16ul,
__sanitizer::SizeClassMap<17ul, 64ul, 14ul>, 20ul,
__sanitizer::FlatByteMap<4096ull>, __asan::AsanMapUnmapCallback>*, unsigned
long) ()
#2  0x08067760 in __asan::Allocator::Allocate(unsigned long, unsigned long,
__sanitizer::BufferedStackTrace*, __asan::AllocType, bool) ()
#3  0x0806378b in __asan::asan_memalign(unsigned long, unsigned long,
__sanitizer::BufferedStackTrace*, __asan::AllocType) ()
#4  0x0812f543 in operator new(unsigned int) ()
#5  0x08132205 in dumb_thread (arg=0xbffffa5c) at asan_fifo.cpp:26
#6  0x0806e7bf in asan_thread_start(void*) ()
#7  0xb7de42f1 in start_thread () from /usr/lib/libpth

Even if I allow the program to resume and then interrupt it again, these
threads don't appear to make any forward progress.

One possible fix is to avoid spin locks here, or at the very least to have the
spin lock fall back to a blocking lock after a certain number of attempts.

Note that to run the provided example you need to be root (to have permission
to create SCHED_FIFO threads), and running it will likely slow one CPU on your
system to a crawl, so I recommend running it in a VM. You might also have to
tweak some of the numbers to reproduce the problem on your system. After a few
seconds to a minute of running, you should see the 'Alive' messages stop. I
compiled and tested this in a 32-bit ArchLinux VM with both Clang 3.8 and
GCC 6.1.1, using 'clang++ asan_fifo.cpp -o test -fsanitize=address -pthread'.

I understand that the example is a bit convoluted. It is a slimmed-down
version of a much larger real-world application, in which this bug normally
takes several days of constant running to manifest itself.</pre>

        </div>

      </p>


    </body>

</html>