[llvm-bugs] [Bug 46825] New: clang DirectoryWatcherTest unittest hangs on some filesystems, but not others
via llvm-bugs
llvm-bugs at lists.llvm.org
Thu Jul 23 09:58:35 PDT 2020
https://bugs.llvm.org/show_bug.cgi?id=46825
Bug ID: 46825
Summary: clang DirectoryWatcherTest unittest hangs on some
filesystems, but not others
Product: Test Suite
Version: trunk
Hardware: PC
OS: Linux
Status: NEW
Severity: enhancement
Priority: P
Component: lit
Assignee: unassignedbugs at nondot.org
Reporter: bfriesen at lbl.gov
CC: daniel at zuster.org, llvm-bugs at lists.llvm.org
Greetings,
I am very close to having a working x86 flang buildbot - everything about the
bot's workflow succeeds except for a single lit test, 'DirectoryWatcherTest'.
This test reads and writes some temporary files/directories in TMPDIR.
On the system where this buildbot is running (a Cray XC40), I have observed the
following behaviors (all results tested on commit 724bf4ee23a of llvm-project):
On "login nodes", the test runs without any issues. TMPDIR is set to `/tmp`,
which is a ramdisk:
```
cori10:DirectoryWatcher> ./DirectoryWatcherTests
[==========] Running 8 tests from 1 test case.
[----------] Global test environment set-up.
[----------] 8 tests from DirectoryWatcherTest
[ RUN ] DirectoryWatcherTest.InitialScanSync
[ OK ] DirectoryWatcherTest.InitialScanSync (7 ms)
[ RUN ] DirectoryWatcherTest.InitialScanAsync
[ OK ] DirectoryWatcherTest.InitialScanAsync (8 ms)
[ RUN ] DirectoryWatcherTest.AddFiles
[ OK ] DirectoryWatcherTest.AddFiles (8 ms)
[ RUN ] DirectoryWatcherTest.ModifyFile
[ OK ] DirectoryWatcherTest.ModifyFile (16 ms)
[ RUN ] DirectoryWatcherTest.DeleteFile
[ OK ] DirectoryWatcherTest.DeleteFile (29 ms)
[ RUN ] DirectoryWatcherTest.DeleteWatchedDir
[ OK ] DirectoryWatcherTest.DeleteWatchedDir (0 ms)
[ RUN ] DirectoryWatcherTest.InvalidatedWatcher
[ OK ] DirectoryWatcherTest.InvalidatedWatcher (7 ms)
[ RUN ] DirectoryWatcherTest.InvalidatedWatcherAsync
[ OK ] DirectoryWatcherTest.InvalidatedWatcherAsync (16 ms)
[----------] 8 tests from DirectoryWatcherTest (91 ms total)
[----------] Global test environment tear-down
[==========] 8 tests from 1 test case ran. (91 ms total)
[ PASSED ] 8 tests.
cori10:DirectoryWatcher> strace -f -e open ./DirectoryWatcherTests 2>&1|grep open
[pid 49990] open("/tmp/dirwatcher-adcb34/watch", O_RDONLY|O_NONBLOCK|O_DIRECTORY|O_CLOEXEC) = 6
open("/tmp/dirwatcher-adcb34", O_RDONLY|O_NONBLOCK|O_DIRECTORY|O_CLOEXEC) = 3
open("/tmp/dirwatcher-adcb34/watch", O_RDONLY|O_NONBLOCK|O_DIRECTORY|O_CLOEXEC) = 4
[pid 49995] open("/tmp/dirwatcher-0cf5ec/watch", O_RDONLY|O_NONBLOCK|O_DIRECTORY|O_CLOEXEC) = 7
open("/tmp/dirwatcher-0cf5ec", O_RDONLY|O_NONBLOCK|O_DIRECTORY|O_CLOEXEC) = 3
open("/tmp/dirwatcher-0cf5ec/watch", O_RDONLY|O_NONBLOCK|O_DIRECTORY|O_CLOEXEC) = 4
[pid 49990] open("/tmp/dirwatcher-ab6ab0/watch", O_RDONLY|O_NONBLOCK|O_DIRECTORY|O_CLOEXEC) = 6
open("/tmp/dirwatcher-ab6ab0", O_RDONLY|O_NONBLOCK|O_DIRECTORY|O_CLOEXEC) = 3
open("/tmp/dirwatcher-ab6ab0/watch", O_RDONLY|O_NONBLOCK|O_DIRECTORY|O_CLOEXEC) = 4
[pid 49990] open("/tmp/dirwatcher-f06c15/watch", O_RDONLY|O_NONBLOCK|O_DIRECTORY|O_CLOEXEC) = 6
open("/tmp/dirwatcher-f06c15", O_RDONLY|O_NONBLOCK|O_DIRECTORY|O_CLOEXEC) = 3
open("/tmp/dirwatcher-f06c15/watch", O_RDONLY|O_NONBLOCK|O_DIRECTORY|O_CLOEXEC) = 4
[pid 49990] open("/tmp/dirwatcher-3204e2/watch", O_RDONLY|O_NONBLOCK|O_DIRECTORY|O_CLOEXEC) = 6
open("/tmp/dirwatcher-3204e2", O_RDONLY|O_NONBLOCK|O_DIRECTORY|O_CLOEXEC) = 3
open("/tmp/dirwatcher-3204e2/watch", O_RDONLY|O_NONBLOCK|O_DIRECTORY|O_CLOEXEC) = 4
[pid 49990] open("/tmp/dirwatcher-390f43/watch", O_RDONLY|O_NONBLOCK|O_DIRECTORY|O_CLOEXEC) = 6
[pid 49990] open("/tmp/dirwatcher-390f43/watch", O_RDONLY|O_NONBLOCK|O_DIRECTORY|O_CLOEXEC) = 7
open("/tmp/dirwatcher-390f43", O_RDONLY|O_NONBLOCK|O_DIRECTORY|O_CLOEXEC) = 3
[pid 49990] open("/tmp/dirwatcher-34f29d/watch", O_RDONLY|O_NONBLOCK|O_DIRECTORY|O_CLOEXEC) = 6
[pid 49990] open("/tmp/dirwatcher-34f29d", O_RDONLY|O_NONBLOCK|O_DIRECTORY|O_CLOEXEC) = 3
open("/tmp/dirwatcher-34f29d/watch", O_RDONLY|O_NONBLOCK|O_DIRECTORY|O_CLOEXEC) = 4
[pid 50018] open("/tmp/dirwatcher-f385d3/watch", O_RDONLY|O_NONBLOCK|O_DIRECTORY|O_CLOEXEC) = 7
[pid 49990] open("/tmp/dirwatcher-f385d3", O_RDONLY|O_NONBLOCK|O_DIRECTORY|O_CLOEXEC) = 3
open("/tmp/dirwatcher-f385d3/watch", O_RDONLY|O_NONBLOCK|O_DIRECTORY|O_CLOEXEC) = 4
```
However, the same test on an XC "compute node" hangs:
```
nid00049:DirectoryWatcher> ./DirectoryWatcherTests
[==========] Running 8 tests from 1 test case.
[----------] Global test environment set-up.
[----------] 8 tests from DirectoryWatcherTest
[ RUN ] DirectoryWatcherTest.InitialScanSync
[ OK ] DirectoryWatcherTest.InitialScanSync (5 ms)
[ RUN ] DirectoryWatcherTest.InitialScanAsync
[ OK ] DirectoryWatcherTest.InitialScanAsync (8 ms)
[ RUN ] DirectoryWatcherTest.AddFiles
[ OK ] DirectoryWatcherTest.AddFiles (8 ms)
[ RUN ] DirectoryWatcherTest.ModifyFile
[ OK ] DirectoryWatcherTest.ModifyFile (0 ms)
[ RUN ] DirectoryWatcherTest.DeleteFile
[ OK ] DirectoryWatcherTest.DeleteFile (16 ms)
[ RUN ] DirectoryWatcherTest.DeleteWatchedDir
/path/to/llvm-project/clang/unittests/DirectoryWatcher/DirectoryWatcherTest.cpp:247: Failure
Value of: WaitForExpectedStateResult.wait_for(std::chrono::seconds(3)) == std::future_status::ready
Actual: false
Expected: true
The expected result state wasn't reached before the time-out.
/path/to/llvm-project/clang/unittests/DirectoryWatcher/DirectoryWatcherTest.cpp:250: Failure
Value of: TestConsumer.result().hasValue()
Actual: false
Expected: true
Expected initial events:
Expected non-initial events:
WatchedDirRemoved
WatcherGotInvalidated
Expected but not seen non-initial events:
WatchedDirRemoved
WatcherGotInvalidated
```
The program hangs on that last line and never returns. lldb shows the
following:
```
nid00049:DirectoryWatcher> lldb ./DirectoryWatcherTests
(lldb) target create "./DirectoryWatcherTests"
Current executable set to './DirectoryWatcherTests' (x86_64).
(lldb) run
Process 19658 launched: '/path/to/build/tools/clang/unittests/DirectoryWatcher/DirectoryWatcherTests' (x86_64)
[==========] Running 8 tests from 1 test case.
[----------] Global test environment set-up.
[----------] 8 tests from DirectoryWatcherTest
[ RUN ] DirectoryWatcherTest.InitialScanSync
[ OK ] DirectoryWatcherTest.InitialScanSync (8 ms)
[ RUN ] DirectoryWatcherTest.InitialScanAsync
[ OK ] DirectoryWatcherTest.InitialScanAsync (16 ms)
[ RUN ] DirectoryWatcherTest.AddFiles
[ OK ] DirectoryWatcherTest.AddFiles (16 ms)
[ RUN ] DirectoryWatcherTest.ModifyFile
[ OK ] DirectoryWatcherTest.ModifyFile (16 ms)
[ RUN ] DirectoryWatcherTest.DeleteFile
[ OK ] DirectoryWatcherTest.DeleteFile (16 ms)
[ RUN ] DirectoryWatcherTest.DeleteWatchedDir
/path/to/llvm-project/clang/unittests/DirectoryWatcher/DirectoryWatcherTest.cpp:247: Failure
Value of: WaitForExpectedStateResult.wait_for(std::chrono::seconds(3)) == std::future_status::ready
Actual: false
Expected: true
The expected result state wasn't reached before the time-out.
/path/to/llvm-project/clang/unittests/DirectoryWatcher/DirectoryWatcherTest.cpp:250: Failure
Value of: TestConsumer.result().hasValue()
Actual: false
Expected: true
Expected initial events:
Expected non-initial events:
WatchedDirRemoved
WatcherGotInvalidated
Expected but not seen non-initial events:
WatchedDirRemoved
WatcherGotInvalidated
Process 19658 stopped
* thread #1, name = 'DirectoryWatche', stop reason = signal SIGSTOP
frame #0: 0x00002aaaaace1339 libpthread.so.0`__pthread_cond_destroy + 105
libpthread.so.0`__pthread_cond_destroy:
-> 0x2aaaaace1339 <+105>: cmpq $-0x1000, %rax ; imm = 0xF000
0x2aaaaace133f <+111>: jbe 0x2aaaaace1320 ; <+80>
0x2aaaaace1341 <+113>: leal 0xb(%rax), %ecx
0x2aaaaace1344 <+116>: cmpl $0xb, %ecx
(lldb) bt
error: DirectoryWatcherTests :: Class '_Alloc_hider' has a base class 'std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >::allocator_type' which does not have a complete definition.
error: DirectoryWatcherTests :: Try compiling the source file with -fstandalone-debug.
* thread #1, name = 'DirectoryWatche', stop reason = signal SIGSTOP
* frame #0: 0x00002aaaaace1339 libpthread.so.0`__pthread_cond_destroy + 105
frame #1: 0x000000000040b666 DirectoryWatcherTests`(anonymous namespace)::VerifyingConsumer::~VerifyingConsumer(this=0x00007fffffff5f78) at DirectoryWatcherTest.cpp:100:8
frame #2: 0x000000000040d62a DirectoryWatcherTests`DirectoryWatcherTest_DeleteWatchedDir_Test::TestBody(this=0x000000000082f8e0) at DirectoryWatcherTest.cpp:430:1
frame #3: 0x000000000054a634 DirectoryWatcherTests`void testing::internal::HandleSehExceptionsInMethodIfSupported<testing::Test, void>(object=0x000000000082f8e0, method=21 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00, location="the test body")(), char const*) at gtest.cc:2402:10
frame #4: 0x0000000000537a82 DirectoryWatcherTests`void testing::internal::HandleExceptionsInMethodIfSupported<testing::Test, void>(object=0x000000000082f8e0, method=21 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00, location="the test body")(), char const*) at gtest.cc:2455:12
frame #5: 0x0000000000525a26 DirectoryWatcherTests`testing::Test::Run(this=0x000000000082f8e0) at gtest.cc:2474:5
frame #6: 0x00000000005262bb DirectoryWatcherTests`testing::TestInfo::Run(this=0x000000000082dfc0) at gtest.cc:2656:11
frame #7: 0x0000000000526854 DirectoryWatcherTests`testing::TestCase::Run(this=0x000000000082d660) at gtest.cc:2774:28
frame #8: 0x000000000052be25 DirectoryWatcherTests`testing::internal::UnitTestImpl::RunAllTests(this=0x000000000082d1e0) at gtest.cc:4649:43
frame #9: 0x000000000054dc34 DirectoryWatcherTests`bool testing::internal::HandleSehExceptionsInMethodIfSupported<testing::internal::UnitTestImpl, bool>(object=0x000000000082d1e0, method=50 bb 52 00 00 00 00 00 00 00 00 00 00 00 00 00, location="auxiliary test code (environments or event listeners)")(), char const*) at gtest.cc:2402:10
frame #10: 0x00000000005396e2 DirectoryWatcherTests`bool testing::internal::HandleExceptionsInMethodIfSupported<testing::internal::UnitTestImpl, bool>(object=0x000000000082d1e0, method=50 bb 52 00 00 00 00 00 00 00 00 00 00 00 00 00, location="auxiliary test code (environments or event listeners)")(), char const*) at gtest.cc:2455:12
frame #11: 0x000000000052bb2a DirectoryWatcherTests`testing::UnitTest::Run(this=0x000000000081a218) at gtest.cc:4257:10
frame #12: 0x000000000051df31 DirectoryWatcherTests`RUN_ALL_TESTS() at gtest.h:2233:46
frame #13: 0x000000000051df1a DirectoryWatcherTests`main(argc=1, argv=0x00007fffffff6608) at TestMain.cpp:50:10
frame #14: 0x00002aaaabc5334a libc.so.6`__libc_start_main + 234
frame #15: 0x000000000040a54a DirectoryWatcherTests`_start at start.S:120
```
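For context on the two failures above: the check at DirectoryWatcherTest.cpp:247 is a bounded wait on a future, while the backtrace puts the hang later, in `~VerifyingConsumer`, so the 3-second time-out by itself does not appear to be what keeps the process from exiting. A minimal sketch of that bounded-wait pattern follows, assuming a plain `std::promise` stands in for the test's consumer; `readyWithin` and `demoBoundedWait` are hypothetical names, not taken from the test source.

```cpp
#include <cassert>
#include <chrono>
#include <future>

// Hypothetical helper: true if `f` becomes ready within `timeout`.
template <typename T>
bool readyWithin(std::future<T> &f, std::chrono::milliseconds timeout) {
  return f.wait_for(timeout) == std::future_status::ready;
}

// Hypothetical demo: the wait times out while nothing has produced a result,
// and succeeds once the promise has been fulfilled.
bool demoBoundedWait() {
  std::promise<int> p;
  std::future<int> f = p.get_future();
  assert(f.valid());

  // No value has been set yet, so this returns after the time-out.
  bool readyBefore = readyWithin(f, std::chrono::milliseconds(50));

  p.set_value(1); // stands in for "the expected events arrived"
  bool readyAfter = readyWithin(f, std::chrono::milliseconds(50));

  return !readyBefore && readyAfter;
}
```

The time-out only limits how long the caller waits for the result; any cleanup that runs afterwards, such as a destructor, is not covered by it.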
I have been told that the /tmp ramdisk on "compute nodes" is configured
slightly differently from the /tmp ramdisk on "login nodes", which may explain
the difference in behavior. But I don't know what the difference in
configuration actually is.
I also tried setting TMPDIR to different file systems, including Lustre and
GPFS. I found that when TMPDIR is a GPFS file system, the test succeeds on any
kind of node (login and compute), but when TMPDIR is a Lustre file system, it
hangs on both login nodes and compute nodes, in the same way as when TMPDIR is
the /tmp ramdisk on compute nodes.
So to summarize:
/tmp ramdisk on XC login nodes: PASS
Lustre on XC login nodes: (hangs)
GPFS on XC login nodes: PASS
/tmp ramdisk on XC compute nodes: (hangs)
Lustre on XC compute nodes: (hangs)
GPFS on XC compute nodes: PASS
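One way to narrow this down might be to check, per file system, whether removing a watched directory produces a notification at all. The sketch below assumes the Linux DirectoryWatcher is built on inotify and relies on `IN_DELETE_SELF` (this report does not confirm that); `ProbeResult` and `probeDeleteSelf` are hypothetical names. It is untested on the systems described above.

```cpp
#include <cassert>
#include <cstdlib>
#include <poll.h>
#include <string>
#include <sys/inotify.h>
#include <unistd.h>

enum class ProbeResult { Delivered, TimedOut, SetupFailed };

// Hypothetical probe: creates a scratch directory under `parent`, watches it
// with inotify, removes it, and waits for IN_DELETE_SELF. `timeoutMs` bounds
// each poll() call, so the probe cannot block indefinitely.
ProbeResult probeDeleteSelf(const std::string &parent, int timeoutMs) {
  assert(timeoutMs >= 0);
  std::string dir = parent + "/inotify-probe-XXXXXX";
  if (mkdtemp(&dir[0]) == nullptr)
    return ProbeResult::SetupFailed;

  int fd = inotify_init1(IN_NONBLOCK | IN_CLOEXEC);
  if (fd < 0) {
    rmdir(dir.c_str());
    return ProbeResult::SetupFailed;
  }
  if (inotify_add_watch(fd, dir.c_str(), IN_DELETE_SELF | IN_ONLYDIR) < 0 ||
      rmdir(dir.c_str()) != 0) {
    close(fd);
    rmdir(dir.c_str()); // best effort; harmless if already removed
    return ProbeResult::SetupFailed;
  }

  ProbeResult result = ProbeResult::TimedOut;
  pollfd pfd{fd, POLLIN, 0};
  while (result != ProbeResult::Delivered && poll(&pfd, 1, timeoutMs) > 0) {
    alignas(inotify_event) char buf[4096];
    ssize_t n = read(fd, buf, sizeof buf);
    if (n <= 0)
      break;
    // Walk the variable-length event records returned by read().
    for (char *p = buf; p < buf + n;) {
      auto *ev = reinterpret_cast<inotify_event *>(p);
      if (ev->mask & IN_DELETE_SELF)
        result = ProbeResult::Delivered;
      p += sizeof(inotify_event) + ev->len;
    }
  }
  close(fd);
  return result;
}
```

Calling `probeDeleteSelf` with a directory on each of the file systems listed above (for example with a 3000 ms time-out, matching the test) would show whether the `TimedOut` results line up with the hangs.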
Any ideas how to make this test less sensitive to the kind of file system it's
running on?
Thanks.