[llvm-bugs] [Bug 51930] New: [LLDB] Most tests fail on Windows when built for debug

Tue Sep 21 13:04:44 PDT 2021

https://bugs.llvm.org/show_bug.cgi?id=51930

            Bug ID: 51930
           Summary: [LLDB] Most tests fail on Windows when built for debug
           Product: lldb
           Version: unspecified
          Hardware: PC
                OS: Windows NT
            Status: NEW
          Severity: normal
          Priority: P
         Component: All Bugs
          Assignee: lldb-dev at lists.llvm.org
          Reporter: amccarth at google.com
                CC: jdevlieghere at apple.com, llvm-bugs at lists.llvm.org

A variety of configuration and design decisions have led to a situation where,
if you build for debug (i.e., `-DCMAKE_BUILD_TYPE=Debug`) and try to execute
the tests on Windows, most of them fail.

## Steps to Reproduce
```
cmake -GNinja -DCMAKE_BUILD_TYPE=Debug -DLLVM_ENABLE_PROJECTS="clang;lld;lldb" -DLLVM_TARGETS_TO_BUILD=X86 ..\..\llvm-project\llvm

ninja check-lldb
```

## Result

Over 900 of the tests will fail.

Many tests will give a Python stack trace with an access violation or this
error:

> ImportError: cannot import name '_lldb' from partially initialized module 'lldb' (most likely due to a circular import) (D:\src\llvm\build\ninja_dbg\Lib\site-packages\lldb\__init__.py)

(These are actually the best clue as to what's going on.)

For many more tests, the generated output is wrong, so the apparent symptom is
a FileCheck miscompare.  (If you try to reproduce these outside of lit, the
output will likely be correct and the test will pass.  See Workaround for
details.)

A few tests hang.

## Cause

Two different versions of Python end up in the same process, each of which is
built against a different version of the C run-time library DLLs.

Here's how this happens:

1.  Ninja starts a process running Lit in the regular Python interpreter.

2.  Lit starts a process to run dotest in a Python interpreter.

3.  The dotest.py script imports the lldb module.

4.  The lldb module's SWIG-generated `__init__` in turn tries to import _lldb.

5.  In release builds _lldb is a Windows DLL called _lldb.pyd produced from the
SWIG bindings.  In debug builds, the DLL is called _lldb_d.pyd.

    After SWIG 3.0.9, the template to generate the lldb module's `__init__` had
to change a bit (because newer versions of SWIG required changes).  As a
result, the `__init__` method no longer distinguishes between _lldb and
_lldb_d.

    Our CMake builds originally adapted by creating a filesystem link from
_lldb.pyd to the actual _lldb.pyd or _lldb_d.pyd as appropriate.  This didn't
work reliably (possibly because of differences in the implementations of
symlink from GnuWin32 and git).

    Nowadays the correct DLL is copied to _lldb.pyd.  Note, however, that the
copy can silently fail.  See Notes.

6.  Using the now-loaded lldb module to get the SBAPI, dotest.py creates an
instance of LLDB (the actual debugger), which runs in the same process as
dotest.py.

7.  That LLDB instance has its own statically-linked Python interpreter
embedded.  Thus the process now has two instances of Python:  one running
dotest.py and one inside the LLDB instance.
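
To see which flavor of Python a given process is actually running, something
like the following can be run under both `python.exe` and `python_d.exe`.  This
is only a sketch: `is_debug_interpreter` and `describe_interpreter` are names
made up for illustration, and it relies on the CPython behavior that
`sys.gettotalrefcount` exists only in debug builds.

```python
import importlib.machinery
import sys


def is_debug_interpreter():
    """Return True if this interpreter looks like a debug build of CPython."""
    # Assumption: sys.gettotalrefcount is only present in debug builds.
    return hasattr(sys, "gettotalrefcount")


def describe_interpreter():
    """Collect the facts that need to match between the two Python instances."""
    return {
        "executable": sys.executable,
        "version": "%d.%d" % sys.version_info[:2],
        "debug": is_debug_interpreter(),
        # File name suffixes this interpreter accepts for extension modules.
        "extension_suffixes": list(importlib.machinery.EXTENSION_SUFFIXES),
    }


if __name__ == "__main__":
    for key, value in describe_interpreter().items():
        print(f"{key}: {value}")
```

On Windows I would expect the debug interpreter to list `_d.pyd` among its
suffixes and the release interpreter to list plain `.pyd`, which is the same
split as _lldb_d.pyd versus _lldb.pyd in step 5.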

If those two instances don't match, e.g., if one is "release" and the other is
"debug", or one is 3.7 and the other 3.8, misery ensues.

## Workaround

You can exercise the tests with a "release" build, but you will miss some bugs
because release builds disable assertions in core llvm libraries.

For individual dotest.py tests, you can bypass Ninja and Lit and explicitly
launch dotest.py in the debug version of Python (i.e., `python_d.exe` instead
of `python.exe`).  For example:

```
"C:/Program Files/Python38/python_d.exe" \
  D:/src/llvm/llvm-project/lldb\test\API\dotest.py \
  [options elided] \
  -p TestDynamicValue.py
```

## Solution

None found.  I recommend we modify our CMake scripts to warn when
CMAKE_BUILD_TYPE is Debug, the target platform is Windows, and
LLVM_ENABLE_PROJECTS includes lldb.
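
Until such a warning exists in CMake, the same condition could be checked
against an existing build directory.  The sketch below is an illustration, not
a proposed patch: `read_cmake_cache` and `risky_configuration` are hypothetical
helpers, it reads the `KEY:TYPE=VALUE` entries of `CMakeCache.txt`, and it
assumes the host platform is also the target platform.

```python
import sys
from pathlib import Path


def read_cmake_cache(cache_path):
    """Parse KEY:TYPE=VALUE entries from a CMakeCache.txt file into a dict."""
    entries = {}
    for line in Path(cache_path).read_text().splitlines():
        line = line.strip()
        # Skip blank lines and the "//" and "#" comment lines CMake writes.
        if not line or line.startswith(("//", "#")):
            continue
        key_and_type, sep, value = line.partition("=")
        if not sep:
            continue
        # Drop the ":TYPE" part of the key, if present.
        entries[key_and_type.partition(":")[0]] = value
    return entries


def risky_configuration(entries, platform=sys.platform):
    """Return True for the Debug + Windows + lldb combination."""
    # Assumption: projects are listed explicitly; "all" is not handled here.
    projects = entries.get("LLVM_ENABLE_PROJECTS", "").split(";")
    return (
        platform == "win32"
        and entries.get("CMAKE_BUILD_TYPE", "") == "Debug"
        and "lldb" in projects
    )
```

For example, `risky_configuration(read_cmake_cache("CMakeCache.txt"))` run from
the build directory would report whether that build is likely to hit this bug.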

## Notes

### Failure to Copy

The copy of either _lldb.pyd or _lldb_d.pyd to _lldb.pyd can fail.  In
particular, I've seen this happen when a zombie process from a previous test
run holds the older file locked.  For reasons I haven't discovered, failure of
the copy doesn't fail the build.  You're left with a previous build of
_lldb.pyd, which can make for difficult-to-debug problems.
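
For comparison, here is roughly what a copy step that refuses to fail silently
could look like.  This is a sketch in Python rather than the CMake logic the
build actually uses, and `copy_and_verify` is a hypothetical name.

```python
import filecmp
import shutil
from pathlib import Path


def copy_and_verify(src, dst):
    """Copy src to dst and raise if dst does not end up identical to src."""
    src, dst = Path(src), Path(dst)
    # shutil.copyfile raises OSError (e.g. PermissionError) if the copy fails,
    # which is what I would expect when another process holds dst locked.
    shutil.copyfile(src, dst)
    # A byte-for-byte comparison guards against a stale dst being left behind.
    if not filecmp.cmp(src, dst, shallow=False):
        raise RuntimeError(f"{dst} does not match {src} after copy")
    return dst
```

The point is that either failure mode (the copy raising, or the destination
still holding old contents) stops the build step instead of leaving a stale
_lldb.pyd in place.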

### Python Detection Churn

In the past year or two, we've had a lot of churn in how CMake finds Python for
llvm generally and specifically for lldb.  In hindsight, I think a lot of
the problems I experienced with those changes were because the test process
ended up with two different versions of Python, each linked against a different
version of the CRT, even when they were both release builds.
