[all-commits] [llvm/llvm-project] 07c215: Fix shared library loading when users define dupli...

Greg Clayton via All-commits all-commits at lists.llvm.org
Thu Aug 31 10:37:40 PDT 2023


  Branch: refs/heads/main
  Home:   https://github.com/llvm/llvm-project
  Commit: 07c215e8a8af54d0084af7291ac29fef3672fcd8
      https://github.com/llvm/llvm-project/commit/07c215e8a8af54d0084af7291ac29fef3672fcd8
  Author: Greg Clayton <gclayton at fb.com>
  Date:   2023-08-31 (Thu, 31 Aug 2023)

  Changed paths:
    M lldb/source/Plugins/DynamicLoader/POSIX-DYLD/DYLDRendezvous.cpp
    M lldb/source/Plugins/DynamicLoader/POSIX-DYLD/DYLDRendezvous.h
    A lldb/test/API/functionalities/dyld-multiple-rdebug/Makefile
    A lldb/test/API/functionalities/dyld-multiple-rdebug/TestDyldWithMultupleRDebug.py
    A lldb/test/API/functionalities/dyld-multiple-rdebug/library_file.cpp
    A lldb/test/API/functionalities/dyld-multiple-rdebug/library_file.h
    A lldb/test/API/functionalities/dyld-multiple-rdebug/main.cpp

  Log Message:
  -----------
  Fix shared library loading when users define duplicate _r_debug structure.

We ran into a case where shared libraries would fail to load in some processes on Linux. The issue turned out to be that if the main executable or a shared library defined a symbol named "_r_debug", it would cause problems once the binary that contained it was loaded into the process. The "_r_debug" structure is currently found by looking through the .dynamic section in the main executable and finding the DT_DEBUG entry, which points to this structure. The dynamic loader updates this structure as shared libraries are loaded, and LLDB checks the contents of this structure each time the dyld breakpoint is hit. Currently we expect the "state" in this structure to change as things happen. An issue comes up if someone defines another "_r_debug" struct in their program:
```
r_debug _r_debug;
```
If this code is included, a new "_r_debug" structure is created and it causes problems once the executable is loaded. This is because of the way symbol lookups happen on Linux: they search the shared library list in the order it was created, and the dynamic loader is always last. So at some point the dynamic loader will start updating this other copy of "_r_debug", yet LLDB is only watching the copy inside of the dynamic loader.
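
The lookup order described above can be modeled with a small sketch. The "Module" type, "ResolveSymbol", "MakeSearchList", and the module names are hypothetical illustrations, not ld.so or LLDB code; the only assumption is that the first module in the search list that defines a symbol wins.

```cpp
#include <string>
#include <unordered_set>
#include <vector>

// Hypothetical model of a loaded module: a name plus the symbols it defines.
struct Module {
  std::string name;
  std::unordered_set<std::string> defined_symbols;
};

// Returns the name of the first module in load order that defines `symbol`,
// or an empty string if none does. This models "first definition wins";
// it is not how ld.so is actually implemented.
std::string ResolveSymbol(const std::vector<Module> &load_order,
                          const std::string &symbol) {
  for (const Module &m : load_order)
    if (m.defined_symbols.count(symbol))
      return m.name;
  return "";
}

// Builds a search list with the dynamic loader last, as the text describes.
std::vector<Module> MakeSearchList(bool program_defines_r_debug) {
  Module exe{"a.out", {"main"}};
  if (program_defines_r_debug)
    exe.defined_symbols.insert("_r_debug");
  Module libc{"libc.so.6", {"printf"}};
  Module loader{"ld.so", {"_r_debug"}};
  return {exe, libc, loader};
}
```

In this model, "ResolveSymbol(MakeSearchList(true), "_r_debug")" resolves to a.out because it is searched before ld.so, while without the duplicate definition the lookup falls through to ld.so.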

Steps that show the problem are:
- LLDB finds the "_r_debug" structure via the DT_DEBUG entry in the .dynamic section, and this points to the "_r_debug" in ld.so
- ld.so modifies its copy of "_r_debug" with "state = eAdd" before it loads the shared libraries and calls the dyld function that LLDB has set a breakpoint on. We find this state and do nothing (we are waiting for a state of eConsistent to tell us the shared libraries have been fully loaded)
- ld.so loads the main executable and any dependent shared libraries and wants to update the "_r_debug" structure, but it now finds "_r_debug" in the a.out program and updates the state in this other copy
- LLDB hits the notification breakpoint and checks the ld.so copy of "_r_debug", which still has a state of "eAdd". LLDB wants the new "eConsistent" state, which will trigger the shared libraries to load, but it gets stale data and doesn't do anything, so the library load is missed. The "_r_debug" in a.out has the state set correctly, but we don't know which "_r_debug" is the right one.
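
The steps above can be replayed in a small model. This is a sketch under stated assumptions: "State", "RDebug", "Process", and "ReplayWithDuplicate" are hypothetical names invented for illustration, and the model only captures which copy of the structure gets written and which copy the debugger reads.

```cpp
// Hypothetical stand-ins for the r_state values; the names follow the
// eAdd / eConsistent spelling used in the steps above.
enum class State { eConsistent, eAdd, eDelete };

struct RDebug {
  State state = State::eConsistent;
};

struct Process {
  RDebug loader_copy;             // "_r_debug" inside ld.so (DT_DEBUG target)
  RDebug program_copy;            // duplicate "_r_debug" defined by a.out
  bool duplicate_visible = false; // true once a.out's symbol is found first

  // ld.so writes to whichever copy its symbol lookup resolves to.
  void LoaderSetState(State s) {
    (duplicate_visible ? program_copy : loader_copy).state = s;
  }

  // The debugger only ever reads the copy it found via DT_DEBUG.
  State DebuggerObservedState() const { return loader_copy.state; }
};

// Replays the steps and returns what the debugger sees at the second
// breakpoint hit.
State ReplayWithDuplicate(bool program_defines_r_debug) {
  Process p;
  p.LoaderSetState(State::eAdd);                 // second step: first hit
  p.duplicate_visible = program_defines_r_debug; // third step: a.out loaded
  p.LoaderSetState(State::eConsistent);          // third step: state update
  return p.DebuggerObservedState();              // fourth step: second hit
}
```

With the duplicate present, the final update lands in the a.out copy, so the debugger still observes eAdd; without it, the debugger observes eConsistent as expected.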

The fix detects two consecutive "eAdd" states, loads the shared libraries, and emits a log message on the channel enabled with "log enable lldb dyld" stating that there might be multiple "_r_debug" structs.
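
A minimal sketch of this kind of detection follows. It is not the code from DYLDRendezvous.cpp: the "Action" enum, "DecideAction", and "RendezvousWatcher" are hypothetical names, and transitions other than the two described here (for example library unloading) are deliberately left out.

```cpp
// Hypothetical state and action names, chosen for illustration only.
enum class State { eConsistent, eAdd, eDelete };
enum class Action { eNoAction, eLoadLibraries, eLoadLibrariesAndWarn };

// Decides what to do from the previously observed state and the current one.
Action DecideAction(State previous, State current) {
  // Normal case: an add operation has completed.
  if (previous == State::eAdd && current == State::eConsistent)
    return Action::eLoadLibraries;
  // Two eAdd states in a row: the copy being watched may be stale because
  // another "_r_debug" is being updated. Load libraries anyway and warn.
  if (previous == State::eAdd && current == State::eAdd)
    return Action::eLoadLibrariesAndWarn;
  // All other transitions are not modeled in this sketch.
  return Action::eNoAction;
}

// Remembers the previous state across breakpoint hits.
class RendezvousWatcher {
public:
  Action OnBreakpointHit(State current) {
    Action action = DecideAction(m_previous, current);
    m_previous = current;
    return action;
  }

private:
  State m_previous = State::eConsistent;
};
```

Feeding the watcher eAdd followed by eAdd yields the warn-and-load action on the second hit, whereas eAdd followed by eConsistent yields the ordinary load action.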

The correct solution is that no one should be adding a duplicate "_r_debug" symbol to their binaries, but we have programs that are doing this already and since it can be done, we should be able to work with this and keep debug sessions working as expected. If a user #includes the <link.h> file, they can just use the existing "_r_debug" structure as it is defined in this header file as "extern struct r_debug _r_debug;" and no local copies need to be made.

If your ld.so has debug info, you can easily see the duplicate "_r_debug" structs by doing:
```
(lldb) target variable _r_debug --raw
(r_debug) _r_debug = {
  r_version = 1
  r_map = 0x00007ffff7e30210
  r_brk = 140737349972416
  r_state = RT_CONSISTENT
  r_ldbase = 0
}
(r_debug) _r_debug = {
  r_version = 1
  r_map = 0x00007ffff7e30210
  r_brk = 140737349972416
  r_state = RT_ADD
  r_ldbase = 140737349943296
}
(lldb) target variable &_r_debug
(r_debug *) &_r_debug = 0x0000555555601040
(r_debug *) &_r_debug = 0x00007ffff7e301e0
```
If you run "image lookup --address <addr>" on each of these addresses, you can see that one is in a.out and the other is in ld.so.

Added more logging that prints the m_previous and m_current Rendezvous structures to make things clearer. Also added a log message when we detect multiple eAdd states in a row, so this problem can be spotted in logs.

Differential Revision: https://reviews.llvm.org/D158583



