[Lldb-commits] [PATCH] D158583: Fix shared library loading when users define duplicate _r_debug structure.

Greg Clayton via Phabricator via lldb-commits lldb-commits at lists.llvm.org
Tue Aug 22 23:43:47 PDT 2023


clayborg created this revision.
clayborg added reviewers: labath, JDevlieghere, GeorgeHuyubo, yinghuitan, kusmour.
Herald added a project: All.
clayborg requested review of this revision.
Herald added a project: LLDB.
Herald added a subscriber: lldb-commits.

We ran into a case where shared libraries would fail to load in some processes on Linux. The issue turned out to be that if the main executable or a shared library defined a symbol named "_r_debug", it would cause problems once the binary that contained it was loaded into the process. The "_r_debug" structure is currently found by looking through the .dynamic section in the main executable and finding the DT_DEBUG entry, which points to this structure. The dynamic loader updates this structure as shared libraries are loaded, and LLDB reads its contents each time the dyld breakpoint is hit. Currently we expect the "state" in this structure to change as libraries are added and removed. An issue comes up if someone defines another "_r_debug" struct in their program:

  #include <link.h>
  r_debug _r_debug;  // defines a second _r_debug inside this binary

If this code is included, a new "_r_debug" structure is created, and it causes problems once the executable is loaded. This is because of the way symbol lookups happen on Linux: they search the shared library list in the order it was created, and the dynamic loader is always last. So at some point the dynamic loader starts updating this other copy of "_r_debug", while LLDB is still watching only the copy inside the dynamic loader.
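The DT_DEBUG lookup described above can be sketched as a scan over a .dynamic array. This is a minimal illustration, not LLDB's code: `FindRDebugViaDTDebug` is a hypothetical helper, while `ElfW`, `DT_DEBUG`, `DT_NULL` and `struct r_debug` come from glibc's `<link.h>`/`<elf.h>`. It assumes a glibc-based Linux system.

```cpp
#include <link.h>  // ElfW, DT_NULL, DT_DEBUG, struct r_debug (glibc)
#include <cassert>

// Hypothetical helper: walk a DT_NULL-terminated .dynamic array, find the
// DT_DEBUG entry, and return the r_debug pointer the dynamic loader stored in
// it. Returns nullptr if there is no DT_DEBUG entry or it is still zero.
const r_debug *FindRDebugViaDTDebug(const ElfW(Dyn) *dynamic) {
  for (const ElfW(Dyn) *dyn = dynamic; dyn->d_tag != DT_NULL; ++dyn) {
    if (dyn->d_tag == DT_DEBUG)
      return reinterpret_cast<const r_debug *>(dyn->d_un.d_ptr);
  }
  return nullptr;
}
```

glibc's `<link.h>` also declares `extern ElfW(Dyn) _DYNAMIC[];`, so an executable could pass `_DYNAMIC` to check its own DT_DEBUG entry; a debugger does the equivalent scan on the inferior's memory. Either way, the result is the single copy the loader registered there, which is why a second "_r_debug" elsewhere goes unnoticed.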

Steps that show the problem are:

- LLDB finds the "_r_debug" structure via the DT_DEBUG entry in the .dynamic section, which points to the "_r_debug" in ld.so
- ld.so sets "state = eAdd" in its copy of "_r_debug" before it loads the shared libraries, then calls the dyld function that LLDB has set a breakpoint on. LLDB sees this state and does nothing, because it is waiting for a state of eConsistent, which means the shared libraries have been fully loaded
- ld.so loads the main executable and any dependent shared libraries and then wants to update "_r_debug", but the symbol lookup now finds the "_r_debug" in the a.out program, so it updates the state in that other copy
- LLDB hits the notification breakpoint and checks the ld.so copy of "_r_debug", which still has a state of "eAdd". LLDB needs the new "eConsistent" state to trigger loading the shared libraries, but it reads stale data, does nothing, and the library load is missed. The "_r_debug" in a.out has the correct state, but LLDB doesn't know which "_r_debug" is the right one.

The fix detects two consecutive "eAdd" states, loads the shared libraries anyway, and emits a log message on the "log enable lldb dyld" log channel saying that there might be multiple "_r_debug" structs.
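The detection rule can be sketched as a transition function over the previous and current states. This is a simplified model under stated assumptions: `RState`, `Action` and `Decide` are hypothetical names that mirror, but are not, LLDB's `DYLDRendezvous` types, and the real code does more work (reading the link map, computing added and removed entries).

```cpp
#include <cassert>

// Hypothetical names; they mirror, but are not, LLDB's DYLDRendezvous types.
enum class RState { Consistent, Add, Delete };
enum class Action { None, TakeSnapshot, AddModules, RemoveModules };

// Decide what to do at a dyld breakpoint from the previous and current
// "state" values read out of the watched _r_debug. Sets warn_multiple_r_debug
// when the duplicate-_r_debug pattern (Add followed by Add) is seen.
Action Decide(RState previous, RState current, bool &warn_multiple_r_debug) {
  warn_multiple_r_debug = false;
  switch (current) {
  case RState::Add:
    if (previous == RState::Add) {
      // The loader's Consistent update probably went to another _r_debug
      // copy: treat this stop as the end of the load batch and flag it.
      warn_multiple_r_debug = true;
      return Action::AddModules;
    }
    return Action::TakeSnapshot;  // a batch of loads is starting
  case RState::Delete:
    return Action::TakeSnapshot;  // a batch of unloads is starting
  case RState::Consistent:
    if (previous == RState::Add)
      return Action::AddModules;
    if (previous == RState::Delete)
      return Action::RemoveModules;
    return Action::None;
  }
  return Action::None;
}
```

The normal path (Add, then Consistent) is unchanged; only a second Add in a row is promoted to "load the modules" and reported for logging.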

The correct solution is that no one should add a duplicate "_r_debug" symbol to their binaries, but some programs already do this, and since it is possible, LLDB should handle it and keep debug sessions working as expected. A program that #includes <link.h> can simply use the existing "_r_debug" structure, which that header declares as "extern struct r_debug _r_debug;", so no local copy is needed.

If your ld.so has debug info, you can easily see the duplicate "_r_debug" structs by doing:

  (lldb) target variable _r_debug --raw
  (r_debug) _r_debug = {
    r_version = 1
    r_map = 0x00007ffff7e30210
    r_brk = 140737349972416
    r_state = RT_CONSISTENT
    r_ldbase = 0
  }
  (r_debug) _r_debug = {
    r_version = 1
    r_map = 0x00007ffff7e30210
    r_brk = 140737349972416
    r_state = RT_ADD
    r_ldbase = 140737349943296
  }
  (lldb) target variable &_r_debug
  (r_debug *) &_r_debug = 0x0000555555601040
  (r_debug *) &_r_debug = 0x00007ffff7e301e0

If you run "image lookup --address <addr>" on each of these addresses, you can see that one is in a.out and the other is in ld.so.

The patch also adds logging that prints the m_previous and m_current Rendezvous structures to make things clearer, plus a log message when multiple eAdd states are detected in a row so this problem can be spotted in logs.
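A dump of the previous and current snapshots might look like the sketch below. `RendezvousSnapshot` and `Describe` are hypothetical and the output format is an assumption, not LLDB's actual log text; the state names follow glibc's `RT_CONSISTENT`/`RT_ADD`/`RT_DELETE` order (0, 1, 2).

```cpp
#include <cassert>
#include <cstdint>
#include <sstream>
#include <string>

// Hypothetical mirror of the fields read from the watched _r_debug.
struct RendezvousSnapshot {
  int version = 0;
  uint64_t map_addr = 0;
  uint64_t brk = 0;
  int state = 0;  // 0 = consistent, 1 = add, 2 = delete (RT_* order)
  uint64_t ldbase = 0;
};

// Format one snapshot on a single line, e.g. for m_previous or m_current.
std::string Describe(const char *label, const RendezvousSnapshot &r) {
  static const char *const kStates[] = {"consistent", "add", "delete"};
  std::ostringstream os;
  os << label << ": version=" << r.version << std::hex
     << " map=0x" << r.map_addr << " brk=0x" << r.brk
     << " ldbase=0x" << r.ldbase << std::dec << " state="
     << (r.state >= 0 && r.state <= 2 ? kStates[r.state] : "unknown");
  return os.str();
}
```

Logging both snapshots at every stop makes an "add" followed by another "add" easy to spot.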


Repository:
  rG LLVM Github Monorepo

https://reviews.llvm.org/D158583

Files:
  lldb/source/Plugins/DynamicLoader/POSIX-DYLD/DYLDRendezvous.cpp
  lldb/source/Plugins/DynamicLoader/POSIX-DYLD/DYLDRendezvous.h
  lldb/test/API/functionalities/dyld-multiple-rdebug/Makefile
  lldb/test/API/functionalities/dyld-multiple-rdebug/TestDyldWithMultupleRDebug.py
  lldb/test/API/functionalities/dyld-multiple-rdebug/library_file.cpp
  lldb/test/API/functionalities/dyld-multiple-rdebug/library_file.h
  lldb/test/API/functionalities/dyld-multiple-rdebug/main.cpp

-------------- next part --------------
A non-text attachment was scrubbed...
Name: D158583.552606.patch
Type: text/x-patch
Size: 10765 bytes
Desc: not available
URL: <http://lists.llvm.org/pipermail/lldb-commits/attachments/20230823/16058fbd/attachment-0001.bin>

