[Lldb-commits] [lldb] 07c215e - Fix shared library loading when users define duplicate _r_debug structure.

Greg Clayton via lldb-commits lldb-commits at lists.llvm.org
Thu Aug 31 10:37:30 PDT 2023


Author: Greg Clayton
Date: 2023-08-31T10:37:20-07:00
New Revision: 07c215e8a8af54d0084af7291ac29fef3672fcd8

URL: https://github.com/llvm/llvm-project/commit/07c215e8a8af54d0084af7291ac29fef3672fcd8
DIFF: https://github.com/llvm/llvm-project/commit/07c215e8a8af54d0084af7291ac29fef3672fcd8.diff

LOG: Fix shared library loading when users define duplicate _r_debug structure.

We ran into a case where shared libraries would fail to load in some processes on linux. The issue turned out to be if the main executable or a shared library defined a symbol named "_r_debug", then it would cause problems once the executable that contained it was loaded into the process. The "_r_debug" structure is currently found by looking through the .dynamic section in the main executable and finding the DT_DEBUG entry which points to this structure. The dynamic loader will update this structure as shared libraries are loaded and LLDB watches the contents of this structure as the dyld breakpoint is hit. Currently we expect the "state" in this structure to change as things happen. An issue comes up if someone defines another "_r_debug" struct in their program:
```
r_debug _r_debug;
```
If this code is included, a new "_r_debug" structure is created and it causes problems once the executable is loaded. This is because of the way symbol lookups happen in linux: they use the shared library list in the order it created and the dynamic loader is always last. So at some point the dynamic loader will start updating this other copy of "_r_debug", yet LLDB is only watching the copy inside of the dynamic loader.

Steps that show the problem are:
- lldb finds the "_r_debug" structure via the DT_DEBUG entry in the .dynamic section and this points to the "_r_debug" in ld.so
- ld.so modifies its copy of "_r_debug" with "state = eAdd" before it loads the shared libraries and calls the dyld function that LLDB has set a breakpoint on and we find this state and do nothing (we are waiting for a state of eConsistent to tell us the shared libraries have been fully loaded)
- ld.so loads the main executable and any dependent shared libraries and wants to update the "_r_debug" structure, but it now finds "_r_debug" in the a.out program and updates the state in this other copy
- lldb hits the notification breakpoint and checks the ld.so copy of "_r_debug" which still has a state of "eAdd". LLDB wants the new "eConsistent" state which will trigger the shared libraries to load, but it gets stale data and doesn't do anyhing and library load is missed. The "_r_debug" in a.out has the state set correctly, but we don't know which "_r_debug" is the right one.

The new fix detects the two "eAdd" states and loads shared libraries and will emit a log message in the "log enable lldb dyld" log channel which states there might be multiple "_r_debug" structs.

The correct solution is that no one should be adding a duplicate "_r_debug" symbol to their binaries, but we have programs that are doing this already and since it can be done, we should be able to work with this and keep debug sessions working as expected. If a user #includes the <link.h> file, they can just use the existing "_r_debug" structure as it is defined in this header file as "extern struct r_debug _r_debug;" and no local copies need to be made.

If your ld.so has debug info, you can easily see the duplicate "_r_debug" structs by doing:
```
(lldb) target variable _r_debug --raw
(r_debug) _r_debug = {
  r_version = 1
  r_map = 0x00007ffff7e30210
  r_brk = 140737349972416
  r_state = RT_CONSISTENT
  r_ldbase = 0
}
(r_debug) _r_debug = {
  r_version = 1
  r_map = 0x00007ffff7e30210
  r_brk = 140737349972416
  r_state = RT_ADD
  r_ldbase = 140737349943296
}
(lldb) target variable &_r_debug
(r_debug *) &_r_debug = 0x0000555555601040
(r_debug *) &_r_debug = 0x00007ffff7e301e0
```
And if you do a "image lookup --address <addr>" in the addresses, you can see one is in the a.out and one in the ld.so.

Adding more logging to print out the m_previous and m_current Rendezvous structures to make things more clear. Also added a log when we detect multiple eAdd states in a row to detect this problem in logs.

Differential Revision: https://reviews.llvm.org/D158583

Added: 
    lldb/test/API/functionalities/dyld-multiple-rdebug/Makefile
    lldb/test/API/functionalities/dyld-multiple-rdebug/TestDyldWithMultupleRDebug.py
    lldb/test/API/functionalities/dyld-multiple-rdebug/library_file.cpp
    lldb/test/API/functionalities/dyld-multiple-rdebug/library_file.h
    lldb/test/API/functionalities/dyld-multiple-rdebug/main.cpp

Modified: 
    lldb/source/Plugins/DynamicLoader/POSIX-DYLD/DYLDRendezvous.cpp
    lldb/source/Plugins/DynamicLoader/POSIX-DYLD/DYLDRendezvous.h

Removed: 
    


################################################################################
diff  --git a/lldb/source/Plugins/DynamicLoader/POSIX-DYLD/DYLDRendezvous.cpp b/lldb/source/Plugins/DynamicLoader/POSIX-DYLD/DYLDRendezvous.cpp
index f20167b46d270e..1ef284c4690629 100644
--- a/lldb/source/Plugins/DynamicLoader/POSIX-DYLD/DYLDRendezvous.cpp
+++ b/lldb/source/Plugins/DynamicLoader/POSIX-DYLD/DYLDRendezvous.cpp
@@ -25,6 +25,32 @@
 using namespace lldb;
 using namespace lldb_private;
 
+const char *DYLDRendezvous::StateToCStr(RendezvousState state) {
+  switch (state) {
+    case DYLDRendezvous::eConsistent:
+      return "eConsistent";
+    case DYLDRendezvous::eAdd:
+      return "eAdd";
+    case DYLDRendezvous::eDelete:
+      return "eDelete";
+  }
+  return "<invalid RendezvousState>";
+}
+
+const char *DYLDRendezvous::ActionToCStr(RendezvousAction action) {
+  switch (action) {
+  case DYLDRendezvous::RendezvousAction::eTakeSnapshot:
+    return "eTakeSnapshot";
+  case DYLDRendezvous::RendezvousAction::eAddModules:
+    return "eAddModules";
+  case DYLDRendezvous::RendezvousAction::eRemoveModules:
+    return "eRemoveModules";
+  case DYLDRendezvous::RendezvousAction::eNoAction:
+    return "eNoAction";
+  }
+  return "<invalid RendezvousAction>";
+}
+
 DYLDRendezvous::DYLDRendezvous(Process *process)
     : m_process(process), m_rendezvous_addr(LLDB_INVALID_ADDRESS),
       m_executable_interpreter(false), m_current(), m_previous(),
@@ -129,6 +155,13 @@ void DYLDRendezvous::UpdateExecutablePath() {
   }
 }
 
+void DYLDRendezvous::Rendezvous::DumpToLog(Log *log, const char *label) {
+  LLDB_LOGF(log, "%s Rendezvous: version = %" PRIu64 ", map_addr = 0x%16.16"
+            PRIx64 ", brk = 0x%16.16" PRIx64 ", state = %" PRIu64
+            " (%s), ldbase = 0x%16.16" PRIx64, label ? label : "", version,
+            map_addr, brk, state, StateToCStr((RendezvousState)state), ldbase);
+}
+
 bool DYLDRendezvous::Resolve() {
   Log *log = GetLog(LLDBLog::DynamicLoader);
 
@@ -176,6 +209,9 @@ bool DYLDRendezvous::Resolve() {
   m_previous = m_current;
   m_current = info;
 
+  m_previous.DumpToLog(log, "m_previous");
+  m_current.DumpToLog(log, "m_current ");
+
   if (m_current.map_addr == 0)
     return false;
 
@@ -217,6 +253,75 @@ DYLDRendezvous::RendezvousAction DYLDRendezvous::GetAction() const {
     break;
 
   case eAdd:
+    // If the main executable or a shared library defines a publicly visible
+    // symbol named "_r_debug", then it will cause problems once the executable
+    // that contains the symbol is loaded into the process. The correct
+    // "_r_debug" structure is currently found by LLDB by looking through
+    // the .dynamic section in the main executable and finding the DT_DEBUG tag
+    // entry.
+    //
+    // An issue comes up if someone defines another publicly visible "_r_debug"
+    // struct in their program. Sample code looks like:
+    //
+    //    #include <link.h>
+    //    r_debug _r_debug;
+    //
+    // If code like this is in an executable or shared library, this creates a
+    // new "_r_debug" structure and it causes problems once the executable is
+    // loaded due to the way symbol lookups happen in linux: the shared library
+    // list from _r_debug.r_map will be searched for a symbol named "_r_debug"
+    // and the first match will be the new version that is used. The dynamic
+    // loader is always last in this list. So at some point the dynamic loader
+    // will start updating the copy of "_r_debug" that gets found first. The
+    // issue is that LLDB will only look at the copy that is pointed to by the
+    // DT_DEBUG entry, or the initial version from the ld.so binary.
+    //
+    // Steps that show the problem are:
+    //
+    // - LLDB finds the "_r_debug" structure via the DT_DEBUG entry in the
+    //   .dynamic section and this points to the "_r_debug" in ld.so
+    // - ld.so uodates its copy of "_r_debug" with "state = eAdd" before it
+    //   loads the dependent shared libraries for the main executable and
+    //   any dependencies of all shared libraries from the executable's list
+    //   and ld.so code calls the debugger notification function
+    //   that LLDB has set a breakpoint on.
+    // - LLDB hits the breakpoint and the breakpoint has a callback function
+    //   where we read the _r_debug.state (eAdd) state and we do nothing as the
+    //   "eAdd" state indicates that the shared libraries are about to be added.
+    // - ld.so finishes loading the main executable and any dependent shared
+    //   libraries and it will update the "_r_debug.state" member with a
+    //   "eConsistent", but it now updates the "_r_debug" in the a.out program
+    //   and it calls the debugger notification function.
+    // - lldb hits the notification breakpoint and checks the ld.so copy of
+    //   "_r_debug.state" which still has a state of "eAdd", but LLDB needs to see a
+    //   "eConsistent" state to trigger the shared libraries to get loaded into
+    //   the debug session, but LLDB the ld.so _r_debug.state which still
+    //   contains "eAdd" and doesn't do anyhing and library load is missed.
+    //   The "_r_debug" in a.out has the state set correctly to "eConsistent"
+    //   but LLDB is still looking at the "_r_debug" from ld.so.
+    //
+    // So if we detect two "eAdd" states in a row, we assume this is the issue
+    // and we now load shared libraries correctly and will emit a log message
+    // in the "log enable lldb dyld" log channel which states there might be
+    // multiple "_r_debug" structs causing problems.
+    //
+    // The correct solution is that no one should be adding a duplicate
+    // publicly visible "_r_debug" symbols to their binaries, but we have
+    // programs that are doing this already and since it can be done, we should
+    // be able to work with this and keep debug sessions working as expected.
+    //
+    // If a user includes the <link.h> file, they can just use the existing
+    // "_r_debug" structure as it is defined in this header file as "extern
+    // struct r_debug _r_debug;" and no local copies need to be made.
+    if (m_previous.state == eAdd) {
+      Log *log = GetLog(LLDBLog::DynamicLoader);
+      LLDB_LOG(log, "DYLDRendezvous::GetAction() found two eAdd states in a "
+               "row, check process for multiple \"_r_debug\" symbols. "
+               "Returning eAddModules to ensure shared libraries get loaded "
+               "correctly");
+      return eAddModules;
+    }
+    return eNoAction;
   case eDelete:
     return eNoAction;
   }
@@ -225,7 +330,9 @@ DYLDRendezvous::RendezvousAction DYLDRendezvous::GetAction() const {
 }
 
 bool DYLDRendezvous::UpdateSOEntriesFromRemote() {
-  auto action = GetAction();
+  const auto action = GetAction();
+  Log *log = GetLog(LLDBLog::DynamicLoader);
+  LLDB_LOG(log, "{0} action = {1}", __PRETTY_FUNCTION__, ActionToCStr(action));
 
   if (action == eNoAction)
     return false;
@@ -263,7 +370,10 @@ bool DYLDRendezvous::UpdateSOEntriesFromRemote() {
 bool DYLDRendezvous::UpdateSOEntries() {
   m_added_soentries.clear();
   m_removed_soentries.clear();
-  switch (GetAction()) {
+  const auto action = GetAction();
+  Log *log = GetLog(LLDBLog::DynamicLoader);
+  LLDB_LOG(log, "{0} action = {1}", __PRETTY_FUNCTION__, ActionToCStr(action));
+  switch (action) {
   case eTakeSnapshot:
     m_soentries.clear();
     return TakeSnapshot(m_soentries);

diff  --git a/lldb/source/Plugins/DynamicLoader/POSIX-DYLD/DYLDRendezvous.h b/lldb/source/Plugins/DynamicLoader/POSIX-DYLD/DYLDRendezvous.h
index fc1dd6921455be..b8bdf78fbdfada 100644
--- a/lldb/source/Plugins/DynamicLoader/POSIX-DYLD/DYLDRendezvous.h
+++ b/lldb/source/Plugins/DynamicLoader/POSIX-DYLD/DYLDRendezvous.h
@@ -21,6 +21,7 @@
 using lldb_private::LoadedModuleInfoList;
 
 namespace lldb_private {
+class Log;
 class Process;
 }
 
@@ -28,9 +29,81 @@ class Process;
 /// Interface to the runtime linker.
 ///
 /// A structure is present in a processes memory space which is updated by the
-/// runtime liker each time a module is loaded or unloaded.  This class
+/// dynamic linker each time a module is loaded or unloaded.  This class
 /// provides an interface to this structure and maintains a consistent
 /// snapshot of the currently loaded modules.
+///
+/// In the dynamic loader sources, this structure has a type of "r_debug" and
+/// the name of the structure us "_r_debug". The structure looks like:
+///
+/// struct r_debug {
+///     // Version number for this protocol.
+///     int r_version;
+///     // Head of the chain of loaded objects.
+///     struct link_map *r_map;
+///     // The address the debugger should set a breakpoint at in order to get
+///     // notified when shared libraries are added or removed
+///     uintptr_t r_brk;
+///     // This state value describes the mapping change taking place when the
+///     // 'r_brk' address is called.
+///     enum {
+///       RT_CONSISTENT, // Mapping change is complete.
+///       RT_ADD,        // Beginning to add a new object.
+///       RT_DELETE,     // Beginning to remove an object mapping.
+///     } r_state;
+///     // Base address the linker is loaded at.
+///     uintptr_t r_ldbase;
+///   };
+///
+/// The dynamic linker then defines a global variable using this type named
+/// "_r_debug":
+///
+///   r_debug _r_debug;
+///
+/// The DYLDRendezvous class defines a local version of this structure named
+/// DYLDRendezvous::Rendezvous. See the definition inside the class definition
+/// for DYLDRendezvous.
+///
+/// This structure can be located by looking through the .dynamic section in
+/// the main executable and finding the DT_DEBUG tag entry. This value starts
+/// out with a value of zero when the program first is initially loaded, but
+/// the address of the "_r_debug" structure from ld.so is filled in by the
+/// dynamic loader during program initialization code in ld.so prior to loading
+/// or unloading and shared libraries.
+///
+/// The dynamic loader will update this structure as shared libraries are
+/// loaded and will call a specific function that LLDB knows to set a
+/// breakpoint on (from _r_debug.r_brk) so LLDB will find out when shared
+/// libraries are loaded or unloaded. Each time this breakpoint is hit, LLDB
+/// looks at the contents of this structure and the contents tell LLDB what
+/// needs to be done.
+///
+/// Currently we expect the "state" in this structure to change as things
+/// happen.
+///
+/// When any shared libraries are loaded the following happens:
+/// - _r_debug.r_map is updated with the new shared libraries. This is a
+///   doubly linked list of "link_map *" entries.
+/// - _r_debug.r_state is set to RT_ADD and the debugger notification
+///   function is called notifying the debugger that shared libraries are
+///   about to be added, but are not yet ready for use.
+/// - Once the the shared libraries are fully loaded, _r_debug.r_state is set
+///   to RT_CONSISTENT and the debugger notification function is called again
+///   notifying the debugger that shared libraries are ready for use.
+///   DYLDRendezvous must remember that the previous state was RT_ADD when it
+///   receives a RT_CONSISTENT in order to know to add libraries
+///
+/// When any shared libraries are unloaded the following happens:
+/// - _r_debug.r_map is updated and the unloaded libraries are removed.
+/// - _r_debug.r_state is set to RT_DELETE and the debugger notification
+///   function is called notifying the debugger that shared libraries are
+///   about to be removed.
+/// - Once the the shared libraries are removed _r_debug.r_state is set to
+///   RT_CONSISTENT and the debugger notification function is called again
+///   notifying the debugger that shared libraries have been removed.
+///   DYLDRendezvous must remember that the previous state was RT_DELETE when
+///   it receives a RT_CONSISTENT in order to know to remove libraries
+///
 class DYLDRendezvous {
 
   // This structure is used to hold the contents of the debug rendezvous
@@ -45,6 +118,8 @@ class DYLDRendezvous {
     lldb::addr_t ldbase = 0;
 
     Rendezvous() = default;
+
+    void DumpToLog(lldb_private::Log *log, const char *label);
   };
 
   /// Locates the address of the rendezvous structure.  It updates
@@ -126,8 +201,15 @@ class DYLDRendezvous {
 
   /// Constants describing the state of the rendezvous.
   ///
+  /// These values are defined to match the r_debug.r_state enum from the
+  /// actual dynamic loader sources.
+  ///
   /// \see GetState().
-  enum RendezvousState { eConsistent, eAdd, eDelete };
+  enum RendezvousState {
+    eConsistent, // RT_CONSISTENT
+    eAdd,        // RT_ADD
+    eDelete      // RT_DELETE
+  };
 
   /// Structure representing the shared objects currently loaded into the
   /// inferior process.
@@ -276,6 +358,9 @@ class DYLDRendezvous {
     eRemoveModules
   };
 
+  static const char *StateToCStr(RendezvousState state);
+  static const char *ActionToCStr(RendezvousAction action);
+
   /// Returns the current action to be taken given the current and previous
   /// state
   RendezvousAction GetAction() const;

diff  --git a/lldb/test/API/functionalities/dyld-multiple-rdebug/Makefile b/lldb/test/API/functionalities/dyld-multiple-rdebug/Makefile
new file mode 100644
index 00000000000000..9b462acb83633d
--- /dev/null
+++ b/lldb/test/API/functionalities/dyld-multiple-rdebug/Makefile
@@ -0,0 +1,4 @@
+CXX_SOURCES := main.cpp
+DYLIB_NAME := testlib
+DYLIB_CXX_SOURCES := library_file.cpp
+include Makefile.rules

diff  --git a/lldb/test/API/functionalities/dyld-multiple-rdebug/TestDyldWithMultupleRDebug.py b/lldb/test/API/functionalities/dyld-multiple-rdebug/TestDyldWithMultupleRDebug.py
new file mode 100644
index 00000000000000..11620fe7c8b07c
--- /dev/null
+++ b/lldb/test/API/functionalities/dyld-multiple-rdebug/TestDyldWithMultupleRDebug.py
@@ -0,0 +1,39 @@
+"""
+Test that LLDB can launch a linux executable through the dynamic loader where
+the main executable has an extra exported "_r_debug" symbol that used to mess
+up shared library loading with DYLDRendezvous and the POSIX dynamic loader
+plug-in. What used to happen is that any shared libraries other than the main
+executable and the dynamic loader and VSDO would not get loaded. This test
+checks to make sure that we still load libraries correctly when we have
+multiple "_r_debug" symbols. See comments in the main.cpp source file for full
+details on what the problem is.
+"""
+
+import lldb
+import os
+
+from lldbsuite.test import lldbutil
+from lldbsuite.test.decorators import *
+from lldbsuite.test.lldbtest import *
+
+
+class TestDyldWithMultipleRDebug(TestBase):
+    @skipIf(oslist=no_match(["linux"]))
+    @no_debug_info_test
+    def test(self):
+        self.build()
+        # Run to a breakpoint in main.cpp to ensure we can hit breakpoints
+        # in the main executable. Setting breakpoints by file and line ensures
+        # that the main executable was loaded correctly by the dynamic loader
+        (target, process, thread, bkpt) = lldbutil.run_to_source_breakpoint(
+            self, "// Break here", lldb.SBFileSpec("main.cpp"),
+            extra_images=["testlib"]
+        )
+        # Set breakpoints both on shared library function to ensure that
+        # we hit a source breakpoint in the shared library which only will
+        # happen if we load the shared library correctly in the dynamic
+        # loader.
+        lldbutil.continue_to_source_breakpoint(
+            self, process, "// Library break here",
+            lldb.SBFileSpec("library_file.cpp", False)
+        )

diff  --git a/lldb/test/API/functionalities/dyld-multiple-rdebug/library_file.cpp b/lldb/test/API/functionalities/dyld-multiple-rdebug/library_file.cpp
new file mode 100644
index 00000000000000..436c302bda4099
--- /dev/null
+++ b/lldb/test/API/functionalities/dyld-multiple-rdebug/library_file.cpp
@@ -0,0 +1,7 @@
+#include "library_file.h"
+#include <stdio.h>
+
+int library_function(void) {
+  puts(__FUNCTION__); // Library break here
+  return 0;
+}

diff  --git a/lldb/test/API/functionalities/dyld-multiple-rdebug/library_file.h b/lldb/test/API/functionalities/dyld-multiple-rdebug/library_file.h
new file mode 100644
index 00000000000000..be402d7f4c1e65
--- /dev/null
+++ b/lldb/test/API/functionalities/dyld-multiple-rdebug/library_file.h
@@ -0,0 +1 @@
+int library_function();

diff  --git a/lldb/test/API/functionalities/dyld-multiple-rdebug/main.cpp b/lldb/test/API/functionalities/dyld-multiple-rdebug/main.cpp
new file mode 100644
index 00000000000000..ead5dafd2c67d7
--- /dev/null
+++ b/lldb/test/API/functionalities/dyld-multiple-rdebug/main.cpp
@@ -0,0 +1,32 @@
+#include "library_file.h"
+#include <link.h>
+#include <stdio.h>
+// Make a duplicate "_r_debug" symbol that is visible. This is the global
+// variable name that the dynamic loader uses to communicate changes in shared
+// libraries that get loaded and unloaded. LLDB finds the address of this
+// variable by reading the DT_DEBUG entry from the .dynamic section of the main
+// executable.
+// What will happen is the dynamic loader will use the "_r_debug" symbol from
+// itself until the a.out executable gets loaded. At this point the new
+// "_r_debug" symbol will take precedence over the orignal "_r_debug" symbol
+// from the dynamic loader and the copy below will get updated with shared
+// library state changes while the version that LLDB checks in the dynamic
+// loader stays the same for ever after this.
+//
+// When our DYLDRendezvous.cpp tries to check the state in the _r_debug
+// structure, it will continue to get the last eAdd as the state before the
+// switch in symbol resolution.
+//
+// Before a fix in LLDB, this would mean that we wouldn't ever load any shared
+// libraries since DYLDRendezvous was waiting to see a eAdd state followed by a
+// eConsistent state which would trigger the adding of shared libraries, but we
+// would never see this change because the local copy below is actually what
+// would get updated. Now if DYLDRendezvous detects two eAdd states in a row,
+// it will load the shared libraries instead of doing nothing and a log message
+// will be printed out if "log enable lldb dyld" is active.
+r_debug _r_debug;
+
+int main() {
+  library_function(); // Break here
+  return 0;
+}


        


More information about the lldb-commits mailing list