[lldb-dev] [llvm-dev] lldb stops on every call to dlopen

Steve Ravet via lldb-dev lldb-dev at lists.llvm.org
Wed Apr 18 07:31:23 PDT 2018


Hi Pavel, Eugene, here is a testcase that demonstrates the problem.  The code that sets up sigalrm in main() is the same as the program I have been debugging except that I’ve reduced the interval in this testcase.  In the original, as per the comment, the interval is .8 seconds.

When I run this under macOS lldb it runs to the end and exits normally without stopping.  If I run it under Linux it stops 5 or 6 times with "stop reason = trace”

I compile with

clang test.c -ldl

on both platforms.

cheers,
--steve

#include <dlfcn.h>
#include <signal.h>
#include <stdio.h>
#include <sys/time.h>

static int alarms=0;

void statistics_sigalarm_handler (int x)
{
  alarms++;
}

void loadso(char *path)
{
  void *handlep;
  handlep = dlopen(path, RTLD_LAZY | RTLD_LOCAL);
  if (handlep == NULL) {
    printf("dlopen failed %s\n", dlerror());
  }
}

int main()
{
  signal(SIGALRM, statistics_sigalarm_handler);

  struct itimerval itimerval;
  struct timeval timeval;
  timeval.tv_sec = 0;
  timeval.tv_usec = 10000;  // 8 tenths of a second

  itimerval.it_interval = timeval;
  itimerval.it_value = timeval;
  int r = setitimer(ITIMER_REAL, &itimerval, 0);
  if (r) perror("setitimer");

  for (int i=0; i<=100; i++) {
#ifdef __APPLE__
    loadso(“some_macos_library.dylib");
    loadso(“some_other_macos_library.dylib");
#else
    loadso(“some_linux_library.so");
    loadso(“some_other_linux_library.so");
#endif
  }
  signal(SIGALRM, SIG_DFL);
  printf("Got %d signals\n", alarms);
}




> On Apr 18, 2018, at 6:14 AM, Pavel Labath <labath at google.com> wrote:
> 
> +Eugene, who wrote the signal-skipping code
> 
> Cool, I'm glad we got to the bottom of this.
> 
> At this point, I *think* we should be able to reproduce the issue
> ourselves, but a self-contained reproducer would definitely make it easier
> for us and would be appreciated, particularly as there are still a few
> things I don't understand here (e.g., I think this should only matter only
> after an instruction-step, but your last log seems to show it happens after
> a "continue").
> 
> In any case, I do have a couple of follow-up questions:
> - How are you sending the SIGALRM and with what frequency?
> - Are you using SIGALRM on macOS as well (implying we handle it correctly
> there)?
> 
> Also, you may know this already, but I feel obligated to point out that
> setting "--pass false" will change the behavior of your program.
> 
> cheers,
> pavel
> On Tue, 17 Apr 2018 at 20:59, Steve Ravet <steve.ravet at apple.com> wrote:
> 
>> Ding!  That’s it.  This program does use SIGALRM.  I wouldn’t have
> thought it would be enabled at this point, but apparently it is because I
> got lots of sigalarm halts during the .so loading.
> 
>> Further, if I run in lldb with
> 
>> process handle -n false -p false -s false SIGALRM
> 
>> then the debugger seems to run fine without stopping during dlopen().
> The SIGALRM itself isn’t important to the operation of the program.
> 
>> With this knowledge I can probably create a simple testcase.  Would this
> be considered an lldb bug?  If so should I submit a testcase in some way?
> 
>> thanks!
>> --steve
> 
>>  The lover of nature is he whose inward and outward senses are still
> truly adjusted to each other; who has retained the spirit of infancy even
> into the era of manhood.
> 
> 
> 
>> On Apr 17, 2018, at 1:12 PM, Pavel Labath <labath at google.com> wrote:
> 
>> It's a bit of a wild guess, but is it possible that you (or one of the
>> libraries you use) are doing anything with signals (SIGALRM or such?). I
>> think I remember looking at the code handling the server-side ignored
>> signal handling and thinking that it could go wrong if you get a signal
>> while doing a instruction-step. I am not sure it fully applies here as the
>> last command that lldb client did was a "continue", but i think it has to
>> have something to do with signals, as you end up stopped in a signal
>> handler.
> 
>> Could you try the following sequence of commands?
>> (lldb) process launch --stop-at-entry-point
>> (lldb) process handle --notify true --stop true #Stop on all signals
>> (lldb) continue
> 
>> and let us know if you see any extra stops due to signals. If that doesn't
>> find anything then I think we'll have start pulling logs from the
>> lldb-server side, as there doesn't seem to be anything wrong with the
>> client. The easiest way to achieve that is to do a
>> export LLDB_SERVER_LOG_CHANNELS="posix all:gdb-remote packets"
>> export LLDB_DEBUGSERVER_LOG_FILE=/tmp/server.log
>> before launching lldb.
>> On Tue, 17 Apr 2018 at 18:28, Steve Ravet <steve.ravet at apple.com> wrote:
> 
>> Pavel asked for a dump of gdb-remote commands.  I got that and ran it
> 
>> through the gdbremote decoder, and trimmed to include what looks like the
>> last successful continue after breakpoint and then the halt on dlopen.
>> Both cases stop on signal 5.
> 
>> After the stop message the debugger issues two binary reads and then
> 
>> apparently makes the decision that it should stop rather than continue.
>> The stopping case is missing the equivalent of "Element 1: Single stepping
>> past breakpoint site 2 at 0x2aaaaaab9eb0” which is in the continuing case.
>> I’ve attached the file here:
> 
> 
>> thanks,
>> --steve
> 
> 
> 
>>  From the ashes of disaster grow the roses of success.
> 
> 
> 
> 
>> On Apr 17, 2018, at 11:27 AM, Jim Ingham <jingham at apple.com> wrote:
> 
> 
>> It is interesting that the stop reason on the thread that stopped is
> 
>> "trace".  That's what you would expect returning from the single-step to
>> step over the breakpoint.  But it looks like we got a signal while
>> single-stepping, but the stop reason was misreported by somebody.
> 
>> Jim
> 
> 
> 
>> On Apr 17, 2018, at 6:00 AM, Pavel Labath via lldb-dev <
> 
>> lldb-dev at lists.llvm.org> wrote:
> 
>> [+lldb-dev]
> 
> 
>> Hello Steve,
> 
> 
>> thanks for the report.
> 
> 
>> The fact that you see the rendezvous breakpoint being hit many times is
> 
>> not
> 
>> surprising. We get those every time the library is loaded (we need that to
>> load relevant debug info and set potential breakpoints). However, they
>> should generally not be surfaced to the user (unless you have the
>> stop-on-sharedlibrary-events setting set, which you don't).
> 
> 
>> The part that is suspicious to me is that __restore_rt shows up on the top
>> of the backtrace. This is a trampoline used to return from signal
> 
>> handlers,
> 
>> and it would seem to indicate that you got some sort of a signal while
>> loading the libraries. I don't know why this would happen, but it could be
>> that this is confusing lldb's auto-resume logic.
> 
> 
>> The interesting part to see here is what lldb thinks are the stop reasons
>> for individual threads in the process (is the process multi-threaded?) for
>> the last couple of stops. The "lldb step" and "gdb-remote packets" log
>> categories are the most interesting to observe here. If you are able to
>> send me the log traces, I can help you interpret them.
> 
> 
>> regards,
>> pavel
> 
> 
> 
> 
> 
> 
>> On Tue, 17 Apr 2018 at 02:27, Steve Ravet via llvm-dev <
>> llvm-dev at lists.llvm.org> wrote:
> 
> 
>> Hello lldb developers, I am running into a problem with lldb on Linux.  I
> 
> 
>> am currently running llvm 6.0.0.
> 
> 
>> I have an executable that dynamically loads a large number of shared
> 
> 
>> libraries at runtime.  These are explicitly loaded via dlopen (they are
>> specified in a configuration file), and after loading a few (typically a
>> dozen or so, but the number varies) lldb will halt during dlopen.  If I
>> continue, it will load a few more then halt again, which makes debugging
>> from startup impractical since there are so many libraries to be loaded
>> (more than a hundred of them).
> 
> 
>> When I build and debug this same C++ on macOS, the debugger works fine.
> 
> 
>> I have verified that target.process.stop-on-sharedlibrary-events is false.
>> I turned on dyld logging and I see lots of log messages about
>> RendezvousBreakpoint being hit, but I don’t see anything that sheds light
>> on why some libraries load without stopping but others don’t.
> 
> 
>> I have tried to recreate this in a trivial program that calls dlopen in a
> 
> 
>> loop, but haven’t been able to reproduce.
> 
> 
>> Can your offer any suggestions for further debugging this?  More
> 
> 
>> supporting evidence follows.
> 
> 
>> Here is the message when the debugger stops:
> 
> 
> 
>> Process 120004 stopped
>> * thread #1, name = ‘xxxxxxxx', stop reason = trace
>>    frame #0: 0x00002aaaacfca6a0 libc.so.6`__restore_rt
>> libc.so.6`__restore_rt:
>> ->  0x2aaaacfca6a0 <+0>: movq   $0xf, %rax
>>    0x2aaaacfca6a7 <+7>: syscall
>>    0x2aaaacfca6a9 <+9>: nopl   (%rax)
> 
> 
> 
>> libc.so.6`__libc_sigaction:
>>    0x2aaaacfca6b0 <+0>: subq   $0xd0, %rsp
> 
> 
> 
>> I do not have the stop on shared library events setting enabled:
> 
> 
> 
>> (lldb) settings show target.process.stop-on-sharedlibrary-events
>> target.process.stop-on-sharedlibrary-events (boolean) = false
> 
> 
> 
> 
> 
>> The backtrace goes back to dlopen:
> 
> 
> 
>> (lldb) bt
>> * thread #1, name = ‘xxxxx', stop reason = trace
>>  * frame #0: 0x00002aaaacfca6a0 libc.so.6`__restore_rt
>>    frame #1: 0x00002aaaaaab9eb0 ld-linux-x86-64.so.2
>>    frame #2: 0x00002aaaaaabdc53 ld-linux-x86-64.so.2`dl_open_worker + 499
>>    frame #3: 0x00002aaaaaab9286 ld-linux-x86-64.so.2`_dl_catch_error +
> 
> 
>> 102
> 
> 
>>    frame #4: 0x00002aaaaaabd63a ld-linux-x86-64.so.2`_dl_open + 186
>>    frame #5: 0x00002aaaac39df66 libdl.so.2`dlopen_doit + 102
>>    frame #6: 0x00002aaaaaab9286 ld-linux-x86-64.so.2`_dl_catch_error +
> 
> 
>> 102
> 
> 
>>    frame #7: 0x00002aaaac39e29c libdl.so.2`_dlerror_run + 124
>>    frame #8: 0x00002aaaac39dee1 libdl.so.2`__dlopen_check + 49
> 
> 
> 
>> the dyld debug log has a lot of this:
>> 209 intern-state     DynamicLoaderPOSIXDYLD::RendezvousBreakpointHit pid
> 
> 
>> 153501 stop_when_images_change=false
> 
> 
>> 210 intern-state     DynamicLoaderPOSIXDYLD::RendezvousBreakpointHit
> 
> 
>> called for pid 153501
> 
> 
>> 211 intern-state     DYLDRendezvous::Resolve address size: 8, padding 4
>> 212 intern-state     DYLDRendezvous::Resolve cursor = 0x2aaaaaccc160
>> 213 intern-state     DynamicLoaderPOSIXDYLD::RendezvousBreakpointHit pid
> 
> 
>> 153501 stop_when_images_change=false
> 
> 
>> 214 intern-state     DynamicLoaderPOSIXDYLD::RendezvousBreakpointHit
> 
> 
>> called for pid 153501
> 
> 
>> 215 intern-state     DYLDRendezvous::Resolve address size: 8, padding 4
>> 216 intern-state     DYLDRendezvous::Resolve cursor = 0x2aaaaaccc160
> 
> 
> 
> 
> 
>> thanks,
>> --steve
> 
> 
> 
> 
>>  In the woods too, a man casts off his years, as the snake his slough,
> 
> 
>> and at what period soever of life, is always a child.
> 
> 
> 
> 
>> _______________________________________________
>> LLVM Developers mailing list
>> llvm-dev at lists.llvm.org
>> http://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev
> 
> 
>> _______________________________________________
>> lldb-dev mailing list
>> lldb-dev at lists.llvm.org
>> http://lists.llvm.org/cgi-bin/mailman/listinfo/lldb-dev



More information about the lldb-dev mailing list