[lldb-dev] Linux/Posix spawning, signal masking

Todd Fiala tfiala at google.com
Wed Jan 22 17:00:31 PST 2014


This just went in.  I had to make several adjustments for the OSX build as
I had code that had to straddle Host.mm and Host.cpp.  I updated the fix to
include the Haswell change (that method moved from Host.mm to Host.cpp).

I used the configure-based OSX build process, which I can get to build but
not run tests yet.  Let me know if you hit any OSX test breaks due to this.


On Wed, Jan 22, 2014 at 3:04 PM, Todd Fiala <tfiala at google.com> wrote:

> (And I'll adjust the comment).
>
>
> On Wed, Jan 22, 2014 at 3:04 PM, Todd Fiala <tfiala at google.com> wrote:
>
>> Thanks.  It tested fine on Ubuntu, moving over to OS X.
>>
>>
>> On Wed, Jan 22, 2014 at 2:50 PM, Greg Clayton <gclayton at apple.com> wrote:
>>
>>> Looks good except the "// No more thread specific current working
>>> directory" comment when about to call __pthread_chdir() should be changed
>>> to:
>>>
>>>         // Set the working directory on this thread only
>>>
>>>
>>> On Jan 22, 2014, at 1:45 PM, Todd Fiala <tfiala at google.com> wrote:
>>>
>>> > So I just about have this done (the original 3 steps, not switching to
>>> fork()/exec()).
>>> >
>>> > One thing I hit was an area I noticed yesterday: on Linux (x86_64,
>>> Ubuntu 12.04), I'm finding that I need to *not* take defaults for all the
>>> signal handlers in lldb-gdbserver; otherwise, the target process never
>>> starts operating.  Technically the target process is exiting immediately
>>> with an exit_state of 127 with signal = 0.
>>> >
>>> > For the moment, I have that #ifdef'd in the Apple/Linux/FreeBSD
>>> LaunchProcessPosixSpawn method so that only Linux doesn't reset the child
>>> signal handlers to defaults.  I'm thinking the FreeBSD/Apple BSD lineage
>>> might behave more similarly here as my first stake in the sand for a guess,
>>> but that would be changing behavior for FreeBSD.  (Ed - any thoughts here?
>>>  I could just as well flip the #if defined(__linux__) to #if
>>> !defined(__APPLE__).)  Ultimately the root cause of this is not clear to me
>>> yet.
>>> >
>>> > For the directory handling, I'm having that handled via __pthread*
>>> methods on Apple and continuing to do what it did before on everything
>>> else.  I'll look into the fork()/exec() option as a follow up after I work
>>> through some of the more obviously broken bits of lldb-gdbserver.
>>> >
>>> > See below for the current state of the patch.  I'm about to run tests
>>> on Ubuntu and try a build and tests on my MBP.
>>> >
>>> > -Todd
>>> >
>>> >
>>> > On Wed, Jan 22, 2014 at 10:24 AM, Greg Clayton <gclayton at apple.com>
>>> wrote:
>>> > You might need to add "#if defined(__APPLE__)" around the calls to
>>> __pthread_chdir and __pthread_fchdir. Those might be darwin specific.
>>> >
>>> > There is no way to set your current working directory when using
>>> posix_spawn, so to work around this, we change directory in a thread
>>> specific way by calling __pthread_chdir(working_dir) and reverting it using
>>> __pthread_fchdir(-1), so as to not hose over the process that has the lldb
>>> shared library loaded by changing the global working directory.
>>> >
>>> > So in order to eventually fix being able to obey the launch flag
>>> eLaunchFlagDebug and also to be able to set the working directory, non
>>> darwin builds might need to change over to using fork()/exec() since after
>>> the fork the child process can set its working directory and also call
>>> ptrace() to start being debuggable and then calling exec(). When ptrace()
>>> has been called to allow the process to be debugged, I believe it stops
>>> right at the entry point upon exec. Again, this would be for non-darwin
>>> builds since Apple has a posix_spawn flag of POSIX_SPAWN_START_SUSPENDED
>>> that can be set to enforce stopping at the entry point and also has the
>>> thread specific way to change directory.
>>> >
>>> > Greg
>>> >
>>> >
>>> > On Jan 22, 2014, at 10:00 AM, Todd Fiala <tfiala at google.com> wrote:
>>> >
>>> > > Correct.
>>> > >
>>> > >
>>> > > On Wed, Jan 22, 2014 at 9:55 AM, Greg Clayton <gclayton at apple.com>
>>> wrote:
>>> > > So the fix will be:
>>> > > 1 - Remove LaunchProcessPosixSpawn() from Host.mm
>>> > > 2 - Copy it over into Host.cpp
>>> > > 3 - Make sure LaunchProcessPosixSpawn is compiled for Apple builds
>>> as well in Host.cpp
>>> > >
>>> > > Right?
>>> > >
>>> > >
>>> > > On Jan 22, 2014, at 9:54 AM, Todd Fiala <tfiala at google.com> wrote:
>>> > >
>>> > > > > It is indeed the right fix.
>>> > > >
>>> > > > Ok - I'll take care of that part.  Thanks!
>>> > > >
>>> > > >
>>> > > > On Wed, Jan 22, 2014 at 9:52 AM, Greg Clayton <gclayton at apple.com>
>>> wrote:
>>> > > > It is indeed the right fix. Host.mm has a mac specific version
>>> which does this:
>>> > > >
>>> > > >     sigset_t no_signals;
>>> > > >     sigset_t all_signals;
>>> > > >     sigemptyset (&no_signals);
>>> > > >     sigfillset (&all_signals);
>>> > > >     ::posix_spawnattr_setsigmask(&attr, &no_signals);
>>> > > >     ::posix_spawnattr_setsigdefault(&attr, &all_signals);
>>> > > >
>>> > > > Compared to the incorrect version you noticed in Host.cpp:
>>> > > >
>>> > > >     sigset_t no_signals;
>>> > > >     sigset_t all_signals;
>>> > > >     sigemptyset (&no_signals);
>>> > > >     sigfillset (&all_signals);
>>> > > >     ::posix_spawnattr_setsigmask(&attr, &all_signals);
>>> > > >     ::posix_spawnattr_setsigdefault(&attr, &no_signals);
>>> > > >
>>> > > > (notice all_signals and no_signals are reversed in the last two
>>> function calls in Host.cpp.
>>> > > >
>>> > > > You might compare the "LaunchProcessPosixSpawn" in Host.mm and try
>>> and just copy it to Host.cpp and see if it works. I can't find anything
>>> darwin specific in the code after a quick glance and we have used the
>>> Host.mm LaunchProcessPosixSpawn() for a few years now, so it is definitely
>>> tested...
>>> > > >
>>> > > > So the fix would seem to be:
>>> > > > 1 - Remove LaunchProcessPosixSpawn() from Host.mm
>>> > > > 2 - Copy it over into Host.cpp
>>> > > > 3 - Make sure LaunchProcessPosixSpawn is compiled for Apple builds
>>> as well in Host.cpp
>>> > > >
>>> > > >
>>> > > > On Jan 21, 2014, at 10:49 PM, Todd Fiala <tfiala at google.com>
>>> wrote:
>>> > > >
>>> > > > > Hi all,
>>> > > > >
>>> > > > > In source/Host/common/Host.cpp, there is a posix process
>>> spawning method called LaunchProcessPosixSpawn.  On my Linux system, it is
>>> used to start the target process in lldb-gdbserver.  It looks like FreeBSD
>>> uses it as well.  Most of the Platform launchers appear to funnel to it as
>>> well (via Host::LaunchProcess ()).
>>> > > > >
>>> > > > > There is a section of code in LaunchProcessPosixSpawn that masks
>>> all signals for the child process that is started up:
>>> > > > >
>>> > > > > ::posix_spawnattr_setsigmask(&attr, &all_signals)
>>> > > > >
>>> > > > > When that is set, it seems to be preventing the child from
>>> receiving everything except the non-blockable signals.  This has the effect
>>> of (at the very least) blocking SIGINT (i.e. ^C from the keyboard) and
>>> SIGTERM (standard unadorned "kill pid").  This appears to be the cause of a
>>> few bugs as I try to get lldb-gdbserver working on Linux.  For example:
>>> > > > >
>>> > > > > 1. hitting ^C on an lldb-gdbserver that spawned a debuggee
>>> target process would kill the lldb-gdbserver, but not the debuggee target,
>>> which continues to run.
>>> > > > >
>>> > > > > 2. sending a SIGTERM (kill {debuggee pid}) while it is getting
>>> debugged and running is ignored.
>>> > > > >
>>> > > > > 3. sending a SIGTERM to the debuggee after issue #1 (where
>>> lldb-gdbserver is no longer running) is ignored.
>>> > > > >
>>> > > > > 4. killing lldb-gdbserver from an attached lldb that shuts down
>>> and kills the remote does indeed kill lldb-gebserver, but does not kill the
>>> target process.
>>> > > > >
>>> > > > > (Of course sending a 'kill -9 {debuggee pid}' works fine to kill
>>> the target process).
>>> > > > >
>>> > > > >
>>> > > > > I've modified this method locally to not mask any signals:
>>> > > > >
>>> > > > > ::posix_spawnattr_setsigmask (&attr, &no_signals)
>>> > > > >
>>> > > > > This seems to address all the problems I had above for
>>> lldb-gdbserver as signals propagate properly, and the target responds
>>> correctly to signals sent when the parent dies, etc.
>>> > > > >
>>> > > > > I've also run all the existing tests (on my system, that amounts
>>> to 275 tests that really run), and those are all passing.
>>> > > > >
>>> > > > > However, given that so much code flows through here (or at least
>>> appears to) that is not directly related to the lldb-gdbserver --- i.e.
>>> local linux/FreeBSD debugging --- I'm highly skeptical that this is the
>>> right fix.  I did try using local debugging with lldb with my change in
>>> place, and that seemed to work fine.  But I'm thinking that perhaps the
>>> signal blocking was intended and that behavior is needed in some cases that
>>> (perhaps) are not covered by tests that run on Linux.
>>> > > > >
>>> > > > > Any thoughts on why all the signals were getting masked on
>>> process spawning?  Does that change look okay as is?
>>> > > > >
>>> > > > > Thanks!
>>> > > > >
>>> > > > > Sincerely,
>>> > > > > Todd Fiala
>>> > > > > _______________________________________________
>>> > > > > lldb-dev mailing list
>>> > > > > lldb-dev at cs.uiuc.edu
>>> > > > > http://lists.cs.uiuc.edu/mailman/listinfo/lldb-dev
>>> > > >
>>> > > >
>>> > > >
>>> > > >
>>> > > > --
>>> > > > Todd Fiala |   Software Engineer |     tfiala at google.com |
>>> 650-943-3180
>>> > > >
>>> > >
>>> > >
>>> > >
>>> > >
>>> > > --
>>> > > Todd Fiala |   Software Engineer |     tfiala at google.com |
>>> 650-943-3180
>>> > >
>>> >
>>> >
>>> >
>>> >
>>> > --
>>> > Todd Fiala |   Software Engineer |     tfiala at google.com |
>>> 650-943-3180
>>> >
>>> > <posix_spawn_2014-01-22_01.diff>
>>>
>>>
>>
>>
>> --
>> Todd Fiala | Software Engineer |  tfiala at google.com |  650-943-3180
>>
>
>
>
> --
> Todd Fiala | Software Engineer |  tfiala at google.com |  650-943-3180
>



-- 
Todd Fiala | Software Engineer | tfiala at google.com | 650-943-3180
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.llvm.org/pipermail/lldb-dev/attachments/20140122/61f81db9/attachment.html>


More information about the lldb-dev mailing list