[lldb-dev] Proposal for refactoring process launching

Tue Sep 23 13:38:14 PDT 2014

Apologies in advance for the long read, tried to write this up as clearly
as possible.  At a minimum I'd like to complete the HostProcess abstraction
and the LaunchProcess signature change, but I think the part that follows
(which I've listed as optional) is a net gain for the codebase and improves
the maintainability and clarity of all of this code.

This will discuss a plan for refactoring process management and launching
code in the Host layer into a set of reusable abstractions.  This is part
of a continued effort to refactor Host.cpp in a way that lends itself to
better re-usability, better platform-separation (code specific to platform
X is isolated to a file specific to platform X), and a stronger barrier
between generic code and platform specific code.  One of the biggest
difficulties in my efforts to port LLDB to Windows so far has been locating
and fixing places in LLDB where an operation was assumed to be generic
(either logically or through the use of a particular API) when it actually
was not.  By re-thinking the Host layer from the bottom up with Windows in
mind, we can arrive at a set of abstractions that is truly generic enough
to be used across all platforms, while still making it possible to use
platform specific features from other platform specific code.

The HostProcess abstraction

It's probably easiest to start with the core abstraction itself, and then
discuss the other supporting changes needed to arrive at that.  At the core
of this refactor is the |HostProcess| abstraction.  This is a replacement
for what is currently an lldb::process_t, and serves as an abstraction that
allows one to manipulate a process running on the system.  In order to
create a strong barrier between generic code and platform specific code,
|HostProcess| will use a facade pattern similar to that used by
|HostThread|, whereby |HostProcess| provides only methods which are generic
enough to work on every platform, and wraps a |HostNativeProcessBase| for
which there might be many derived implementations, each implementing the
generic options for the particular platform, as well as potentially
offering any relevant platform specific operations so that other platform
specific code can take full advantage of the platform.

A generic interface for process will provide at a minimum the following
methods:

   -

   Terminate
   -

   GetMainModule
   -

   GetProcessId
   -

   IsRunning

Additionally, there may be use for methods to inspect the process while
running, such as methods like the following:

   -

   ReadProcessMemory
   -

   WriteProcessMemory

Additional platform-specific methods will be exposed through the
corresponding subclasses of |HostNativeProcessBase|.  The most obvious
example of this is a Signal method that will be exposed through
|HostProcessPosix|, although the proposed design provides the flexibility
for this set of platform-specific methods to be arbitrary.

Launching Processes

Currently the ability to launch a process is exposed through a single
method Host::LaunchProcess which takes a |ProcessLaunchInfo| argument.
Upon completion, LaunchProcess writes the pid of the newly created process
into the |ProcessLaunchInfo|.  Because of this, the ProcessLaunchInfo
structure is required to be passed in as a non-const reference so it can be
modified.  To address this, the launch method will be modified to return a
HostProcess.  The PID will not be written back into the |ProcessLaunchInfo|
(and if possible the pid field will ideally be removed from
|ProcessLaunchInfo| entirely), allowing it to be passed const.

Optional code re-org related to process launching

The process launching code is relatively confusing and difficult to
maintain.  Because the code is not modular and not written to be re-usable
at a granular level, some hackery exists to make the correct methods be
found by the linker on various platforms.  The following sections describes
a possible code re-organization that makes everything more modular and
re-usable, and makes the code easier to understand and maintain as a result.

The basis of this code re-organization is the creation of a
|ProcessLauncher| interface will be created.  Instead of using
Host::LaunchProcess, anyone wishing to launch a process will instantiate
some subclass of |ProcessLauncher|, and then invoke the Launch method on
this interface, which will return a HostProcess.  For those just wishing to
get the default behavior on a particular platform (i.e. get the behavior
equivalent to calling Host::LaunchProcess now), |DefaultProcessLauncher|
will be typedef'ed to the most sensible process launcher on each platform.
This allows existing code to be written by simply instantiating a
|DefaultProcessLauncher| and using it.  The interface of |ProcessLauncher|
will provide the following methods:

   -

   LaunchProcess
   -

   AddExitCallback

The second of these two methods, AddExitCallback, replaces and improves
upon the existing mechanism in |ProcessLaunchInfo| which allows the user to
specify an exit callback.  The improvement comes from the fact that
multiple exit callbacks can be specified.

This approach yields a number of benefits.  For starters, it allows us to
abstract out the notion of the launching strategy into a reusable piece of
code so that platforms which have multiple launching strategies have the
ability to make use of specific strategies in other platform specific code,
or so that remote process launches can be treated the same as local process
launches.  As an example, on MacOSX there are three different launching
strategies:  Launch with XPC services, launch with Applescript, and launch
with posix_spawn.  Each of these could be independent implementations of
|ProcessLauncher|, with the default implementation on MacOSX being a fourth
implementation that chooses one of the first 3 implementations to delegate
to at runtime.  As a result, the 3 implementations can be used
independently of each other should the need arise.

A second benefit is that it opens up the possibility of treating remote
process launch transparently with local process launch -- just have a
|RemoteProcessLauncher| class.

There are additional benefits in the form of overall code health, platform
separation, and just generally improving the modularity of the codebase,
although those are perhaps less tangible.

Monitoring for process exit

It is important to know when a process exits, for 2 reasons.  First, on
some platforms the process needs to be reaped.  Second, asynchronous exit
callbacks need to be invoked.  The implementation of the monitoring is
platform dependent, and a strategy for each platform can be summarized as
follows:

   -

   Windows
   1.

      Create a background thread, use WaitForSingleObject
      2.

      [Better, alternative implementation] Create one "static" background
      thread for all processes, use WaitForMultipleObjects() to monitor many
      processes at once, like select().
      -

   MacOSX
   1.

      Processes spawned with posix_spawn - Use MacOSX "dispatch" library to
      queue monitoring code on a system thread.
      2.

      Processes spawned with XPC - Use MacOSX "dispatch" library to queue
      monitoring code on a system thread.
      3.

      Processes spawned with Applescript - No monitoring necessary
      -

   Non-MacOSX posix systems
   1.

      Processes spawned with posix_spawn - Create a background thread, use
      waitpid in the background thread.

It’s possible to see some of the benefit in code re-organization from the
above breakdown of implementations.  All posix-based systems can sometimes
use a posix_spawn based strategy for launching processes, but MacOSX might use
other strategies depending on the situation.  Even when MacOSX uses
posix_spawn though, it still uses a different exit monitoring
implementation.  This set of modular abstractions handles this nicely.
 |ProcessLauncherPosixSpawn| goes in host/posix, and takes a
|ProcessExitMonitor| to its constructor.  If MacOSX decides at runtime to
launch using posix_spawn, it passes a |DispatchExitMonitor|, which lives in
host/macosx to the constructor.  All code always lives at the most generic
level where it makes sense for everyone, and there’s no code pollution or
pre-processor hackery to get certain functions to be visible.

Since the |HostProcess| can go out of scope while the process is still
running on the system, it is important that the lifetime of the
|ProcessExitMonitor| be tied to the lifetime of the process running on the
system, and not to the lifetime of the |HostProcess| object inside of
LLDB.  This can be achieved by having |HostProcess| store an
std::weak_ptr<ProcessExitMonitor>, and having the background thread own the
|ProcessExitMonitor| through an std::shared_ptr<ProcessExitMonitor>.  This
way, even if LLDB decides it no longer needs the |HostProcess|, if the
process is still running in the background it will get reaped and callbacks
fired correctly.

Getting it done

In an effort to keep the CLs manageable, this work will be split into
multiple changes, in the order described below:

   1.

   Create the HostProcess abstraction, implement it for all platforms.  Do
   not update any code to use HostProcess yet.
   2.

   Modify the signature of Host::LaunchProcess to return a HostProcess, and
   remove the pid from ProcessLaunchInfo.  Update calling code accordingly.
   3.

   Create the various implementations of ProcessExitMonitor, update
   Host::LaunchProcess to use the appropriate ProcessExitMonitor instead of
   calling StartMonitoringChildProcess.
   4. Create the various implementations of ProcessLauncher, update calling
   code to use them instead of Host::LaunchProcess, delete Host::LaunchProcess.
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.llvm.org/pipermail/lldb-dev/attachments/20140923/e25779f2/attachment.html>