[Lldb-commits] [PATCH] D12651: Add ctrl-c support to parallel dotest.py.

Fri Sep 4 22:04:06 PDT 2015

Yep, I'm thinking that's right.
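Roughly what I have in mind for the pluggable piece, as a Python 3 sketch. Every name here (run_tests, RUNNER_STRATEGIES, fake_worker and the three run_* functions) is made up for illustration and is not what dosep.py has today:

```python
# Hypothetical sketch: none of these names exist in dosep.py.
import multiprocessing
from concurrent.futures import ThreadPoolExecutor


def run_serial(worker, jobs, num_workers):
    """Run every job in the calling thread (simplest to debug)."""
    return [worker(job) for job in jobs]


def run_threading(worker, jobs, num_workers):
    """Run jobs on a thread pool; results come back in job order."""
    with ThreadPoolExecutor(max_workers=num_workers) as pool:
        return list(pool.map(worker, jobs))


def run_multiprocessing(worker, jobs, num_workers):
    """Run jobs on a process pool; worker must be a picklable top-level function."""
    with multiprocessing.Pool(num_workers) as pool:
        return pool.map(worker, jobs)


# The "plug" is just a name-to-function table chosen at startup.
RUNNER_STRATEGIES = {
    "serial": run_serial,
    "threading": run_threading,
    "multiprocessing": run_multiprocessing,
}


def run_tests(strategy_name, worker, jobs, num_workers=4):
    """Look up the requested strategy by name and dispatch to it."""
    try:
        runner = RUNNER_STRATEGIES[strategy_name]
    except KeyError:
        raise ValueError("unknown runner strategy: %r" % strategy_name)
    return runner(worker, jobs, num_workers)


def fake_worker(test_dir):
    """Stand-in for the real per-directory test invocation."""
    return (test_dir, "PASS")
```

With something like this, a platform where one mechanism misbehaves could select another by name (for example run_tests("threading", fake_worker, dirs)) without touching the rest of the runner.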

On Fri, Sep 4, 2015 at 10:02 PM, Zachary Turner <zturner at google.com> wrote:

> The pluggable method would at least allow everyone to continue working
> until someone has time to dig into what's wrong with multiprocess on Windows.
>
> On Fri, Sep 4, 2015 at 9:56 PM Todd Fiala <todd.fiala at gmail.com> wrote:
>
>> On Fri, Sep 4, 2015 at 5:40 PM, Zachary Turner <zturner at google.com>
>> wrote:
>>
>>>
>>>
>>> On Fri, Sep 4, 2015 at 5:10 PM Todd Fiala <todd.fiala at gmail.com> wrote:
>>>
>>>> tfiala added a comment.
>>>>
>>>> In http://reviews.llvm.org/D12651#240480, @zturner wrote:
>>>>
>>>> > Tried out this patch, unfortunately I'm seeing the same thing.  The very
>>>> > first call to worker.join() is never returning.
>>>> >
>>>> > It's unfortunate that it's so hard to debug this stuff, do you have any
>>>> > suggestions for how I can try to nail down what the child dotest instance
>>>> > is actually doing?  I wonder if it's blocking somewhere in its script, or
>>>> > if this is some quirk of the multiprocessing library's dynamic invocation /
>>>> > whatever magic it does.
>>>> >
>>>> > How much of an effort would it be to make the switch to threads now?  The
>>>> > main thing we'd have to do is get rid of all of the globals in dotest, and
>>>> > make a DoTest class or something.
>>>>
>>>>
>>>> It's a bit more work than I want to take on right now.  I think we
>>>> really may want to keep the multiprocessing and just not exec out to
>>>> dotest.py for a third-ish time for each inferior.
>>>>
>>>
>>> Just to clarify, are you saying we may want to keep multiprocessing over
>>> threads even if you can solve the exec problem?  Any particular reason?
>>>
>>
>> Yes, you understood me correctly.
>>
>> Prior to me getting into it, dosep.py was designed to isolate each test
>> into its own process (via the subprocess exec call) so that each test
>> directory or file got its own lldb process and there was process-level
>> isolation, less contention on the Python global interpreter lock, etc.
>>
>> Then, when Steve Pucci and later I got to making it multithreaded, we
>> wrapped the exec call in an "import threading"-style thread pool.  That
>> maintained the process isolation property by having each thread just do an
>> exec (i.e. multiple execs in parallel).  Except, this didn't work on
>> Mac OS X.  The exec calls grab the Python GIL on OS X (and not anywhere
>> else, as far as I could find).  But multithreading + exec is a valid option
>> for everything other than OS X.
>>
>> The way I solved it to work for everyone was to drop the "import
>> threading" approach and switch to the "import multiprocessing" approach.
>> This worked everywhere, including on OS X (although with a few hiccups
>> initially, as it exposed occasional hangs at the time with what looked like
>> socket handling under Darwin).  What I failed to see in my haste was that I
>> then had two levels of fork/exec-like behavior (i.e. we had two process
>> firewalls where we only needed one, at the cost of an extra exec): the
>> multiprocessing works by effectively forking/creating a new process that is
>> now isolated.  But then we turn around and still create a subprocess to
>> exec out to dotest.py.
>>
>> What I'm suggesting in the near future is if we stick with the
>> multiprocessing approach, and eliminate the subprocess exec and instead
>> just have the multiprocess worker call directly into a methodized entry
>> point in dotest.py, we can skip the subprocess call within the multiprocess
>> worker.  It is already isolated and a separate process, so it is already
>> fulfilling the isolation requirement.  And it reduces the doubled processes
>> created.  And it works on OS X in addition to everywhere else.  It does
>> become more difficult to debug, but then again the majority of the logic is
>> in dotest.py and can be debugged with --no-multiprocess (or with logging).
>>
>> This is all somewhat separate from the Ctrl-C issue, but it is the
>> backstory on what I'm referring to with the parallel test runner.
>>
>> Completely as an aside, I did ask Greg Clayton to see if he can poke into
>> why OS X is hitting the Python GIL on execs in "import threading"-style
>> execs from multiple threads.  But assuming nothing magic changes there and
>> it isn't easily solved (I tried and failed after several attempts to
>> diagnose last year), I'd prefer to keep a strategy that is the same unless
>> there's a decent win on the execution front.
>>
>> That all said, I'm starting to think a pluggable strategy for the actual
>> mechanics of the parallel test run might end up being best anyway, since I'd
>> really like Ctrl-C working and I'm not able to diagnose what's
>> happening in the Windows scenario.
>>
>>
>>>   Multi-threaded is much easier to debug, for starters, because you can
>>> just attach your debugger to a single process.  It also solves a lot of
>>> race conditions and makes output processing easier (not to mention higher
>>> performance), because you don't even need a way to have the sub-processes
>>> communicate their results back to the parent because the results are just
>>> in memory.  Stick them in a synchronized queue and the parent can just
>>> process it.  So it would probably even speed up the test runner.
>>>
>>> I think if there's not a very good reason to keep multiprocessing
>>> around, we should aim for a threaded approach.  My understanding is that
>>> lit already does this, so there's no fundamental reason it shouldn't work
>>> correctly on Mac OS X; we just have to solve the exec problem like you
>>> mentioned.
>>>
>>>
>>>
>>
>>
>> --
>> -Todd
>>
>
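For the direct-call idea quoted above, here is a minimal sketch, assuming dotest.py grew a callable entry point. The names dotest_entry_point, process_dir_worker and run_all are hypothetical, and the entry point is a stub that only prints; the real one would run the tests for that directory:

```python
# Hypothetical sketch: dotest.py has no such entry point today.
import contextlib
import io
import multiprocessing


def dotest_entry_point(argv):
    """Stub standing in for a methodized dotest.py main(); returns an exit code."""
    print("running %s" % " ".join(argv))
    return 0


def process_dir_worker(test_dir):
    """Runs inside the multiprocessing worker process, with no subprocess exec.

    The worker is already a separate process, so calling the entry point
    directly keeps the isolation while dropping the extra process.
    """
    captured = io.StringIO()
    with contextlib.redirect_stdout(captured):
        exit_code = dotest_entry_point([test_dir])
    return test_dir, exit_code, captured.getvalue()


def run_all(test_dirs, num_workers):
    """Fan the directories out to a process pool, one direct call per directory."""
    with multiprocessing.Pool(num_workers) as pool:
        return pool.map(process_dir_worker, test_dirs)
```

On Windows, multiprocessing starts workers by re-importing the main module, so run_all would need to be called from under an `if __name__ == "__main__":` guard.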

-- 
-Todd