[Lldb-commits] [PATCH] D12651: Add ctrl-c support to parallel dotest.py.
Todd Fiala via lldb-commits
lldb-commits at lists.llvm.org
Fri Sep 4 21:56:16 PDT 2015
On Fri, Sep 4, 2015 at 5:40 PM, Zachary Turner <zturner at google.com> wrote:
>
>
> On Fri, Sep 4, 2015 at 5:10 PM Todd Fiala <todd.fiala at gmail.com> wrote:
>
>> tfiala added a comment.
>>
>> In http://reviews.llvm.org/D12651#240480, @zturner wrote:
>>
>> > Tried out this patch, unfortunately I'm seeing the same thing. The very
>> > first call to worker.join() is never returning.
>> >
>> > It's unfortunate that it's so hard to debug this stuff; do you have any
>> > suggestions for how I can try to nail down what the child dotest instance
>> > is actually doing? I wonder if it's blocking somewhere in its script, or
>> > if this is some quirk of the multiprocessing library's dynamic invocation /
>> > whatever magic it does.
>> >
>> > How much of an effort would it be to make the switch to threads now? The
>> > main thing we'd have to do is get rid of all of the globals in dotest, and
>> > make a DoTest class or something.
>>
>>
>> It's a bit more work than I want to take on right now. I think we really
>> may want to keep the multiprocessing and just not exec out to dotest.py for
>> a third-ish time for each inferior.
>>
>
> Just to clarify, are you saying we may want to keep multiprocessing over
> threads even if you can solve the exec problem? Any particular reason?
>
Yes, you understood me correctly.
Prior to me getting into it, dosep.py was designed to isolate each test
into its own process (via the subprocess exec call) so that each test
directory or file got its own lldb process and there was process-level
isolation, less contention on the Python global interpreter lock, etc.
Then, when Steve Pucci and later I got to making it multithreaded, we
wrapped the exec call in an "import threading"-style thread pool. That
maintained the process isolation property by having each thread just do an
exec (i.e. multiple execs in parallel). Except this didn't work on
MacOSX: the exec calls grab the Python GIL on OS X (and nowhere else, as
far as I could find). But multithreading + exec is a valid option on
every platform other than OS X.
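As a rough illustration of the "thread pool where each thread just does an exec" design, here is a minimal Python 3 sketch. It is not dosep.py's actual code: `exec_test_dir` and `run_with_thread_pool` are hypothetical names, `concurrent.futures.ThreadPoolExecutor` stands in for a hand-rolled `threading` pool, and the child process just echoes its argument instead of running dotest.py. The threads only wait on their children; the isolation comes from the exec'd child processes.

```python
import subprocess
import sys
from concurrent.futures import ThreadPoolExecutor


def exec_test_dir(test_dir):
    """Run one test directory in its own child process.

    Hypothetical stand-in: a real runner would launch dotest.py with
    arguments for test_dir; here the child just echoes its argument.
    """
    cmd = [sys.executable, "-c", "import sys; print('ran', sys.argv[1])", test_dir]
    proc = subprocess.run(cmd, capture_output=True, text=True)
    return test_dir, proc.returncode, proc.stdout.strip()


def run_with_thread_pool(test_dirs, num_threads=4):
    # Each pool thread mostly blocks waiting on its child process; the child
    # provides the process-level isolation. map() preserves input order.
    with ThreadPoolExecutor(max_workers=num_threads) as pool:
        return list(pool.map(exec_test_dir, test_dirs))


if __name__ == "__main__":
    for row in run_with_thread_pool(["dirA", "dirB", "dirC"]):
        print(row)
```

The OS X GIL behavior described above is the author's observation and is not something this sketch reproduces.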
To get it working for everyone, I dropped the "import threading"
approach and switched to the "import multiprocessing" approach. This worked
everywhere, including on OS X (although with a few hiccups initially, as it
exposed occasional hangs at the time in what looked like socket handling
under Darwin). What I failed to see in my haste was that I then had two
levels of fork/exec-like behavior (i.e. two process firewalls where we
only needed one, at the cost of an extra exec): the multiprocessing module
works by effectively forking/creating a new process that is already
isolated, but then we turn around and still create a subprocess in it to
exec out to dotest.py.
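To make the "two process firewalls" concrete, here is a minimal Python 3 sketch (not dosep.py's actual code; `worker_that_execs` and `run_all` are hypothetical names) in which each `multiprocessing.Pool` worker, already a separate process, still spawns a further child via `subprocess`. Printing the three PIDs shows that the parent, the pool worker and the exec'd child are all distinct processes, i.e. two process creations per test where one would give the same isolation.

```python
import multiprocessing
import os
import subprocess
import sys


def worker_that_execs(test_dir):
    # Level 1: when called via the pool, this already runs in a separate
    # worker process.
    worker_pid = os.getpid()
    # Level 2: ...and it still spawns another process to do the real work
    # (a stand-in for exec'ing dotest.py), which reports its own PID.
    out = subprocess.run(
        [sys.executable, "-c", "import os; print(os.getpid())"],
        capture_output=True, text=True, check=True,
    ).stdout
    return test_dir, worker_pid, int(out)


def run_all(test_dirs):
    with multiprocessing.Pool(processes=2) as pool:
        return pool.map(worker_that_execs, test_dirs)


if __name__ == "__main__":
    parent = os.getpid()
    for d, wpid, cpid in run_all(["dirA", "dirB"]):
        print(d, "parent:", parent, "worker:", wpid, "exec'd child:", cpid)
```

The `if __name__ == "__main__":` guard matters here: on platforms where multiprocessing spawns rather than forks (Windows, and OS X on recent Pythons), the main module is re-imported in each worker.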
What I'm suggesting for the near future is that we stick with the
multiprocessing approach but eliminate the subprocess exec: the
multiprocessing worker would instead call directly into a methodized entry
point in dotest.py. The worker is already a separate, isolated process, so
it already fulfills the isolation requirement without the extra subprocess.
That removes the doubled process creation, and it works on OS X in addition
to everywhere else. It does become more difficult to debug, but then again
the majority of the logic is in dotest.py and can be debugged with
--no-multiprocess (or with logging).
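A minimal Python 3 sketch of that proposal, assuming dotest.py's logic has been wrapped in a callable entry point. `run_dotest`, `run_parallel` and `run_serial` are hypothetical names, and `run_dotest` only returns a placeholder summary rather than running any tests. The pool worker calls the function directly, so there is one process per worker, and the same function can be called in-process for the --no-multiprocess debugging path.

```python
import multiprocessing
import os


def run_dotest(test_dir):
    """Hypothetical methodized entry point standing in for dotest.py's main logic.

    In a real refactor this would run the tests in test_dir and return a
    summary, instead of dotest.py being launched via subprocess.
    """
    return {"dir": test_dir, "pid": os.getpid(), "passed": True}


def run_parallel(test_dirs, num_workers=2):
    # Each pool worker is already an isolated process, so it calls the entry
    # point directly: one process per worker instead of worker + exec'd child.
    with multiprocessing.Pool(processes=num_workers) as pool:
        return pool.map(run_dotest, test_dirs)


def run_serial(test_dirs):
    # The --no-multiprocess style path: same entry point, called in this
    # process, which is easy to attach a debugger to.
    return [run_dotest(d) for d in test_dirs]


if __name__ == "__main__":
    print(run_parallel(["dirA", "dirB", "dirC"]))
```

Because both paths share `run_dotest`, the single-process route stays a faithful way to debug what the parallel workers do.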
This is all separate somewhat from the Ctrl-C issue, but it is the
backstory on what I'm referring to with the parallel test runner.
Completely as an aside, I did ask Greg Clayton to see if he can poke into
why OS X hits the Python GIL when execs are issued from multiple threads in
the "import threading" style. But assuming nothing magic changes there and
it isn't easily solved (I tried and failed to diagnose it after several
attempts last year), I'd prefer to keep a single strategy that works the
same on every platform unless there's a decent win on the execution front.
That all said, I'm starting to think a pluggable strategy for the actual
mechanics of the parallel test run might end up being best anyway, since I'd
really like to get Ctrl-C working and I'm not able to diagnose what's
happening in the Windows scenario.
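One way a pluggable strategy could be shaped is a small registry that maps a strategy name to a runner function with a common signature. This is a minimal Python 3 sketch under that assumption; the names (`RUNNER_STRATEGIES`, `run_tests`, and the strategy keys) are hypothetical and are not existing dotest options.

```python
import multiprocessing
from concurrent.futures import ThreadPoolExecutor


def run_serial(fn, items):
    return [fn(i) for i in items]


def run_threaded(fn, items, workers=4):
    with ThreadPoolExecutor(max_workers=workers) as ex:
        return list(ex.map(fn, items))


def run_multiprocess(fn, items, workers=4):
    # fn must be picklable (a module-level function) to cross process boundaries.
    with multiprocessing.Pool(processes=workers) as pool:
        return pool.map(fn, items)


# Hypothetical registry: every runner takes (fn, items) and returns results
# in input order, so callers don't care which mechanic is used.
RUNNER_STRATEGIES = {
    "serial": run_serial,
    "threading": run_threaded,
    "multiprocessing": run_multiprocess,
}


def run_tests(strategy_name, fn, items):
    try:
        runner = RUNNER_STRATEGIES[strategy_name]
    except KeyError:
        raise ValueError(f"unknown test-runner strategy: {strategy_name!r}")
    return runner(fn, items)
```

Keeping the choice behind one entry point would let a platform default to a different mechanic without touching the rest of the runner; whether Ctrl-C handling belongs inside each strategy or in the caller is left open here.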
> Multi-threaded is much easier to debug, for starters, because you can
> just attach your debugger to a single process. It also solves a lot of
> race conditions and makes output processing easier (not to mention higher
> performance), because you don't even need a way to have the sub-processes
> communicate their results back to the parent because the results are just
> in memory. Stick them in a synchronized queue and the parent can just
> process it. So it would probably even speed up the test runner.
>
> I think if there's not a very good reason to keep multiprocessing around,
> we should aim for a threaded approach. My understanding is that lit
> already does this, so there's no fundamental reason it shouldn't work
> correctly on MacOSX, just have to solve the exec problem like you mentioned.
>
>
>
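The threaded design described in the quoted reply above, with worker threads handing results back through a synchronized queue, might look roughly like this minimal Python 3 sketch. It assumes tests can already run in-process (i.e. after the dotest globals have been refactored away); `run_one_test` and `run_threaded` are hypothetical names. `queue.Queue` is thread-safe, so workers can `put()` results concurrently and the parent reads them straight from memory after joining the threads, with no inter-process result plumbing.

```python
import queue
import threading


def run_one_test(name):
    # Hypothetical stand-in for running a single test in-process.
    return name, "PASS"


def run_threaded(test_names, num_threads=4):
    work = queue.Queue()
    results = queue.Queue()  # synchronized: safe to put() from many threads
    for name in test_names:
        work.put(name)

    def worker():
        while True:
            try:
                name = work.get_nowait()
            except queue.Empty:
                return  # no work left for this thread
            results.put(run_one_test(name))

    threads = [threading.Thread(target=worker) for _ in range(num_threads)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()

    # All producers have finished, so draining until empty() is reliable here.
    collected = []
    while not results.empty():
        collected.append(results.get())
    return collected
```

Since everything runs in one process, a debugger attached to it sees every worker thread, which is the debuggability argument made in the reply.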
--
-Todd