[lldb-dev] test rerun phase is in

Tue Dec 15 15:46:59 PST 2015

Hey Ying,

I'm going to check in something that stops the rerun logic when both (1) -A
aarch64 is specified and (2) --rerun-all-issues is not specified.

That'll give me some time to drill into what's getting stuck on the android
buildbot.
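
For reference, the condition amounts to roughly the following. This is only a sketch: should_run_rerun_phase and its parameters are names made up for illustration, not the actual dotest.py code.

```python
def should_run_rerun_phase(archs, rerun_all_issues):
    """Decide whether the rerun phase should run.

    Hypothetical helper: `archs` stands in for the values passed via -A,
    and `rerun_all_issues` for whether --rerun-all-issues was given.
    """
    # Skip the rerun only when aarch64 was requested AND the user did not
    # explicitly ask for reruns of all issues.
    if "aarch64" in archs and not rerun_all_issues:
        return False
    return True
```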

-Todd

On Tue, Dec 15, 2015 at 3:36 PM, Todd Fiala <todd.fiala at gmail.com> wrote:

> #4310 failed for some other reason.
>
> #4311 looks like it might be stuck in the test3 phase but it is showing
> less output than it had before (maybe because it hasn't timed out yet).
>
> I'm usually running with --rerun-all-issues, but I can force similar
> failures to what this bot is seeing when I crank up the load over there on
> an OS X box.  I'm doing that now and I'm omitting the --rerun-all-issues
> flag, which should be close to how the android testbot is running.
> Hopefully I can force it to fail here.
>
> If not, I'll temporarily disable the rerun unless --rerun-all-issues is
> specified, until we can figure out what's causing the stall.
>
> BTW - how many cores are present on that box?  That will help me figure
> out which runner is being used for the main phase.
>
> Thanks!
>
> -Todd
>
> On Tue, Dec 15, 2015 at 2:34 PM, Todd Fiala <todd.fiala at gmail.com> wrote:
>
>> Build >= #4310 is what I'll be watching.
>>
>>
>> On Tue, Dec 15, 2015 at 2:30 PM, Todd Fiala <todd.fiala at gmail.com> wrote:
>>
>>> Okay cool.  Will do.
>>>
>>> On Tue, Dec 15, 2015 at 2:22 PM, Ying Chen <chying at google.com> wrote:
>>>
>>>> Sure. Please go ahead and do that.
>>>> BTW, the pending builds should be merged into one build once the current
>>>> build is done.
>>>>
>>>> On Tue, Dec 15, 2015 at 2:12 PM, Todd Fiala <todd.fiala at gmail.com>
>>>> wrote:
>>>>
>>>>> Hey Ying,
>>>>>
>>>>> Do you mind if we clear the android builder queue to get a build with
>>>>> r255676 in it?  There are what looks like at least 3 or 4 builds between
>>>>> now and then, and with timeouts it may take several hours.
>>>>>
>>>>> -Todd
>>>>>
>>>>> On Tue, Dec 15, 2015 at 1:50 PM, Ying Chen <chying at google.com> wrote:
>>>>>
>>>>>> Yes, it happens every time for android builder.
>>>>>>
>>>>>> On Tue, Dec 15, 2015 at 1:45 PM, Todd Fiala <todd.fiala at gmail.com>
>>>>>> wrote:
>>>>>>
>>>>>>> Hmm, yeah it looks like it did the rerun and then after finishing
>>>>>>> the rerun, it's just hanging.
>>>>>>>
>>>>>>> Let's have a look right after r255676 goes through this builder.  I
>>>>>>> hit a hang in the curses output display due to the recursive acquisition
>>>>>>> of a lock that was not recursive-enabled.  While I would have expected
>>>>>>> to see that with the basic results output that this builder is using
>>>>>>> when I was testing earlier, it's possible that we're somehow hitting a path
>>>>>>> here that is attempting to recursively take a lock.
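
To illustrate the kind of hang described above (a sketch in Python, not the actual lldb test runner code; can_reacquire is a name made up for this example): a plain threading.Lock cannot be taken again by the thread that already holds it, while a threading.RLock can. The second acquire is non-blocking here so the example reports the problem instead of hanging.

```python
import threading

def can_reacquire(lock):
    """Return True if the holding thread can take `lock` a second time."""
    with lock:  # first acquisition
        # blocking=False makes the second attempt fail fast instead of
        # deadlocking, which is what a blocking acquire would do on a
        # non-recursive lock.
        got_it = lock.acquire(blocking=False)
        if got_it:
            lock.release()
        return got_it

print(can_reacquire(threading.Lock()))   # False: not recursive-enabled
print(can_reacquire(threading.RLock()))  # True: recursive acquisition is fine
```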
>>>>>>>
>>>>>>> Do you know if it is happening every single time a rerun occurs?
>>>>>>>  (Hopefully yes?)
>>>>>>>
>>>>>>> -Todd
>>>>>>>
>>>>>>> On Tue, Dec 15, 2015 at 1:38 PM, Todd Fiala <todd.fiala at gmail.com>
>>>>>>> wrote:
>>>>>>>
>>>>>>>> Yep, I'll have a look!
>>>>>>>>
>>>>>>>> On Tue, Dec 15, 2015 at 12:43 PM, Ying Chen <chying at google.com>
>>>>>>>> wrote:
>>>>>>>>
>>>>>>>>> Hi Todd,
>>>>>>>>>
>>>>>>>>> We noticed on the lldb android builders that the test_runner didn't
>>>>>>>>> exit after the rerun, which caused a buildbot timeout because the
>>>>>>>>> process hung for over 20 minutes.
>>>>>>>>> Could you please take a look to see whether that's related to your change?
>>>>>>>>>
>>>>>>>>> Please see the following builds.
>>>>>>>>>
>>>>>>>>> http://lab.llvm.org:8011/builders/lldb-x86_64-ubuntu-14.04-android/builds/4305/steps/test3/logs/stdio
>>>>>>>>>
>>>>>>>>> http://lab.llvm.org:8011/builders/lldb-x86_64-ubuntu-14.04-android/builds/4305/steps/test7/logs/stdio
>>>>>>>>>
>>>>>>>>> Thanks,
>>>>>>>>> Ying
>>>>>>>>>
>>>>>>>>> On Mon, Dec 14, 2015 at 4:52 PM, Todd Fiala via lldb-dev <
>>>>>>>>> lldb-dev at lists.llvm.org> wrote:
>>>>>>>>>
>>>>>>>>>> And, btw, this shows the rerun logic working (via the
>>>>>>>>>> --rerun-all-issues flag):
>>>>>>>>>>
>>>>>>>>>> time test/dotest.py --executable `pwd`/build/Debug/lldb --threads
>>>>>>>>>> 24 --rerun-all-issues
>>>>>>>>>> Testing: 416 test suites, 24 threads
>>>>>>>>>> 377 out of 416 test suites processed - TestSBTypeTypeClass.py
>>>>>>>>>>
>>>>>>>>>> Session logs for test failures/errors/unexpected successes will
>>>>>>>>>> go into directory '2015-12-14-16_44_28'
>>>>>>>>>> Command invoked: test/dotest.py --executable
>>>>>>>>>> /Users/tfiala/src/lldb-tot/lldb/build/Debug/lldb --threads 24
>>>>>>>>>> --rerun-all-issues -s 2015-12-14-16_44_28 --results-port 62322 --inferior
>>>>>>>>>> -p TestMultithreaded.py
>>>>>>>>>> /Users/tfiala/src/lldb-tot/lldb/packages/Python/lldbsuite/test
>>>>>>>>>> --event-add-entries worker_index=3:int
>>>>>>>>>>
>>>>>>>>>> Configuration: arch=x86_64 compiler=clang
>>>>>>>>>>
>>>>>>>>>> ----------------------------------------------------------------------
>>>>>>>>>> Collected 8 tests
>>>>>>>>>>
>>>>>>>>>> lldb_codesign: no identity found
>>>>>>>>>> lldb_codesign: no identity found
>>>>>>>>>> lldb_codesign: no identity found
>>>>>>>>>> lldb_codesign: no identity found
>>>>>>>>>> lldb_codesign: no identity found
>>>>>>>>>> lldb_codesign: no identity found
>>>>>>>>>> lldb_codesign: no identity found
>>>>>>>>>>
>>>>>>>>>> [TestMultithreaded.py FAILED]
>>>>>>>>>> Command invoked: /usr/bin/python test/dotest.py --executable
>>>>>>>>>> /Users/tfiala/src/lldb-tot/lldb/build/Debug/lldb --threads 24
>>>>>>>>>> --rerun-all-issues -s 2015-12-14-16_44_28 --results-port 62322 --inferior
>>>>>>>>>> -p TestMultithreaded.py
>>>>>>>>>> /Users/tfiala/src/lldb-tot/lldb/packages/Python/lldbsuite/test
>>>>>>>>>> --event-add-entries worker_index=3:int
>>>>>>>>>> 396 out of 416 test suites processed - TestMiBreak.py
>>>>>>>>>>
>>>>>>>>>> Session logs for test failures/errors/unexpected successes will
>>>>>>>>>> go into directory '2015-12-14-16_44_28'
>>>>>>>>>> Command invoked: test/dotest.py --executable
>>>>>>>>>> /Users/tfiala/src/lldb-tot/lldb/build/Debug/lldb --threads 24
>>>>>>>>>> --rerun-all-issues -s 2015-12-14-16_44_28 --results-port 62322 --inferior
>>>>>>>>>> -p TestDataFormatterObjC.py
>>>>>>>>>> /Users/tfiala/src/lldb-tot/lldb/packages/Python/lldbsuite/test
>>>>>>>>>> --event-add-entries worker_index=12:int
>>>>>>>>>>
>>>>>>>>>> Configuration: arch=x86_64 compiler=clang
>>>>>>>>>>
>>>>>>>>>> ----------------------------------------------------------------------
>>>>>>>>>> Collected 26 tests
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> [TestDataFormatterObjC.py FAILED]
>>>>>>>>>> Command invoked: /usr/bin/python test/dotest.py --executable
>>>>>>>>>> /Users/tfiala/src/lldb-tot/lldb/build/Debug/lldb --threads 24
>>>>>>>>>> --rerun-all-issues -s 2015-12-14-16_44_28 --results-port 62322 --inferior
>>>>>>>>>> -p TestDataFormatterObjC.py
>>>>>>>>>> /Users/tfiala/src/lldb-tot/lldb/packages/Python/lldbsuite/test
>>>>>>>>>> --event-add-entries worker_index=12:int
>>>>>>>>>> 416 out of 416 test suites processed - TestLldbGdbServer.py
>>>>>>>>>> 2 test files marked for rerun
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> Rerunning the following files:
>>>>>>>>>>
>>>>>>>>>> functionalities/data-formatter/data-formatter-objc/TestDataFormatterObjC.py
>>>>>>>>>>   api/multithreaded/TestMultithreaded.py
>>>>>>>>>> Testing: 2 test suites, 1 thread
>>>>>>>>>> 2 out of 2 test suites processed - TestMultithreaded.py
>>>>>>>>>> Test rerun complete
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> =============
>>>>>>>>>> Issue Details
>>>>>>>>>> =============
>>>>>>>>>> UNEXPECTED SUCCESS: test_symbol_name_dsym
>>>>>>>>>> (functionalities/completion/TestCompletion.py)
>>>>>>>>>> UNEXPECTED SUCCESS: test_symbol_name_dwarf
>>>>>>>>>> (functionalities/completion/TestCompletion.py)
>>>>>>>>>>
>>>>>>>>>> ===================
>>>>>>>>>> Test Result Summary
>>>>>>>>>> ===================
>>>>>>>>>> Test Methods:       1695
>>>>>>>>>> Reruns:               30
>>>>>>>>>> Success:            1367
>>>>>>>>>> Expected Failure:     90
>>>>>>>>>> Failure:               0
>>>>>>>>>> Error:                 0
>>>>>>>>>> Exceptional Exit:      0
>>>>>>>>>> Unexpected Success:    2
>>>>>>>>>> Skip:                236
>>>>>>>>>> Timeout:               0
>>>>>>>>>> Expected Timeout:      0
>>>>>>>>>>
>>>>>>>>>> On Mon, Dec 14, 2015 at 4:51 PM, Todd Fiala <todd.fiala at gmail.com> wrote:
>>>>>>>>>>
>>>>>>>>>>> And that fixed the rest as well.  Thanks, Siva!
>>>>>>>>>>>
>>>>>>>>>>> -Todd
>>>>>>>>>>>
>>>>>>>>>>> On Mon, Dec 14, 2015 at 4:44 PM, Todd Fiala <
>>>>>>>>>>> todd.fiala at gmail.com> wrote:
>>>>>>>>>>>
>>>>>>>>>>>> Heh you were skinning the same cat :-)
>>>>>>>>>>>>
>>>>>>>>>>>> That fixed the one I was just looking at, running the others
>>>>>>>>>>>> now.
>>>>>>>>>>>>
>>>>>>>>>>>> On Mon, Dec 14, 2015 at 4:42 PM, Todd Fiala <
>>>>>>>>>>>> todd.fiala at gmail.com> wrote:
>>>>>>>>>>>>
>>>>>>>>>>>>> Yep, will try now...  (I was just looking at the condition
>>>>>>>>>>>>> testing logic since it looks like something isn't quite right there).
>>>>>>>>>>>>>
>>>>>>>>>>>>> On Mon, Dec 14, 2015 at 4:39 PM, Siva Chandra <
>>>>>>>>>>>>> sivachandra at google.com> wrote:
>>>>>>>>>>>>>
>>>>>>>>>>>>>> Can you try again after taking my change at r255584?
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> On Mon, Dec 14, 2015 at 4:31 PM, Todd Fiala via lldb-dev
>>>>>>>>>>>>>> <lldb-dev at lists.llvm.org> wrote:
>>>>>>>>>>>>>> > I'm having some of these blow up.
>>>>>>>>>>>>>> >
>>>>>>>>>>>>>> > In the case of test/lang/c/typedef/Testtypedef.py, it looks like some of the
>>>>>>>>>>>>>> > @expected decorators were changed a bit, and perhaps they are not pound for
>>>>>>>>>>>>>> > pound the same.  For example, this test used to really be marked XFAIL (via
>>>>>>>>>>>>>> > an expectedFailureClang directive), but it looks like the current marking of
>>>>>>>>>>>>>> > compiler="clang" is either not right or not working, since the test is run
>>>>>>>>>>>>>> > on OS X and is treated like it is expected to pass.
>>>>>>>>>>>>>> >
>>>>>>>>>>>>>> > I'm drilling into that a bit more, that's just the first of several that
>>>>>>>>>>>>>> > fail with these changes on OS X.
>>>>>>>>>>>>>> >
>>>>>>>>>>>>>> > On Mon, Dec 14, 2015 at 3:03 PM, Zachary Turner <zturner at google.com> wrote:
>>>>>>>>>>>>>> >>
>>>>>>>>>>>>>> >> I've checked in r255567, which fixes a problem pointed out by Siva.  It
>>>>>>>>>>>>>> >> doesn't sound related to r255542, but looking at those logs I can't really
>>>>>>>>>>>>>> >> tell how my CL would be related.  If r255567 doesn't fix the bots, would
>>>>>>>>>>>>>> >> someone mind helping me briefly?  r255542 seems pretty straightforward, so I
>>>>>>>>>>>>>> >> don't see why it would have an effect here.
>>>>>>>>>>>>>> >>
>>>>>>>>>>>>>> >> On Mon, Dec 14, 2015 at 2:35 PM Todd Fiala <todd.fiala at gmail.com> wrote:
>>>>>>>>>>>>>> >>>
>>>>>>>>>>>>>> >>> Ah yes I see.  Thanks, Ying (and Siva!  Saw your comments too).
>>>>>>>>>>>>>> >>>
>>>>>>>>>>>>>> >>> On Mon, Dec 14, 2015 at 2:34 PM, Ying Chen <chying at google.com> wrote:
>>>>>>>>>>>>>> >>>>
>>>>>>>>>>>>>> >>>> Seems this is the first build that fails, and it has only one CL, 255542.
>>>>>>>>>>>>>> >>>>
>>>>>>>>>>>>>> >>>> http://lab.llvm.org:8011/builders/lldb-x86_64-ubuntu-14.04-cmake/builds/9446
>>>>>>>>>>>>>> >>>> I believe Zachary is looking at that problem.
>>>>>>>>>>>>>> >>>>
>>>>>>>>>>>>>> >>>> On Mon, Dec 14, 2015 at 2:18 PM, Todd Fiala <todd.fiala at gmail.com>
>>>>>>>>>>>>>> >>>> wrote:
>>>>>>>>>>>>>> >>>>>
>>>>>>>>>>>>>> >>>>> I am seeing several failures on the Ubuntu 14.04 testbot, but
>>>>>>>>>>>>>> >>>>> unfortunately there are a number of changes that went in at the same time on
>>>>>>>>>>>>>> >>>>> that build.  The failures I'm seeing do not appear at all related to the
>>>>>>>>>>>>>> >>>>> test running infrastructure.
>>>>>>>>>>>>>> >>>>>
>>>>>>>>>>>>>> >>>>> Anybody with a fast Linux system able to take a look to see what
>>>>>>>>>>>>>> >>>>> exactly is failing there?
>>>>>>>>>>>>>> >>>>>
>>>>>>>>>>>>>> >>>>> -Todd
>>>>>>>>>>>>>> >>>>>
>>>>>>>>>>>>>> >>>>> On Mon, Dec 14, 2015 at 1:39 PM, Todd Fiala <todd.fiala at gmail.com>
>>>>>>>>>>>>>> >>>>> wrote:
>>>>>>>>>>>>>> >>>>>>
>>>>>>>>>>>>>> >>>>>> Hi all,
>>>>>>>>>>>>>> >>>>>>
>>>>>>>>>>>>>> >>>>>> I just put in the single-worker, low-load, follow-up test run pass in
>>>>>>>>>>>>>> >>>>>> r255543.  Most of the work for it went in late last week; this just mostly
>>>>>>>>>>>>>> >>>>>> flips it on.
>>>>>>>>>>>>>> >>>>>>
>>>>>>>>>>>>>> >>>>>> The feature works like this:
>>>>>>>>>>>>>> >>>>>>
>>>>>>>>>>>>>> >>>>>> * First test phase works as before: run all tests using whatever level
>>>>>>>>>>>>>> >>>>>> of concurrency is normally used (e.g. 8 workers on an 8-logical-core box).
>>>>>>>>>>>>>> >>>>>>
>>>>>>>>>>>>>> >>>>>> * Any timeout, failure, error, or anything else that would have
>>>>>>>>>>>>>> >>>>>> caused a test failure is eligible for rerun if either (1) it was marked as a
>>>>>>>>>>>>>> >>>>>> flakey test via the flakey decorator, or (2) the --rerun-all-issues
>>>>>>>>>>>>>> >>>>>> command line flag is provided.
>>>>>>>>>>>>>> >>>>>>
>>>>>>>>>>>>>> >>>>>> * After the first test phase, if there are any tests that met rerun
>>>>>>>>>>>>>> >>>>>> eligibility that would have caused a test failure, those get run using a
>>>>>>>>>>>>>> >>>>>> serial test phase.  Their results will overwrite (i.e. replace) the previous
>>>>>>>>>>>>>> >>>>>> result for the given test method.
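
The three steps above can be sketched roughly as follows. This is an illustration only, not the code that went in with r255543: the function names, the result dictionaries, and the status strings are all assumptions made for the example, and it tracks individual test methods, which simplifies how the runner selects what to rerun.

```python
# Statuses that would cause a test failure (assumed names for this sketch).
FAILING = {"failure", "error", "timeout", "exceptional_exit"}

def eligible_for_rerun(result, rerun_all_issues):
    """A failing result is rerun if it is flakey or reruns were forced."""
    if result["status"] not in FAILING:
        return False
    return result["flakey"] or rerun_all_issues

def run_with_rerun(first_phase, rerun_one, rerun_all_issues=False):
    """Apply the serial rerun phase to the results of the first phase.

    first_phase maps test name -> {"status": str, "flakey": bool}.
    rerun_one is a callable that reruns one test and returns its new status.
    """
    final = {name: r["status"] for name, r in first_phase.items()}
    to_rerun = sorted(name for name, r in first_phase.items()
                      if eligible_for_rerun(r, rerun_all_issues))
    for name in to_rerun:  # one at a time: the single-worker phase
        final[name] = rerun_one(name)  # rerun result replaces the old one
    return final, to_rerun

# Made-up first-phase results: only the flakey timeout is rerun by default.
first_phase = {
    "test_a": {"status": "success", "flakey": False},
    "test_b": {"status": "timeout", "flakey": True},
    "test_c": {"status": "failure", "flakey": False},
}
final, rerun = run_with_rerun(first_phase, lambda name: "success")
print(rerun)  # ['test_b']
print(final)
```

Passing rerun_all_issues=True in this sketch would also pick up test_c, mirroring what the --rerun-all-issues flag is described as doing.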
>>>>>>>>>>>>>> >>>>>>
>>>>>>>>>>>>>> >>>>>> The net result should be that tests that are load sensitive and
>>>>>>>>>>>>>> >>>>>> intermittently fail during the first higher-concurrency test phase should
>>>>>>>>>>>>>> >>>>>> (in theory) pass in the second, single-worker test phase when the test suite
>>>>>>>>>>>>>> >>>>>> is only using a single worker.  This should make the test suite generate
>>>>>>>>>>>>>> >>>>>> fewer false positives on test failure notification, which should make
>>>>>>>>>>>>>> >>>>>> continuous integration servers (testbots) much more useful in terms of
>>>>>>>>>>>>>> >>>>>> generating actionable signals caused by version control changes to the lldb
>>>>>>>>>>>>>> >>>>>> or related sources.
>>>>>>>>>>>>>> >>>>>>
>>>>>>>>>>>>>> >>>>>> Please let me know if you see any issues with this when running the
>>>>>>>>>>>>>> >>>>>> test suite using the default output.  I'd like to fix this up ASAP.  And for
>>>>>>>>>>>>>> >>>>>> those interested in the implementation, I'm happy to do post-commit
>>>>>>>>>>>>>> >>>>>> review/changes as needed to get it in good shape.
>>>>>>>>>>>>>> >>>>>>
>>>>>>>>>>>>>> >>>>>> I'll be watching the builders now and will address any issues as I
>>>>>>>>>>>>>> >>>>>> see them.
>>>>>>>>>>>>>> >>>>>>
>>>>>>>>>>>>>> >>>>>> Thanks!
>>>>>>>>>>>>>> >>>>>> --
>>>>>>>>>>>>>> >>>>>> -Todd
>>>>>>>>>>>>>> >>>>>
>>>>>>>>>>>>>> >>>>>
>>>>>>>>>>>>>> >>>>>
>>>>>>>>>>>>>> >>>>>
>>>>>>>>>>>>>> >>>>> --
>>>>>>>>>>>>>> >>>>> -Todd
>>>>>>>>>>>>>> >>>>
>>>>>>>>>>>>>> >>>>
>>>>>>>>>>>>>> >>>
>>>>>>>>>>>>>> >>>
>>>>>>>>>>>>>> >>>
>>>>>>>>>>>>>> >>> --
>>>>>>>>>>>>>> >>> -Todd
>>>>>>>>>>>>>> >
>>>>>>>>>>>>>> >
>>>>>>>>>>>>>> >
>>>>>>>>>>>>>> >
>>>>>>>>>>>>>> > --
>>>>>>>>>>>>>> > -Todd
>>>>>>>>>>>>>> >
>>>>>>>>>>>>>> > _______________________________________________
>>>>>>>>>>>>>> > lldb-dev mailing list
>>>>>>>>>>>>>> > lldb-dev at lists.llvm.org
>>>>>>>>>>>>>> > http://lists.llvm.org/cgi-bin/mailman/listinfo/lldb-dev
>>>>>>>>>>>>>> >
>>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>> --
>>>>>>>>>>>>> -Todd
>>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>> --
>>>>>>>>>>>> -Todd
>>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> --
>>>>>>>>>>> -Todd
>>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> --
>>>>>>>>>> -Todd
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>> --
>>>>>>>> -Todd
>>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> --
>>>>>>> -Todd
>>>>>>>
>>>>>>
>>>>>>
>>>>>
>>>>>
>>>>> --
>>>>> -Todd
>>>>>
>>>>
>>>>
>>>
>>>
>>> --
>>> -Todd
>>>
>>
>>
>>
>> --
>> -Todd
>>
>
>
>
> --
> -Todd
>

-- 
-Todd