[lldb-dev] BasicResultsFormatter - new test results summary

Todd Fiala via lldb-dev lldb-dev at lists.llvm.org
Mon Dec 14 08:19:24 PST 2015


On Mon, Dec 14, 2015 at 6:58 AM, Pavel Labath <labath at google.com> wrote:

> Hi,
>
> thanks a lot for fixing the timeout issue on such a short notice.


Sure thing.  I didn't mind at all once I understood the context in which
they were being used.


> I
> didn't think I'd find myself defending them, as I remember being quite
> upset when they went in, but they have proven useful in stabilising
> the build bots,


Yes, I understand.  I am definitely not a fan of tests that sometimes give
a useful signal (when they don't lock up) and sometimes lock up (and
therefore give no signal at all).  But since the lock-up is catchable via
the timeout, and since we can detect it definitively, this isn't quite as
bad as it seems at first blush.


> and I think it's likely you may need them as well.
>

I am hopeful the test rerun mechanism generally handles it, but that'll
only be the case if the timeout is triggered by heavy load.  Otherwise, the
rerun phase will be no different.  Time will tell :-)


> I'll try to now add a nicer way to expect timeouts so that we don't
> have the hack in the new runner as well. I'll add a new message, like
> you did for the flakey decorator.


Feel free to mimic what I did with the flakey decorator, sending a new test
event that can additively apply the "expected timeout" marking to the test
method run.  In fact, if you get the decorator set up and put in a

# TODO @tfiala add test event code here

I can get the rest of it in.
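
For illustration, here's roughly the shape I have in mind (a sketch only;
the decorator and attribute names below are hypothetical, not the actual
lldbsuite API):

    # Hypothetical sketch: the decorator records the marking on the test
    # instance; the TODO line is where the real test event would be sent,
    # so the parent-side ResultsFormatter can downgrade a later timeout
    # for this method to an "expected timeout" result.
    import functools
    import platform

    def expectedTimeout(oslist=None):
        """Mark a test method so a timeout counts as expected."""
        def decorator(func):
            @functools.wraps(func)
            def wrapper(self, *args, **kwargs):
                # Only applies on the listed platforms (None = everywhere).
                if oslist is None or platform.system().lower() in oslist:
                    # TODO @tfiala add test event code here
                    self.expected_timeout = True
                return func(self, *args, **kwargs)
            return wrapper
        return decorator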


>
> I'm a bit uneasy about adding another kind of a decorator though. What
> would you (and anyone else reading this) think about adding this
> behavior to the existing XFAIL decorators?
> This way, "timeout" would become just another way in which a test can
> "fail", and any test marked with an XFAIL decorator would be eligible
> for this treatment.
>

I think I'd be okay with that.  I like reducing the number of ways we
support failure decoration, so this seems reasonable.


>
> We would lose the ability to individually expect "failures" and
> "timeouts", but I don't think that is really necessary, and I think it
> will be worth the extra maintainability we get from the fact of having
> fewer test decorators.
>
>
OTOH, the piece we then lose is the ability to have an XFAIL mean "Hey, this
test really should fail; we haven't implemented feature XYZ (correctly or
otherwise), so this had better fail."  In that semantic meaning, an unexpected
success would truly be an actionable signal: either the test is now passing
because the feature now works (actionable option A: the XFAIL should come off
after verifying), or the test is passing because it is not testing what it
thought it was, and the test needs to be modified to more tightly bound the
expected fail condition (actionable option B).

So it eliminates the definiteness of an XFAIL ideally meaning "this really
should fail," turning it into "it is permissible for this to fail."

All that said, our Python test suite is so far away from that ideal right
now.  The highest-level output of our test suite that I care about is "if
the tests run green, this is a good build; if they run red, this is a bad
build."  I don't see the timeout being rolled into XFAIL as hurting that.
It seems reasonable to roll them together at this time.  And the test
output will list and count the timeouts.

So I'd be okay with that at this time, for the sake of simplifying test
markup.
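
To make that concrete, something like the following is what I picture on
the summary side (again just a sketch; the event keys and class name are
made up, not the real ResultsFormatter code):

    # Hypothetical sketch: if a test's start event carried an XFAIL-style
    # marking, a later timeout for that test is counted as an expected
    # failure instead of an error.
    class SummarySketch(object):
        def __init__(self):
            self.expected_failure_tests = set()
            self.result_counts = {"timeout": 0, "expected_failure": 0}

        def handle_event(self, event):
            if event["event"] == "test_start" and event.get("expected_failure"):
                self.expected_failure_tests.add(event["test_id"])
            elif event["event"] == "timeout":
                if event["test_id"] in self.expected_failure_tests:
                    # XFAIL absorbs the timeout: it shows up in the counts
                    # but does not turn the build red.
                    self.result_counts["expected_failure"] += 1
                else:
                    self.result_counts["timeout"] += 1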

What do you think?
>
> pl
>
>
> On 11 December 2015 at 17:54, Todd Fiala <todd.fiala at gmail.com> wrote:
> > Merging threads.
> >
> >> The concept is not there to protect against timeouts, which are caused
> > by processes being too slow, for these we have been increasing
> > timeouts where necessary.
> >
> > Okay, I see.  If that's the intent, then expected timeout sounds
> > reasonable.
> > (My abhorrence was against the idea of using that as a replacement for
> > increasing a timeout that was too short under load).
> >
> > I would go with your original approach (the marking as expected timeout).
> > We can either have that generate a new event (much like a change I'm about
> > to put in that has flakey tests send an event indicating that they are
> > eligible for rerun) or annotate the start message.  FWIW, the startTest()
> > call on the LLDBTestResults gets called before decorators have a chance to
> > execute, which is why I'm going with the 'send an enabling event' approach.
> > (I'll be checking that in shortly here, like when I'm done writing this
> > email, so you'll see what I did there).
> >
> > On Fri, Dec 11, 2015 at 9:41 AM, Todd Fiala <todd.fiala at gmail.com> wrote:
> >>
> >>
> >>
> >> On Fri, Dec 11, 2015 at 3:26 AM, Pavel Labath <labath at google.com> wrote:
> >>>
> >>> Todd, I've had to disable the new result formatter as it was not
> >>> working with the expected timeout logic we have for the old one. The
> >>> old XTIMEOUT code is a massive hack and I will be extremely glad when
> >>> we get rid of it, but we can't keep our buildbot red until then, so
> >>> I've switched it off.
> >>>
> >>
> >> Ah, sorry my comments on the check-in precede me reading this.  Glad you
> >> see this as a hack :-)
> >>
> >> No worries on shutting it off.  I can get the expected timeout as
> >> currently written working with the updated summary results.
> >>
> >>>
> >>> I am ready to start working on this, but I wanted to run this idea
> >>> here first. I thought we could have a test annotation like:
> >>> @expectedTimeout(oslist=["linux"], ...)
> >>>
> >>> Then, when the child runner would encounter this annotation, it would
> >>> set a flag in the "test is starting" message indicating that this test
> >>> may time out. Then if the test really times out, the parent would know
> >>> about this, and it could avoid flagging the test as error.
> >>>
> >>
> >> Yes, the idea seems reasonable.  The actual implementation will end up
> >> being slightly different, as the ResultsFormatter will receive the test
> >> start event (which is where the "timeout is expected" flag comes from),
> >> whereas the reporter of the timeout (the test worker) will not know
> >> anything about that data.  It will still generate the timeout, but then
> >> the ResultsFormatter can deal with transforming this into the right
> >> event when a timeout is "okay".
> >>
> >>>
> >>> Alternatively, if we want to avoid the proliferation of test result
> >>> states, we could key this off the standard @expectedFailure
> >>> annotation; then a "time out" would become just another way in which a
> >>> test can fail, and XTIMEOUT would become XFAIL.
> >>>
> >>> What do you think ?
> >>>
> >>
> >> Even though the above would work, if the issue here ultimately is that a
> >> larger timeout is needed, we can avoid all this by increasing the timeout.
> >> Probably more effective, though, is going to be running it in the
> >> follow-up, low-load, single-worker pass, where presumably we would not
> >> hit the timeout.
> >> If you think that would work, I'd say:
> >>
> >> (1) short term (like in the next hour or so), I get the expected timeout
> >> working in the summary results.
> >>
> >> (2) longer term (like by end of weekend or maybe Monday at worst), we
> >> have the second pass test run at lower load (i.e. single worker thread),
> >> which should prevent these things from timing out in the first place.
> >>
> >> If the analysis of the cause of the timeout is incorrect, then really
> >> we'll want to do your initial proposal in the earlier paragraphs, though.
> >>
> >> What do you think about any of that?
> >>
> >>
> >>
> >>>
> >>> pl
> >>>
> >>> PS: I am pretty new to this part of the code, so any pointers you have
> >>> towards implementing this would be extremely helpful.
> >>>
> >>>
> >>>
> >>> On 10 December 2015 at 23:20, Todd Fiala <todd.fiala at gmail.com> wrote:
> >>> > Checked this in as r255310.  Let me know if you find any issues with
> >>> > that,
> >>> > Tamas.
> >>> >
> >>> > You will need '-v' to enable it.  Otherwise, it will just print the
> >>> > method
> >>> > name.
> >>> >
> >>> > -Todd
> >>> >
> >>> > On Thu, Dec 10, 2015 at 2:39 PM, Todd Fiala <todd.fiala at gmail.com>
> >>> > wrote:
> >>> >>
> >>> >> Sure, I can do that.
> >>> >>
> >>> >> Tamas, okay to give more detail on -v?  I'll give it a shot to see
> >>> >> what
> >>> >> else comes out if we do that.
> >>> >>
> >>> >> -Todd
> >>> >>
> >>> >> On Thu, Dec 10, 2015 at 12:58 PM, Zachary Turner <zturner at google.com>
> >>> >> wrote:
> >>> >>>
> >>> >>>
> >>> >>>
> >>> >>> On Thu, Dec 10, 2015 at 12:54 PM Todd Fiala <todd.fiala at gmail.com>
> >>> >>> wrote:
> >>> >>>>
> >>> >>>> Hi Tamas,
> >>> >>>>
> >>> >>>>
> >>> >>>>
> >>> >>>> On Thu, Dec 10, 2015 at 2:52 AM, Tamas Berghammer
> >>> >>>> <tberghammer at google.com> wrote:
> >>> >>>>>
> >>> >>>>> HI Todd,
> >>> >>>>>
> >>> >>>>> You changed the way the test failure list is printed, so that now
> >>> >>>>> we only print the name of the failing test function with the name
> >>> >>>>> of the test file in parentheses. Can we add back the name of the
> >>> >>>>> test class to this list?
> >>> >>>>
> >>> >>>>
> >>> >>>> Sure.  I originally planned to have that in there but there was
> >>> >>>> some discussion about it being too much info.  I'm happy to add
> >>> >>>> that back.
> >>> >>>
> >>> >>> Can we have it tied to verbosity level?  We have -t and -v, maybe
> >>> >>> one of those could trigger more detail in the summary view.
> >>> >>
> >>> >>
> >>> >>
> >>> >>
> >>> >> --
> >>> >> -Todd
> >>> >
> >>> >
> >>> >
> >>> >
> >>> > --
> >>> > -Todd
> >>
> >>
> >>
> >>
> >> --
> >> -Todd
> >
> >
> >
> >
> > --
> > -Todd
>



-- 
-Todd