[lldb-dev] TestRaise.py test_restart_bug flakey stats

Todd Fiala via lldb-dev lldb-dev at lists.llvm.org
Mon Oct 19 08:51:33 PDT 2015


Okay.  I think for the time being, the XFAIL makes sense.  Per my previous
email, though, I think we should move away from unexpected success (XPASS)
being a "sometimes meaningful, sometimes meaningless" signal.  For almost
all cases, an unexpected success is an actionable signal.  I don't want it
to become the warning that everybody lives without fixing, and then it
hides a real issue when one surfaces.

Thanks for explaining what I was seeing!

-Todd

On Mon, Oct 19, 2015 at 6:49 AM, Pavel Labath <labath at google.com> wrote:

> I have created this test to reproduce a race condition in
> ProcessGDBRemote. Given that it tests a race condition, it cannot be
> failing 100% of the time, but I agree with Tamas that we should keep
> it as XFAIL to avoid noise in the buildbots.
>
> pl
>
> On 19 October 2015 at 12:30, Tamas Berghammer via lldb-dev
> <lldb-dev at lists.llvm.org> wrote:
> > The expected flakey works a bit differently than you described:
> > * Run the test
> > * If it passes, it counts as a successful test and we are done
> > * Run the test again
> > * If it passes the 2nd time, then record it as an expected failure (IMO
> > expected flakey would be a better result, but we don't have that
> > category)
> > * If it fails 2 times in a row, then record it as a failure, because a
> > flakey test should pass at least once in every 2 runs (it means we need
> > a ~95% success rate to keep the build bot green most of the time). If it
> > isn't passing often enough for that, then it should be marked as an
> > expected failure. It is done this way to detect the case when a flakey
> > test gets broken completely by a new change.
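
The steps above can be sketched roughly as follows. This is only an illustration of the decision logic as described in this mail, not the actual lldb test-suite code; the helper names run_expected_flaky and scripted are hypothetical.

```python
from typing import Callable, Iterable


def run_expected_flaky(test: Callable[[], bool], max_runs: int = 2) -> str:
    """Return the result recorded for a test marked as expected flakey.

    `test` returns True when a single run passes. Hypothetical helper that
    only mirrors the steps listed in the mail.
    """
    for attempt in range(1, max_runs + 1):
        if test():
            # A pass on the first run counts as a plain success; a pass on
            # the retry is recorded as an expected failure.
            return "success" if attempt == 1 else "expected failure"
    # Failing every run is reported as a real failure, so a flakey test
    # that gets completely broken is still detected.
    return "failure"


def scripted(outcomes: Iterable[bool]) -> Callable[[], bool]:
    """Build a fake test that replays a fixed sequence of pass/fail outcomes."""
    it = iter(outcomes)
    return lambda: next(it)


if __name__ == "__main__":
    for outcomes in ([True], [False, True], [False, False]):
        print(outcomes, "->", run_expected_flaky(scripted(outcomes)))
```

Note that a flakey marking under this scheme can still produce a failing test, which is the difference from the scheme described further down in the thread.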
> >
> > I checked some stats for TestRaise on the build bot, and under the current
> > definition of expected flakey we shouldn't mark it as flakey, because it
> > will often fail 2 times in a row (its passing rate is ~50%), which will be
> > reported as a failure, making the build bot red.
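
A quick back-of-the-envelope check of that point, assuming the two runs are independent (an assumption made here for illustration; the mail does not say so, and runs of a race-condition test may well be correlated). The function name double_failure_probability is hypothetical.

```python
def double_failure_probability(pass_rate: float, runs: int = 2) -> float:
    """Probability that a test fails `runs` times in a row.

    Assumes every run passes independently with probability `pass_rate`.
    """
    if not 0.0 <= pass_rate <= 1.0:
        raise ValueError("pass_rate must be between 0 and 1")
    return (1.0 - pass_rate) ** runs


if __name__ == "__main__":
    # ~50% is the passing rate quoted for TestRaise in the mail.
    rate = 0.5
    print(f"pass rate {rate:.0%}: P(two failures in a row) = "
          f"{double_failure_probability(rate):.2f}")
```

Under that assumption, a test passing about half of the time would be reported as a failure in roughly one build out of four.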
> >
> > I will send you the full stats from the last 100 builds in a separate
> > off-list mail, as it is too big for the mailing list. If somebody else is
> > interested in it, then let me know.
> >
> > Tamas
> >
> > On Sun, Oct 18, 2015 at 2:18 AM Todd Fiala <todd.fiala at gmail.com> wrote:
> >>
> >> Nope, no good either when I limit the flakey to DWO.
> >>
> >> So perhaps I don't understand how the flakey marking works.  I thought it
> >> meant:
> >> * run the test.
> >> * If it passes, it goes as a successful test.  Then we're done.
> >> * run the test again.
> >> * If it passes, then we're done and mark it a successful test.  If it
> >> fails, then mark it an expected failure.
> >>
> >> But that's definitely not the behavior I'm seeing, as a flakey marking in
> >> the above scheme should never produce a failing test.
> >>
> >> I'll have to revisit the flakey test marking to see what it's really doing
> >> since my understanding is clearly flawed!
> >>
> >> On Sat, Oct 17, 2015 at 5:57 PM, Todd Fiala <todd.fiala at gmail.com> wrote:
> >>>
> >>> Hmm, the flakey behavior may be specific to dwo.  Testing it locally as
> >>> unconditionally flaky on Linux is failing on dwarf.  All the ones I see
> >>> succeed are dwo.  I wouldn't expect a diff there but that seems to be the
> >>> case.
> >>>
> >>> So, the request still stands but I won't be surprised if we find that dwo
> >>> sometimes passes while dwarf doesn't (or at least not enough to get
> >>> through the flakey setting).
> >>>
> >>> On Sat, Oct 17, 2015 at 4:57 PM, Todd Fiala <todd.fiala at gmail.com> wrote:
> >>>>
> >>>> Hi Tamas,
> >>>>
> >>>> I think you grabbed me stats on failing tests in the past.  Can you dig
> >>>> up the failure rate for TestRaise.py's test_restart_bug() variants on
> >>>> Ubuntu 14.04 x86_64?  I'd like to mark it as flaky on Linux, since it is
> >>>> passing most of the time over here.  But I want to see if that's valid
> >>>> across all Ubuntu 14.04 x86_64.  (If it is passing some of the time, I'd
> >>>> prefer marking it flakey so that we don't see unexpected successes).
> >>>>
> >>>> Thanks!
> >>>>
> >>>> --
> >>>> -Todd
> >>>
> >>>
> >>>
> >>>
> >>> --
> >>> -Todd
> >>
> >>
> >>
> >>
> >> --
> >> -Todd
> >
> >
> > _______________________________________________
> > lldb-dev mailing list
> > lldb-dev at lists.llvm.org
> > http://lists.llvm.org/cgi-bin/mailman/listinfo/lldb-dev
> >
>



-- 
-Todd