[lldb-dev] TestRaise.py test_restart_bug flakey stats

Pavel Labath via lldb-dev lldb-dev at lists.llvm.org
Mon Oct 19 06:49:57 PDT 2015

I have created this test to reproduce a race condition in
ProcessGDBRemote. Given that it tests a race condition, it cannot be
failing 100% of the time, but I agree with Tamas that we should keep
it as XFAIL to avoid noise in the buildbots.


On 19 October 2015 at 12:30, Tamas Berghammer via lldb-dev
<lldb-dev at lists.llvm.org> wrote:
> The expected flakey marking works a bit differently than you described:
> * Run the tests
> * If it passes, it goes as a successful test and we are done
> * Run the test again
> * If it passes the 2nd time, then record it as an expected failure (IMO
> expected flakey would be a better result, but we don't have that category)
> * If it fails 2 times in a row, then record it as a failure, because a flakey
> test should pass at least once in every 2 runs (this means we need a ~95% success
> rate to keep the build bot green most of the time). If it isn't passing
> often enough for that, then it should be marked as expected failure. This is
> done this way to detect the case when a flakey test gets broken completely by
> a new change.
> I checked some stats for TestRaise on the build bot, and under the current
> definition of expected flakey we shouldn't mark it as flakey, because it will
> often fail 2 times in a row (its passing rate is ~50%), which will be reported
> as a failure, making the build bot red.
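To make the arithmetic behind this concrete: if the two runs are independent and the test passes with probability p, a flakey-marked test goes red with probability (1 - p)^2. A quick sketch (illustrative only; the independence assumption is mine):

```python
# Probability that a flakey-marked test turns the build red, i.e. fails
# both the initial run and the retry, assuming independent runs with
# per-run pass rate p.

def red_probability(p):
    return (1 - p) ** 2

# At TestRaise's ~50% pass rate, roughly a quarter of builds would go red:
#   red_probability(0.5) == 0.25
# At a ~95% pass rate, a double failure is rare (~0.25% of builds):
#   red_probability(0.95) == 0.0025 (approximately)
```

This is why a ~50% pass rate is far too low for the flakey category, while ~95% keeps the bot green most of the time.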
> I will send you the full stats from the last 100 builds in a separate off
> list mail, as it is too big for the mailing list. If somebody else is
> interested in it then let me know.
> Tamas
> On Sun, Oct 18, 2015 at 2:18 AM Todd Fiala <todd.fiala at gmail.com> wrote:
>> Nope, no good either when I limit the flakey to DWO.
>> So perhaps I don't understand how the flakey marking works.  I thought it
>> meant:
>> * run the test.
>> * If it passes, it goes as a successful test.  Then we're done.
>> * run the test again.
>> * If it passes, then we're done and mark it a successful test.  If it
>> fails, then mark it an expected failure.
>> But that's definitely not the behavior I'm seeing, as a flakey marking in
>> the above scheme should never produce a failing test.
>> I'll have to revisit the flakey test marking to see what it's really doing
>> since my understanding is clearly flawed!
>> On Sat, Oct 17, 2015 at 5:57 PM, Todd Fiala <todd.fiala at gmail.com> wrote:
>>> Hmm, the flakey behavior may be specific to dwo.  Testing it locally as
>>> unconditionally flaky on Linux is failing on dwarf.  All the ones I see
>>> succeed are dwo.  I wouldn't expect a diff there but that seems to be the
>>> case.
>>> So, the request still stands but I won't be surprised if we find that dwo
>>> sometimes passes while dwarf doesn't (or at least not enough to get through
>>> the flakey setting).
>>> On Sat, Oct 17, 2015 at 4:57 PM, Todd Fiala <todd.fiala at gmail.com> wrote:
>>>> Hi Tamas,
>>>> I think you grabbed stats on failing tests for me in the past.  Can you dig
>>>> up the failure rate for TestRaise.py's test_restart_bug() variants on Ubuntu
>>>> 14.04 x86_64?  I'd like to mark it as flaky on Linux, since it is passing
>>>> most of the time over here.  But I want to see if that's valid across all
>>>> Ubuntu 14.04 x86_64.  (If it is passing some of the time, I'd prefer marking
>>>> it flakey so that we don't see unexpected successes).
>>>> Thanks!
>>>> --
>>>> -Todd
>>> --
>>> -Todd
>> --
>> -Todd
> _______________________________________________
> lldb-dev mailing list
> lldb-dev at lists.llvm.org
> http://lists.llvm.org/cgi-bin/mailman/listinfo/lldb-dev
