[lldb-dev] TestRaise.py test_restart_bug flakey stats
Pavel Labath via lldb-dev
lldb-dev at lists.llvm.org
Mon Oct 19 06:49:57 PDT 2015
I have created this test to reproduce a race condition in
ProcessGDBRemote. Given that it tests a race condition, it will not fail
100% of the time, but I agree with Tamas that we should keep it marked
XFAIL to avoid noise on the buildbots.
pl
On 19 October 2015 at 12:30, Tamas Berghammer via lldb-dev
<lldb-dev at lists.llvm.org> wrote:
> The expected flakey marking works a bit differently than you described:
> * Run the test.
> * If it passes, record it as a successful test and we are done.
> * Run the test again.
> * If it passes the 2nd time, record it as an expected failure (IMO
> "expected flakey" would be a better result, but we don't have that category).
> * If it fails 2 times in a row, record it as a failure, because a flakey
> test should pass at least once in every 2 runs (meaning we need a ~95%
> success rate to keep the build bot green most of the time). If it isn't
> passing often enough for that, it should be marked as expected failure. This
> is done to detect the case when a flakey test gets broken completely by a
> new change.
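>
> A rough Python sketch of that retry logic (the names are made up for
> illustration; the real logic lives in the test runner):
>
>     def run_flakey_test(run_test):
>         """Retry an expected-flakey test once before reporting failure."""
>         if run_test():
>             return "success"           # passed on the first run: done
>         if run_test():
>             return "expected failure"  # passed on the retry: recorded as XFAIL
>         return "failure"               # failed twice in a row: real failure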
>
> I checked some stats for TestRaise on the build bot, and under the current
> definition of expected flakey we shouldn't mark it as flakey, because it
> will often fail 2 times in a row (its passing rate is ~50%), which will be
> reported as a failure, making the build bot red.
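>
> To make the arithmetic explicit: if a test passes each run with probability
> p, the retry scheme above reports a failure with probability (1-p)^2. A
> quick check in Python:
>
>     for p in (0.50, 0.95):
>         print(p, (1 - p) ** 2)  # 0.50 -> 0.25, 0.95 -> 0.0025
>
> So at a ~50% pass rate this one test alone turns a quarter of the builds
> red, while at ~95% it is roughly one red build in 400.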
>
> I will send you the full stats from the last 100 builds in a separate
> off-list mail, as it is too big for the mailing list. If somebody else is
> interested in it, let me know.
>
> Tamas
>
> On Sun, Oct 18, 2015 at 2:18 AM Todd Fiala <todd.fiala at gmail.com> wrote:
>>
>> Nope, no good either when I limit the flakey to DWO.
>>
>> So perhaps I don't understand how the flakey marking works. I thought it
>> meant:
>> * Run the test.
>> * If it passes, it goes as a successful test; then we're done.
>> * Run the test again.
>> * If it passes, then we're done and mark it a successful test. If it
>> fails, then mark it an expected failure.
>>
>> But that's definitely not the behavior I'm seeing, as a flakey marking in
>> the above scheme should never produce a failing test.
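>>
>> In sketch form (made-up names; note that no branch ever returns
>> "failure", which is why I expected no red builds from it):
>>
>>     def run_flakey_test_as_i_assumed(run_test):
>>         if run_test():
>>             return "success"           # first run passed: done
>>         if run_test():
>>             return "success"           # retry passed: still a success
>>         return "expected failure"      # two failures: XFAIL, never FAIL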
>>
>> I'll have to revisit the flakey test marking to see what it's really doing
>> since my understanding is clearly flawed!
>>
>> On Sat, Oct 17, 2015 at 5:57 PM, Todd Fiala <todd.fiala at gmail.com> wrote:
>>>
>>> Hmm, the flakey behavior may be specific to dwo. Testing it locally,
>>> marked unconditionally flakey on Linux, it fails with dwarf; all the runs
>>> I see succeed are dwo. I wouldn't expect a difference there, but that
>>> seems to be the case.
>>>
>>> So, the request still stands, but I won't be surprised if we find that
>>> dwo sometimes passes while dwarf doesn't (or at least doesn't pass often
>>> enough to get through the flakey setting).
>>>
>>> On Sat, Oct 17, 2015 at 4:57 PM, Todd Fiala <todd.fiala at gmail.com> wrote:
>>>>
>>>> Hi Tamas,
>>>>
>>>> I think you grabbed stats on failing tests for me in the past. Can you
>>>> dig up the failure rate for TestRaise.py's test_restart_bug() variants on
>>>> Ubuntu 14.04 x86_64? I'd like to mark it as flakey on Linux, since it is
>>>> passing most of the time over here, but I want to see whether that holds
>>>> across all Ubuntu 14.04 x86_64 machines. (If it is passing some of the
>>>> time, I'd prefer marking it flakey so that we don't see unexpected
>>>> successes.)
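>>>>
>>>> Concretely, I'd annotate it along these lines (assuming the
>>>> expectedFlakeyLinux decorator in lldbtest.py is the right tool here;
>>>> the bug number below is a placeholder):
>>>>
>>>>     @expectedFlakeyLinux("llvm.org/pr<NNNNN>")
>>>>     def test_restart_bug(self):
>>>>         ...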
>>>>
>>>> Thanks!
>>>>
>>>> --
>>>> -Todd
>>>
>>>
>>>
>>>
>>> --
>>> -Todd
>>
>>
>>
>>
>> --
>> -Todd
>
>
> _______________________________________________
> lldb-dev mailing list
> lldb-dev at lists.llvm.org
> http://lists.llvm.org/cgi-bin/mailman/listinfo/lldb-dev
>