[llvm-dev] [EXTERNAL] Re: Responsibilities of a buildbot owner

Tue Jan 11 09:51:21 PST 2022

On 1/11/22 9:45 AM, Pavel Labath wrote:
> On 11/01/2022 18:22, Philip Reames wrote:
>>
>> On 1/11/22 3:32 AM, Pavel Labath wrote:
>>> I am afraid I too have to say that I believe the real problem here 
>>> is the lack of active developers with interest in/commitment to the 
>>> windows port of lldb. While I appreciate having Stella's windows 
>>> buildbot around, and it prevents windows from bitrotting completely, 
>>> it would take a much more active involvement to resolve the 
>>> multitude of systemic issues affecting windows support. Like, if we 
>>> tried to apply the current llvm support policy guidelines to the 
>>> windows (host-side, at least) support code, I don't think it would 
>>> even meet the criteria for inclusion in the peripheral tier (active 
>>> sub-community).
>>>
>>> Now for something slightly more constructive:
>>>
>>> While I am not familiar with the windows-specific parts of the 
>>> watchpoint code, I think I can say without exaggerating that I have 
>>> a *lot* of experience in fixing flaky tests. That experience tells 
>>> me that flaky watchpoint tests are often/usually caused by factors 
>>> outside lldb (due to watchpoints being a global, scarce, hardware 
>>> resource). Virtualization is particularly tricky here -- every 
>>> virtualization technology that I've tried has had (at some point in 
>>> time at least) a watchpoint-related bug. The problem described here 
>>> sounds a lot like the issue I observed on Google Compute Engine, 
>>> which could also miss some watchpoints "randomly". So, if this bot 
>>> is running in any kind of a virtualized environment, the first thing 
>>> I'd do is check whether the issue happens on physical hardware.
>>>
>>> Related to that, I also want to mention that we have the 
>>> ability to skip categories of tests in lldb. All the watchpoint 
>>> tests are (should be) annotated by the watchpoint category, and so 
>>> you can easily skip all of them, either by hard-disabling the 
>>> category for windows in the source code (if this is an lldb issue) 
>>> or externally through the buildbot config (if this is due to the bot 
>>> environment => LLDB_TEST_USER_ARGS="--skip-category watchpoint").
>>
>> Would it be reasonable to recommend that all of our windows bots 
>> testing lldb add this flag?  Or maybe even check something in so that 
>> all builds default to not running these tests on Windows? The former 
>> would make sense if we primarily think this is virtualization 
>> related, the latter if we think it's more likely a code problem.
>>
>
> If that question was meant for me, then my answer is yes. I think 
> those tests should be disabled regardless of the cause. I actually 
> tried to say the same thing, but I may not have succeeded in getting 
> it across. Stella, can you share what kind of environment that bot is 
> running in?
>
>> I noticed last night that we have a couple of other windows bots 
>> which seem to be hitting the same false positives.  Much lower 
>> frequencies, but it does seem this is not specific to the particular 
>> bot.
> Hmm.. do you have a link to those bots or something? Stella's bot is 
> the only windows (lldb) bot I am aware of and I'd be surprised if 
> there were more of them.
I went back and checked.  Turns out I was wrong here.  I had a couple of 
build failures with similar messages, but they were from this bot.
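
For anyone unfamiliar with the category mechanism Pavel describes above, here is a minimal, self-contained sketch of the idea in Python. It is not lldb's actual test driver: the test names, the TESTS table, and both helper functions are hypothetical, and only the "--skip-category watchpoint" argument string is taken from the thread. It shows how an externally supplied argument string could be turned into a set of skipped categories and used to filter tests.

```python
import argparse
import shlex


def parse_skipped_categories(user_args: str) -> set:
    """Collect every --skip-category value from a raw argument string.

    Hypothetical helper: it only mimics the shape of the flag quoted in the
    thread and ignores any other arguments it does not recognize.
    """
    parser = argparse.ArgumentParser(add_help=False)
    parser.add_argument("--skip-category", action="append", default=[], dest="skip")
    known, _unknown = parser.parse_known_args(shlex.split(user_args))
    return set(known.skip)


def select_tests(tests: dict, skipped: set) -> list:
    """Return the names of tests that carry none of the skipped categories."""
    return [name for name, cats in tests.items() if not (set(cats) & skipped)]


# Hypothetical test table: test name -> list of categories it is annotated with.
TESTS = {
    "TestWatchpointHit": ["watchpoint"],
    "TestBreakpointHit": ["breakpoint"],
    "TestStepOver": [],
}

skipped = parse_skipped_categories("--skip-category watchpoint")
print(select_tests(TESTS, skipped))
```

The point of the sketch is that the skip decision lives in the bot's configuration rather than in the tests themselves, which is the option that makes sense if the failures come from the bot environment and not from lldb.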