[llvm-dev] Flakey failure on clang-ppc64le-linux-multistage

Nico Weber via llvm-dev llvm-dev at lists.llvm.org
Thu Sep 3 13:15:57 PDT 2020


https://llvm.org/docs/CommandGuide/lit.html already lists %T as "parent
directory of %t (not unique, deprecated, do not use)". See also
https://reviews.llvm.org/D35396
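
For anyone reading this thread later, here is a rough sketch of why a shared %T path can lose a directory that a test "just created", and why %t avoids it. It is only a simulation in plain shell, not the actual test or lit itself. The directory name `testbin` and the test names `a.c`/`b.c` are hypothetical, and the `Output/<test>.tmp` layout for %t is an assumption about how lit typically expands it. The interleaving is forced by hand rather than left to the scheduler.

```shell
#!/bin/sh
# Simulation only: "$out" stands in for the Output directory that %T expands to.
set -eu
out=$(mktemp -d)

# Shared parent (%T): two tests use the same path "$out/testbin".
mkdir -p "$out/testbin"     # test A: mkdir -p %T/testbin
rm -rf "$out/testbin"       # test B, interleaved: rm -rf %T/testbin
if [ -d "$out/testbin" ]; then shared=ok; else shared=clobbered; fi

# Unique names (%t): each test works under its own path.
a="$out/a.c.tmp"            # hypothetical expansion of %t for test A
b="$out/b.c.tmp"            # hypothetical expansion of %t for test B
mkdir -p "$a/testbin"       # test A: mkdir -p %t/testbin
rm -rf "$b/testbin"         # test B: rm -rf %t/testbin (a different path)
if [ -d "$a/testbin" ]; then unique=ok; else unique=clobbered; fi

echo "shared=$shared unique=$unique"
rm -rf "$out"
```

This prints `shared=clobbered unique=ok`. It may also explain why running one test's script in a loop would not reproduce the failure: the interference needs a second, different test touching the same %T path at the same time.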

On Thu, Sep 3, 2020 at 3:37 PM David Blaikie <dblaikie at gmail.com> wrote:

> Yeah, I think I'd be up for considering deprecation of %T due to the risk
> of race conditions/conflicts between tests. %t gives a unique name you can
> do whatever you want with - only need one file, use %t as a file, need a
> directory full of files, mkdir %t and use that, etc.
>
> But will depend a bit on what the uses of %T look like, maybe there are
> some good uses of it that we haven't thought of until we see them.
>
> On Thu, Sep 3, 2020 at 12:33 PM Fāng-ruì Sòng <maskray at google.com> wrote:
>
>> Should be fixed by https://reviews.llvm.org/D87103
>>
>> Shall we consider deprecating (emitting a warning) or removing %T from
>> lit? lldb, lld/COFF and clang-tools-extra are the three major users of
>> %T. There are a few other %T in other places but there are not too
>> many. We will also investigate whether other projects using lit are
>> using %T.
>>
>> On Thu, Sep 3, 2020 at 11:25 AM David Blaikie <dblaikie at gmail.com> wrote:
>> >
>> > Oh yeah, good catch! Thanks!
>> >
>> > On Thu, Sep 3, 2020 at 11:13 AM Fāng-ruì Sòng <maskray at google.com> wrote:
>> >>
>> >> This is likely due to a race condition (%T is a shared parent
>> >> directory). I'll put up a patch to fix it.
>> >>
>> >> On Thu, Sep 3, 2020 at 10:00 AM David Blaikie via llvm-dev
>> >> <llvm-dev at lists.llvm.org> wrote:
>> >> >
>> >> > Is the machine running any jobs in parallel? Would it be worth
>> >> > running lit in the loop, rather than the script (perhaps lit's
>> >> > doing something interesting)? Or maybe the full test run from ninja,
>> >> > but I appreciate that that is expensive.
>> >> >
>> >> > Are there other PPC bots? Any idea if they are experiencing this
>> >> > failure?
>> >> >
>> >> > There are also other tests that do similar mkdir/symlink things, I
>> >> > think - yet they are not failing? Maybe they do it in some slightly
>> >> > different manner?
>> >> >
>> >> > On Thu, Sep 3, 2020 at 5:03 AM Nemanja Ivanovic <nemanja.i.ibm at gmail.com> wrote:
>> >> >>
>> >> >> Sure.
>> >> >> I didn't use lit or ninja. I simply copied the script produced by lit
>> >> >> (/home/buildbots/ppc64le-clang-multistage-test/clang-ppc64le-multistage/stage1/tools/clang/test/Driver/Output/target-override.c.script)
>> >> >> into a temporary directory (along with a deep copy of the build directory).
>> >> >> I modified the paths in the script to point to the temporary directory.
>> >> >> Then I ran the script in a loop.
>> >> >> For running a bunch in parallel, I just produced a wrapper script
>> >> >> to invoke that one:
>> >> >> target-override.c.script $LINENO &
>> >> >> target-override.c.script $LINENO &
>> >> >> target-override.c.script $LINENO &
>> >> >> ...
>> >> >> wait
>> >> >> And ran that in a loop. For thousands of iterations...
>> >> >>
>> >> >> On Wed, Sep 2, 2020 at 3:51 PM David Blaikie <dblaikie at gmail.com> wrote:
>> >> >>>
>> >> >>> Thanks for looking into it!
>> >> >>>
>> >> >>> Could you describe your test process in more detail? Were you
>> >> >>> running lit from your script? Running the build system (ninja?)?
>> >> >>>
>> >> >>> On Wed, Sep 2, 2020 at 10:47 AM Nemanja Ivanovic <nemanja.i.ibm at gmail.com> wrote:
>> >> >>>>
>> >> >>>> Well, I am at my wit's end. I have copied over the script and
>> >> >>>> directories for this test case and run it a few million times. First I was
>> >> >>>> running one at a time, then I switched to kicking off 1000 at a time. All
>> >> >>>> the while, the bots continued to run on the same machine. The script never
>> >> >>>> failed even once. I am not sure if this has something to do with Python as
>> >> >>>> part of llvm-lit or what is going on.
>> >> >>>> I am thinking that the best course of action for us is to mark
>> >> >>>> this test case UNSUPPORTED for PPC.
>> >> >>>>
>> >> >>>> On Wed, Sep 2, 2020 at 12:41 PM Nemanja Ivanovic via llvm-dev <llvm-dev at lists.llvm.org> wrote:
>> >> >>>>>
>> >> >>>>> Interesting, thanks for bringing this to our attention. I just
>> >> >>>>> took a quick look through the last 100 builds and this test has failed 13
>> >> >>>>> times. This is certainly something we need to look at. We will investigate
>> >> >>>>> and see if we can make any sense of this.
>> >> >>>>>
>> >> >>>>> Nemanja Ivanovic
>> >> >>>>> LLVM PPC Backend Development
>> >> >>>>> IBM Toronto Lab
>> >> >>>>> Email: nemanjai at ca.ibm.com
>> >> >>>>> Phone: 905-413-3388
>> >> >>>>>
>> >> >>>>>
>> >> >>>>>
>> >> >>>>> ----- Original message -----
>> >> >>>>> From: David Blaikie <dblaikie at gmail.com>
>> >> >>>>> To: llvm-dev <llvm-dev at lists.llvm.org>, Nico Weber <thakis at chromium.org>, Serge Pavlov <sepavloff at gmail.com>, powerllvm at ca.ibm.com
>> >> >>>>> Cc:
>> >> >>>>> Subject: [EXTERNAL] Flakey failure on clang-ppc64le-linux-multistage
>> >> >>>>> Date: Tue, Sep 1, 2020 6:10 PM
>> >> >>>>>
>> >> >>>>> Seems there were a couple of correlated failures that appear to
>> >> >>>>> be flakes on this buildbot recently:
>> >> >>>>>
>> >> >>>>> green: http://lab.llvm.org:8011/builders/clang-ppc64le-linux-multistage/builds/13974
>> >> >>>>> red: http://lab.llvm.org:8011/builders/clang-ppc64le-linux-multistage/builds/13975
>> >> >>>>> (target-override.c during stage 1, seems to be missing the
>> >> >>>>> directory/symlink it just created)
>> >> >>>>> red: http://lab.llvm.org:8011/builders/clang-ppc64le-linux-multistage/builds/13976
>> >> >>>>> (same test failure as the last, but during stage 2, not stage 1)
>> >> >>>>> green: http://lab.llvm.org:8011/builders/clang-ppc64le-linux-multistage/builds/13977
>> >> >>>>>
>> >> >>>>> Including Nico & Pavlov as the people who wrote/edited the test,
>> >> >>>>> but I'm guessing this is something interesting going on on the buildbot
>> >> >>>>> itself?
>> >> >>>>>
>> >> >>>>> powerllvm at ca.ibm.com, whoever you are on the end of that
>> >> >>>>> mailing list - could you take a look at this? Possibly manually running
>> >> >>>>> that test in a loop a bunch of times to see if it fails sometimes & try to
>> >> >>>>> help us understand why?
>> >> >>>>>
>> >> >>>>>
>> >> >>>>>
>> >> >>>>> _______________________________________________
>> >> >>>>> LLVM Developers mailing list
>> >> >>>>> llvm-dev at lists.llvm.org
>> >> >>>>> https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev
>> >> >
>> >>
>> >>
>> >>
>> >> --
>> >> 宋方睿
>>
>>
>>
>> --
>> 宋方睿
>>
>