[llvm-dev] Flakey failure on clang-ppc64le-linux-multistage

David Blaikie via llvm-dev llvm-dev at lists.llvm.org
Thu Sep 3 11:24:59 PDT 2020


Oh yeah, good catch! Thanks!

On Thu, Sep 3, 2020 at 11:13 AM Fāng-ruì Sòng <maskray at google.com> wrote:

> This is likely due to a race condition (%T is a shared parent
> directory). I'll put up a patch to fix it.
>
> On Thu, Sep 3, 2020 at 10:00 AM David Blaikie via llvm-dev
> <llvm-dev at lists.llvm.org> wrote:
> >
> > Is the machine running any jobs in parallel? Would it be worth trying
> > running lit in the loop, rather than the script? (perhaps lit's doing
> > something interesting) or maybe the full test run from ninja, but I
> > appreciate that that is expensive.
> >
> > Are there other PPC bots? Any idea if they are experiencing this failure?
> >
> > There are also other tests that do similar mkdir/symlink things, I think
> > - yet they are not failing? Maybe they do it in some slightly different
> > manner?
> >
> > On Thu, Sep 3, 2020 at 5:03 AM Nemanja Ivanovic <nemanja.i.ibm at gmail.com> wrote:
> >>
> >> Sure.
> >> I didn't use lit or ninja. I simply copied the script produced by lit
> >> (/home/buildbots/ppc64le-clang-multistage-test/clang-ppc64le-multistage/stage1/tools/clang/test/Driver/Output/target-override.c.script)
> >> into a temporary directory (along with a deep copy of the build directory).
> >> I modified the paths in the script to point to the temporary directory.
> >> Then I ran the script in a loop.
> >> For running a bunch in parallel, I just produced a wrapper script to
> >> invoke that one:
> >> target-override.c.script $LINENO &
> >> target-override.c.script $LINENO &
> >> target-override.c.script $LINENO &
> >> ...
> >> wait
> >> And ran that in a loop. For thousands of iterations...
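
The parallel stress loop described above can be sketched as a small shell
harness. This is only an illustration under assumptions: the function name
stress_run and its arguments are hypothetical, and the command it runs stands
in for the copied target-override.c.script.

```shell
#!/bin/sh
# Hypothetical stress harness (not from the thread): run a command `jobs`
# times in parallel, repeat for `iters` rounds, and count failed invocations.
stress_run() {
    iters=$1
    jobs=$2
    shift 2
    failures=0
    i=0
    while [ "$i" -lt "$iters" ]; do
        pids=""
        j=0
        # Start `jobs` background copies of the command.
        while [ "$j" -lt "$jobs" ]; do
            "$@" >/dev/null 2>&1 &
            pids="$pids $!"
            j=$((j + 1))
        done
        # Collect each exit status; a non-zero status counts as one failure.
        for pid in $pids; do
            wait "$pid" || failures=$((failures + 1))
        done
        i=$((i + 1))
    done
    echo "failures=$failures"
}
```

For example, `stress_run 1000 8 ./target-override.c.script` would run eight
copies at a time for a thousand rounds and print the number of failed runs.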
> >>
> >> On Wed, Sep 2, 2020 at 3:51 PM David Blaikie <dblaikie at gmail.com> wrote:
> >>>
> >>> Thanks for looking into it!
> >>>
> >>> Could you describe your test process in more detail? Were you running
> >>> lit from your script? Running the build system (ninja?)?
> >>>
> >>> On Wed, Sep 2, 2020 at 10:47 AM Nemanja Ivanovic <nemanja.i.ibm at gmail.com> wrote:
> >>>>
> >>>> Well, I am at my wit's end. I have copied over the script and
> >>>> directories for this test case and run it a few million times. First I was
> >>>> running one at a time, then I switched to kicking off 1000 at a time. All
> >>>> the while, the bots continued to run on the same machine. The script never
> >>>> failed even once. I am not sure if this has something to do with Python as
> >>>> part of llvm-lit or what is going on.
> >>>> I am thinking that the best course of action for us is to mark this
> >>>> test case UNSUPPORTED for PPC.
> >>>>
> >>>> On Wed, Sep 2, 2020 at 12:41 PM Nemanja Ivanovic via llvm-dev <llvm-dev at lists.llvm.org> wrote:
> >>>>>
> >>>>> Interesting, thanks for bringing this to our attention. I just took
> >>>>> a quick look through the last 100 builds and this test has failed 13 times.
> >>>>> This is certainly something we need to look at. We will investigate and see
> >>>>> if we can make any sense of this.
> >>>>>
> >>>>> Nemanja Ivanovic
> >>>>> LLVM PPC Backend Development
> >>>>> IBM Toronto Lab
> >>>>> Email: nemanjai at ca.ibm.com
> >>>>> Phone: 905-413-3388
> >>>>>
> >>>>>
> >>>>>
> >>>>> ----- Original message -----
> >>>>> From: David Blaikie <dblaikie at gmail.com>
> >>>>> To: llvm-dev <llvm-dev at lists.llvm.org>, Nico Weber <thakis at chromium.org>,
> >>>>> Serge Pavlov <sepavloff at gmail.com>, powerllvm at ca.ibm.com
> >>>>> Cc:
> >>>>> Subject: [EXTERNAL] Flakey failure on clang-ppc64le-linux-multistage
> >>>>> Date: Tue, Sep 1, 2020 6:10 PM
> >>>>>
> >>>>> Seems there were a couple of correlated failures that appear to be
> >>>>> flakes on this buildbot recently:
> >>>>>
> >>>>> green: http://lab.llvm.org:8011/builders/clang-ppc64le-linux-multistage/builds/13974
> >>>>> red: http://lab.llvm.org:8011/builders/clang-ppc64le-linux-multistage/builds/13975
> >>>>> (target-override.c during stage 1, seems to be missing the
> >>>>> directory/symlink it just created)
> >>>>> red: http://lab.llvm.org:8011/builders/clang-ppc64le-linux-multistage/builds/13976
> >>>>> (same test failure as the last, but during stage 2, not stage 1)
> >>>>> green: http://lab.llvm.org:8011/builders/clang-ppc64le-linux-multistage/builds/13977
> >>>>>
> >>>>> Including Nico & Pavlov as the people who wrote/edited the test, but
> >>>>> I'm guessing this is something interesting going on on the buildbot itself?
> >>>>>
> >>>>> powerllvm at ca.ibm.com, whoever you are on the end of that mailing
> >>>>> list - could you take a look at this? Possibly manually running that test
> >>>>> in a loop a bunch of times to see if it fails sometimes & try to help us
> >>>>> understand why?
> >>>>>
> >>>>>
> >>>>>
> >>>>> _______________________________________________
> >>>>> LLVM Developers mailing list
> >>>>> llvm-dev at lists.llvm.org
> >>>>> https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev
> >
>
>
>
> --
> 宋方睿
>
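
The race diagnosed at the top of this thread comes from tests sharing a
directory. In lit, %T is the parent directory of %t and is not unique per
test, so setup steps from different tests can interfere with each other. A
minimal shell sketch of the idea follows; setup_and_check, run_isolated and
fake-tool are hypothetical names, and the steps are illustrative rather than
copied from target-override.c or from the eventual patch.

```shell
#!/bin/sh
# Hypothetical stand-in for a test's setup: wipe a directory, recreate it,
# add a symlink, then check that the symlink is still present.
setup_and_check() {
    dir=$1
    rm -rf "$dir/usr" &&
    mkdir -p "$dir/usr/bin" &&
    ln -s /bin/sh "$dir/usr/bin/fake-tool" &&
    [ -L "$dir/usr/bin/fake-tool" ]
}

# Shared parent directory (analogous to %T): concurrent callers can remove
# each other's files between the ln and the check, so this form may fail
# sporadically:
#   setup_and_check "$shared" & setup_and_check "$shared" & wait

# Private directory per caller (analogous to %t): callers cannot interfere.
run_isolated() {
    d=$(mktemp -d) || return 1
    setup_and_check "$d"
    status=$?
    rm -rf "$d"
    return $status
}
```

Running `run_isolated & run_isolated & wait` exercises two concurrent callers
that each work in their own directory, which is the property a per-test
directory is meant to provide.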