[cfe-dev] Test Suite - Livermore Loops

David Blaikie dblaikie at gmail.com
Mon Jan 7 14:17:40 PST 2013


On Mon, Jan 7, 2013 at 2:05 PM, Daniel Dunbar <daniel at zuster.org> wrote:
>
>
>
> On Mon, Jan 7, 2013 at 1:52 PM, David Blaikie <dblaikie at gmail.com> wrote:
>>
>> On Mon, Jan 7, 2013 at 1:46 PM, Daniel Dunbar <daniel at zuster.org> wrote:
>> >
>> >
>> >
>> > On Mon, Jan 7, 2013 at 1:14 PM, David Blaikie <dblaikie at gmail.com>
>> > wrote:
>> >>
>> >> On Mon, Jan 7, 2013 at 12:58 PM, Daniel Dunbar <daniel at zuster.org>
>> >> wrote:
>> >> > To weigh in here...
>> >> >
>> >> >
>> >> > On Thu, Jan 3, 2013 at 8:15 AM, David Blaikie <dblaikie at gmail.com>
>> >> > wrote:
>> >> >>
>> >> >> +Daniel & Michael who work on the LNT infrastructure & might have
>> >> >> some
>> >> >> thoughts on the differences & their merits & motivations.
>> >> >>
>> >> >> On Thu, Jan 3, 2013 at 4:05 AM, Renato Golin
>> >> >> <renato.golin at linaro.org>
>> >> >> wrote:
>> >> >> > David,
>> >> >> >
>> >> >> > I did some more work on the Livermore Loops and found that the
>> >> >> > issue is the difference in parameters between a single-step and
>> >> >> > a multi-step compilation.
>> >> >>
>> >> >> Thanks for the investigation.
>> >> >>
>> >> >> > When you compile "clang kernel06.c" it works fine, but when you
>> >> >> > run all the steps separately (clang -emit-llvm + llvm-as + opt +
>> >> >> > llc etc), the default options of each tool and how they interact
>> >> >> > expose a bug in the generated code.
>> >> >>
>> >> >> Sounds quite plausible.
>> >> >>
>> >> >> > This difference is due to the fact that I'm running the test-suite
>> >> >> > using
>> >> >> > LNT, while the build bots are running it using Make directly. I'd
>> >> >> > expect
>> >> >> > them both to be the same, but apparently they're quite different
>> >> >> > in
>> >> >> > what
>> >> >> > kind of parameters they use, passes they test and results they
>> >> >> > get.
>> >> >> >
>> >> >> > I think there are two courses of action here:
>> >> >> >
>> >> >> > 1. Identify the issue, isolate the case and create a bug to
>> >> >> > resolve
>> >> >> > later.
>> >> >> > 2. Make sure LNT does exactly what the build bots are doing
>> >> >>
>> >> >> Part of the issue here is whether or not the Make-based execution is
>> >> >> still maintained/valued. I'm getting the impression that the LNT
>> >> >> execution may be already, or be becoming, the standard way to run
>> >> >> the
>> >> >> test suite even when not gathering perf statistics. Michael/Daniel -
>> >> >> is that the case?
>> >> >
>> >> >
>> >> > Well, the distinction isn't really between LNT and non-LNT, it's
>> >> > between the TEST=nightly and TEST=simple styles supported by the
>> >> > Makefiles. LNT uses the TEST=simple style and that is all I care to
>> >> > support.
>> >>
>> >> Fair enough, though that's sort of what I was getting at in a way:
>> >> whatever way LNT is driving the test-suite is, essentially, the only
>> >> supported way. Sure we can have non-LNT bots (not ideal, perhaps -
>> >> still another path to maintain/possibly diverge by accident) but they
>> >> certainly shouldn't be using anything other than the way LNT uses the
>> >> test-suite (ie: TEST=simple).
>> >>
>> >> Can we kill TEST=nightly, then, since it's just an
>> >> untested/unsupported trap? Or do you know of users that have a need
>> >> for this?
>> >
>> >
>> > It's untested, but as supported as anything else (I try not to break it,
>> > and
>> > will fix bugs in it).
>> >
>> > And yes, there are still users that use this regularly. Most of that is
>> > probably habit among old-school LLVMers, but it's still useful when you
>> > want
>> > to do direct A/B testing of optimizer changes (support for things like
>> > OPTBETA and LLCBETA), or when you want to test a change without
>> > requiring a
>> > compiler rebuild.
>> >
>> > For example, we still don't have very good support in the compiler for
>> > tweaking various parts of the compilation process (for example, running
>> > with
>> > a custom pass list), so the easiest way to test addition of a new pass
>> > may
>> > still be using TEST=nightly.
>> >
>> > My natural tendency is towards "if it isn't broke, don't kill it", and
>> > not
>> > to try and remove it until we have a new separate way of running the
>> > test
>> > suite outside of the Makefiles.
>> >
>> >>
>> >> >
>> >> > Historically, the old way of testing (TEST=nightly) used the various
>> >> > LLVM
>> >> > tools to effect a compilation because there weren't compilers that
>> >> > worked.
>> >> > However, this is a bad way to "test" the product that most users
>> >> > actually
>> >> > care about, which is the compiler.
>> >> >
>> >> > With TEST=simple, all the compilation is done using the compiler just
>> >> > as an end user would. If you want LTO, the right way to get it is to
>> >> > use the compiler's support for LTO. This is how we test LTO
>> >> > internally. I've never tried to get LTO working on Linux, but it
>> >> > should be possible using the gold plugin and passing the right
>> >> > compiler options.
>> >> >
>> >> >> If so, should we rip out the direct Make execution, or do something
>> >> >> to
>> >> >> otherwise warn/disable it?
>> >> >
>> >> >
>> >> > Per my other thread polling users of the test-suite, there are still
>> >> > people
>> >> > who use the Makefiles to do more custom things. I personally would
>> >> > love
>> >> > to
>> >> > deprecate them completely, but they do support some useful workflows.
>> >> >
>> >> > My ideal would be:
>> >> > 1. Migrate LNT to drive the test-suite using a more sane mechanism
>> >> > (not
>> >> > a
>> >> > glob of Makefiles). I would like the "more sane mechanism" to be
>> >> > lit-based.
>> >> > 2. Maybe do some work to make using lit to drive the test-suite more
>> >> > convenient and hopefully support some of the useful workflows the
>> >> > Makefiles
>> >> > support with less of the crap.
>> >> > 3. Deprecate the Makefiles, or at least let them die through lack of
>> >> > maintenance.
>> >> >
>> >> > Does that answer the parts you wanted my input on?
>> >>
>> >> More or less, I suppose I wouldn't mind an opinion on the "should we
>> >> kill off/migrate bots from test-suite invocation to LNT?" issue too.
>> >> (my assumption is that your answer to that is "yes", but just want to
>> >> be clear)
>> >
>> >
>> > Yes, definitely.
>>
>> Hmm, this seems at odds with your above opinion on not killing
>> TEST=nightly, though. If we actively migrate bots away from
>> TEST=nightly we're going to break it (indeed, Renato already has,
>> which is how this thread came up).
>
>
> By "not breaking it", I meant the infrastructure of it, not whether the
> tests work.
>
> As for the actual LLVM bug, we should probably try and get an LTO LNT bot
> up, though, which would most likely hit the same bug.
>
>> If it's broken that, I would think,
>> is going to cause some confusion/problems for those using it &
>> expecting things to pass. Is your impression/experience that those
>> running this manually for custom testing aren't too concerned about
>> spurious failures?
>
>
> In my mind, I don't necessarily think "it's broken" makes sense in this
> context. It's not TEST=nightly that is broken, it is the compiler in the
> context of one architecture, one set of compile options, etc.

Agreed. There's a distinction between the infrastructure being broken
& the tests not passing.

> I expect/hope developers to be aware that compiler bugs may only manifest
> under a very specific set of circumstances, and so they need to run their
> tests in the same way as the buildbots if they want results to match. And I
> hope most core LLVM developers realize that the way TEST=nightly ends up
> building binaries is very different from using the compiler directly, but if
> not, most of them figure this out very quickly in practice.

Right, my concern is that if we leave "TEST=nightly" in it'll just be
a trap for people to run that & expect to get clean results when
there's no infrastructure ensuring that those tests pass at all on any
architecture. It'll easily become a hive of arbitrary failures with no
clear way to distinguish new failures from old. That will result in
people either not using it, or trying to investigate failures that
weren't introduced by their change anyway, or having to run it twice
(once to get a baseline and again to get their results) & carefully
diffing the two to see where the new failures are.

This seems likely to waste engineer time & removing the option would
remove the pitfall/trap. But perhaps people using it are used to
arbitrary failures? I'm not sure.
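
To make the cost of that "run it twice and diff" workflow concrete,
here is a minimal sketch in Python. It assumes each run has already been
reduced to a mapping from test name to "PASS" or "FAIL". The
new_failures helper and the result data are hypothetical, invented for
illustration; this is not how LNT or the test-suite Makefiles actually
report results.

```python
def new_failures(baseline, candidate):
    """Return the tests that fail in `candidate` but did not fail in `baseline`."""
    baseline_failed = {name for name, status in baseline.items() if status == "FAIL"}
    candidate_failed = {name for name, status in candidate.items() if status == "FAIL"}
    # Failures already present in the baseline are not attributable to the change.
    return sorted(candidate_failed - baseline_failed)


# Hypothetical results, made up for this example only.
baseline = {"kernel05": "PASS", "kernel06": "FAIL", "kernel07": "PASS"}
candidate = {"kernel05": "PASS", "kernel06": "FAIL", "kernel07": "FAIL"}

print(new_failures(baseline, candidate))
```

Here kernel06 fails in both runs, so only kernel07 is reported as new.
Without a known-clean baseline maintained by a bot, every developer has
to produce that first run themselves.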

> I see LNT + TEST=simple as the "right way" to do large scale testing w/ the
> test suite and buildbots, but if developers want to use TEST=nightly for
> experiments or development it's still there. I have actively tried to
> encourage people to switch over to using TEST=simple when possible, but it's
> hard to get people to change existing workflows if there isn't a clear
> benefit.
>
>  - Daniel
>
>>
>>
>> - David
>>
>> >
>> >  - Daniel
>> >
>> >>
>> >>
>> >> - David
>> >>
>> >> >
>> >> >  - Daniel
>> >> >
>> >> >>
>> >> >> > I'm working on item 1 right now, not sure how item 2 can be
>> >> >> > solved...
>> >> >> >
>> >> >> > Of course, the fact that it's not the same flow meant we caught a
>> >> >> > bug in LLVM, but that's bound to create more confusion and broken
>> >> >> > commits, which is worse in the long run.
>> >> >>
>> >> >> Yeah, unless there's some strong/specific motivation for this I'd be
>> >> >> in favor of removing the difference (or removing the Make-based
>> >> >> execution entirely)
>> >> >>
>> >> >> > Also, if we're not running LNT as often as buildbots, the benefit
>> >> >> > of
>> >> >> > having
>> >> >> > them different is sporadic at best.
>> >> >>
>> >> >> We're running both pretty regularly, I think - if anything I suspect
>> >> >> we might be running LNT on more configurations than the Make-based
>> >> >> execution (except that on some LNT runners we're multisampling, so
>> >> >> it's slower)
>> >> >>
>> >> >> > When I set up some tests to run on ARM I have done both direct and
>> >> >> > multi-step, to make sure they were generating the same code and in
>> >> >> > many
>> >> >> > cases I found that the order in which the passes were executed was
>> >> >> > breaking
>> >> >> > some tests.
>> >> >> >
>> >> >> > We managed to get the EDG bridge to set it up in the same way as
>> >> >> > the
>> >> >> > multi-pass would, so we would get similar results, but it doesn't
>> >> >> > seem
>> >> >> > to be
>> >> >> > the case with clang.
>> >> >> >
>> >> >> > cheers,
>> >> >> > --renato
>> >> >
>> >> >
>> >
>> >
>
>


