[LLVMdev] [cfe-dev] RFC: Change tests to run with fixed (not-host dependent) triple

Sat Dec 1 09:34:00 PST 2012

On Sat, Dec 1, 2012 at 5:06 AM, David Tweed <david.tweed at gmail.com> wrote:
> My thoughts:
>
> On Sat, Dec 1, 2012 at 9:06 AM, Chandler Carruth <chandlerc at google.com>
> wrote:
>>
>> On Fri, Nov 30, 2012 at 8:04 PM, Chris Lattner <clattner at apple.com> wrote:
>>>
>>> I'm ok with this in principle, but how about with the nuance that some
>>> tests (eg test/codegen) explicitly opt into march=native?
>>
>>
>> I'd really like the default behavior to be something that forces the test
>> to either be independent of the targeted triple, or explicitly set a target.
>> I like the default being unknown.
>
> To state the obvious, there are two different things that go on in tests: the
> specific thing being tested and things that aren't being tested but just
> need to be done to provide enough "context" for what's being tested. I was
> involved in a long slog trying to fix up a lot of the ARM regression test
> failures (using my work email address). Here are some "roughly right"
> statistics:
>
> There were probably about 25--30 bugs where the issue was behaviour that
> FileCheck regexps didn't account for (mostly due to ABI issues).

These were mostly in Clang tests, right?

I expect Clang IRGen tests will generally be more ABI-dependent, but
we can probably still find some number of tests there that don't
specify triples & pass on both ARM and Itanium ABIs which shows some
degree of portability.
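
One rough way to look for such tests is sketched below in Python. The
helper name and the list of flag spellings are assumptions made for
illustration (a heuristic over RUN lines and IR "target triple" lines),
not something taken from lit itself:

```python
import re

# Heuristic patterns for an explicit target in a test file. The flag
# spellings below are an assumed, non-exhaustive list.
_RUN_LINE = re.compile(r"^\s*(?://|;|#)\s*RUN:(.*)$", re.MULTILINE)
_TARGET_FLAG = re.compile(r"(?:^|\s)-{1,2}(?:triple|mtriple|target|march)\b")
_IR_TRIPLE = re.compile(r'^\s*target\s+triple\s*=\s*"', re.MULTILINE)


def specifies_target(test_text):
    """Return True if the test pins a target on a RUN line or in its IR."""
    for match in _RUN_LINE.finditer(test_text):
        if _TARGET_FLAG.search(match.group(1)):
            return True
    return bool(_IR_TRIPLE.search(test_text))
```

Tests for which this returns False are the candidates for "should pass
under any triple"; being a text heuristic, it can miss targets that are
set indirectly (for example through a substitution).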

For backend tests I expect many IR pass tests would be able to be
written in a fairly target-neutral fashion (though admittedly I
haven't worked much below Clang, so my intuition here is not very
strong).

> There the
> prevailing opinion seemed to be keep simple FileCheck tests but that tests
> should be run with a specific triple; however that triple shouldn't always
> be x86_64 (because that's a bit special). There've been about 5-10 tests
> where the test was testing something that was architecture specific without
> specifying that they needed it (eg, testing for specific x86_64 machine
> optimizations without doing that); again the upshot has been to have these
> requirements specified explicitly. There have been 5-10 JIT code tests where
> "support" code pasted from, eg, lli into a test hadn't been updated when the
> JIT core was changed. The one definite bug was in
> devirtualisation (which probably lowered ok but failed the module verifier).
> This bug was masked by the general set of ARM failures
> that were basically issues with the tests.
>
> So on the one hand, I'd love it if the tests were constructed in such a way
> that the "fuzz" of ABI differences didn't need to be considered. On the
> other hand, if the devirtualization test had been run only using an x86_64
> triple the issue wouldn't have come to light as quickly.

This would be exposed by Chandler's suggestion of having the ability
to run target-unspecified tests with a range of (up to & including
"all targets") triples.

I think this gives the best of both worlds, really. Developers would
be expected to run the basic test suite (using the default/agnostic
triple) and have a consistent experience/expectation that they pass.
They could optionally run with specific target triples or "all
targets" which would take longer but provide more confidence when
making a test or change that they're concerned might be
target-dependent. And bots would be responsible for testing the "all
targets" case regularly (& as is the case today with other bot
workloads, we don't necessarily expect developers to always run all
these heavier workloads so when they break it's "no harm, no foul":
the developer is notified & fixes up the test based on bot results).
That way we don't unduly block development when someone working on
Linux accidentally introduces a test break for Apple developers, for
example.
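
The scheme described above can be sketched roughly as follows in Python.
This is only an illustration under stated assumptions: `plan_runs`,
`DEFAULT_TRIPLE` and the dict-of-tests input are hypothetical names, not
part of lit.

```python
# Hypothetical agnostic default used for tests that do not pin a target.
DEFAULT_TRIPLE = "unknown"


def plan_runs(tests, extra_triples=()):
    """Expand tests into a list of (test_name, triple) runs.

    `tests` maps a test name to its explicit triple, or to None when the
    test leaves the target unspecified. `extra_triples` are the additional
    targets that a bot (or a cautious developer) opts into.
    """
    runs = []
    for name, explicit in sorted(tests.items()):
        if explicit is not None:
            # A test that names its target runs exactly once, as written.
            runs.append((name, explicit))
            continue
        # Target-unspecified tests always run under the default triple...
        runs.append((name, DEFAULT_TRIPLE))
        # ...and additionally under every extra triple requested.
        for triple in extra_triples:
            runs.append((name, triple))
    return runs
```

A developer's ordinary run would call `plan_runs(tests)` and get one run
per test, while a bot would pass its list of triples as `extra_triples`,
so only the target-unspecified tests are multiplied across targets.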

> That seems to me to
> be the crux of the problem: LLVM (and especially Clang) is only _mostly_
> target independent, and getting the smaller set of target dependent elements
> wrong breaks compilation just as much as a generic bug so finding these
> things as early as possible seems desirable.
>
>>
>> I wonder, would the ability to run the entire test suite with all of the
>> 'default' triples (that lit sets to unknown in normal runs) instead set to
>> the host, or to a specific triple maybe be a useful extra form of checking?
>> This would let both humans and build bots find bugs and discrepancies
>> specific to a particular target.
>
> Dreaming here: I wonder if one could come up with some set of meta-regexps
> that describe the annoying stuff like ABI differences so that if the above
> were done, one could try to separate the test failures into "test regexps
> written implicitly assuming a different system, failures look due to correct
> system dependent stuff being generated" and "test is failing on this system
> in a non-understood way". That sounds too tricky to be reliable, but I don't
> know...
>
>>
>> We could even have a common test target that build bots use which runs all
>> the tests both in the default, and in the host-triple mode so that we force
>> people to converge on target independent tests or explicit triples.
>
> A reasonable idea, except I'll reiterate: it assumes there _is_ completely
> target independent behaviour in non-trivial test code. If it's the case that
> there's not really any, it might be a situation where biting the bullet and
> trying to put a wide ranging set of triples randomly throughout the test
> suite and hoping to catch stuff that way is the best idea.
>>
>> --
>
> cheers, dave tweed__________________________
> high-performance computing and machine vision expert: david.tweed at gmail.com
> "while having code so boring anyone can maintain it, use Python." --
> attempted insult seen on slashdot