[llvm-dev] Testing "normal" cross-compilers versus GPU backends

Mehdi Amini via llvm-dev llvm-dev at lists.llvm.org
Fri Sep 4 09:28:01 PDT 2015


> On Sep 3, 2015, at 5:56 PM, Robinson, Paul <Paul_Robinson at playstation.sony.com> wrote:
> 
> 
> 
>> -----Original Message-----
>> From: Mehdi Amini [mailto:mehdi.amini at apple.com]
>> Sent: Thursday, September 03, 2015 3:26 PM
>> To: Robinson, Paul
>> Cc: Tom Stellard; llvm-dev at lists.llvm.org; NAKAMURA Takumi
>> Subject: Re: Testing "normal" cross-compilers versus GPU backends
>> 
>> 
>>> On Sep 3, 2015, at 11:23 AM, Robinson, Paul
>> <Paul_Robinson at playstation.sony.com> wrote:
>>> 
>>> 
>>> 
>>>> -----Original Message-----
>>>> From: Tom Stellard [mailto:tom at stellard.net]
>>>> Sent: Thursday, September 03, 2015 7:31 AM
>>>> To: Mehdi Amini
>>>> Cc: Robinson, Paul; llvm-dev at lists.llvm.org; NAKAMURA Takumi
>>>> Subject: Re: Testing "normal" cross-compilers versus GPU backends
>>>> 
>>>> On Thu, Sep 03, 2015 at 02:07:54AM -0700, Mehdi Amini wrote:
>>>>> 
>>>>>> On Sep 3, 2015, at 12:18 AM, Robinson, Paul
>>>> <Paul_Robinson at playstation.sony.com> wrote:
>>>>>> 
>>>>>> 
>>>>>> 
>>>>>>> -----Original Message-----
>>>>>>> From: Mehdi Amini [mailto:mehdi.amini at apple.com]
>>>>>>> Sent: Wednesday, September 02, 2015 7:10 PM
>>>>>>> To: Robinson, Paul
>>>>>>> Cc: llvm-dev at lists.llvm.org; tom at stellard.net; NAKAMURA Takumi
>>>>>>> Subject: Re: Testing "normal" cross-compilers versus GPU backends
>>>>>>> 
>>>>>>> Hi Paul,
>>>>>>> 
>>>>>>> Thanks for the summary!
>>>>>>> 
>>>>>>>> On Sep 2, 2015, at 5:44 PM, Robinson, Paul
>>>>>>> <Paul_Robinson at playstation.sony.com> wrote:
>>>>>>>> 
>>>>>>>> This note arose from http://reviews.llvm.org/D12506 but the
>>>>>>>> reviewers felt that we needed a broader audience, because the
>>>>>>>> proposed patch really didn't solve the entire problem and we had
>>>>>>>> no better ideas.
>>>>>>>> 
>>>>>>>> Mehdi Amini needs to build LLVM with just a GPU backend, and
>>>>>>>> still have "ninja check" Just Work.  Commits r243958-243960
>>>>>>>> tried to accomplish that; however they are too big a hammer, and
>>>>>>>> cause much simpler cross environments (like mine) to exclude big
>>>>>>>> chunks of very useful tests (including my favorite, DebugInfo).
>>>>>>>> 
>>>>>>>> FYI, my main cross environment is building on X86 Windows but using
>>>>>>>> a default target triple for PS4 (which is also X86).
>>>>>>>> 
>>>>>>>> I experimented with building LLVM with just the ARM backend
>>>>>>>> (running on an X86 workstation) and setting the default triple
>>>>>>>> to some ARM value.  "ninja check" worked fine (without Mehdi's
>>>>>>>> series of commits), so the normal kind of cross-compiler
>>>>>>>> environment seems to be happy with how things were set up
>>>>>>>> originally.
>>>>>>>> 
>>>>>>>> Mehdi reports building LLVM with the X86 and AMDGPU backends,
>>>>>>>> setting the default triple to "amdgcn--amdhsa", and getting
>>>>>>>> 200-some failures.
>>>>>>>> 
>>>>>>>> (This does make me wonder about AMDGPU testing in general; how
>>>>>>>> does that work?  The only places I see lit checks for AMDGPU are
>>>>>>>> in the usual target-dependent places.)
>>>>>>> 
>>>>>>> I don’t understand this question about how you do testing in
>>>>>>> general. The same way you don’t process tests/CodeGen/X86/* with
>>>>>>> the ARM backend, you can’t process any random IR through these
>>>>>>> backends.
>>>>>> 
>>>>>> You said you had 200+ failures with AMDGPU.  Are the AMD folks simply
>>>>>> tolerating the 200 failures, and you don't want to?  I should hope
>>>>>> there is more to it than that.
>>>>> 
>>>>> Well, I don’t know, they might just run `ninja check` with the default
>>>>> triple set to X86?
>>>>> (which I would consider to be working around a buggy test suite)
>>>>> 
>>>> 
>>>> I always enable AMDGPU and X86 when I build, so I've never run into
>>>> this problem.
>>>> 
>>>> -Tom
>>> 
>>> Tom, presumably your default target triple is X86-ish?  And the only
>>> tests to exercise the AMDGPU backend are those that explicitly specify
>>> a triple for AMDGPU?
>> 
>> This is how we used to do it as well, and I assume this is how most of
>> backends out of the main CPUs are dealing with the situation.
>> This is what I consider a bug and was trying to solve (admittedly not in
>> an optimal way).
>> 
>>> Mehdi, assuming that's what Tom does, your stated goal was to be able to
>>> run tests *without* including the X86 backend, so Tom's solution won't
>>> work for you (restating just for confirmation).
>> 
>> Yes, or alternatively I expect the test suite to pass whatever the default
>> triple is.
> 
> Um, that may be asking a bit much…

It already works for >95% of the suite :)
The rest works (and is useful) only for “some” (unspecified) cross-compiler configuration.
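For illustration, here is how that difference shows up in a pair of hypothetical RUN lines (the IR body is omitted): the first form is hostage to whatever default triple was configured at build time, while the second behaves the same in any build that includes the X86 backend:

```llvm
; Triple-less: llc falls back to the configured default triple, so the
; test only works in builds whose default-triple backend is present.
; RUN: llc < %s | FileCheck %s

; Explicit triple: independent of the build's default, runs anywhere
; the X86 backend was compiled in.
; RUN: llc -mtriple=x86_64-unknown-linux-gnu < %s | FileCheck %s
```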

> 
>> 
>>> Krzysztof suggested much the same thing that I think you are currently
>>> doing, which is deliberately configure a default triple but exclude the
>>> corresponding backend.
>> 
>> You and Takumi were considering this as an unsupported configuration
>> before, and I tend to agree with that (this is the configuration I’m using
>> for our tests but it was not intentional to leave the default triple
>> unset).
> 
> Right, intuitively it doesn't make sense.  Is it actually useful to build a 
> GPU compiler that will crash unless you ask it to generate GPU code? Seems
> to me it should default to producing GPU code.

Correct me if I’m wrong:

You’re viewing this from the “clang” point of view. A default triple is needed there because the command-line interface does not require you to specify one.

I see LLVM as a library or compiler *framework* in the first place, and clang is just one use case among others.

When you build a compiler using LLVM as a library: 1) it does not have to be a command-line compiler, and 2) its interface does not have to make target selection optional.

Most GPU compilers are embedded in the driver (they compile shaders on-demand during host program execution). The driver can detect the hardware and initialize LLVM with the right triple.

We build LLVM as a shared library, then build multiple compilers that link to this library to generate code for various backends. Each compiler is responsible for selecting and initializing the appropriate backend; we *never* rely on the default triple, and I don’t even see how we could.

You could also see LLVM as a system library that can have multiple clients, each client being responsible for its own initialization.
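As a toy sketch of that model (plain Python standing in for LLVM's C++ API; the names here are illustrative, not LLVM's actual TargetRegistry interface):

```python
# Toy sketch of the "LLVM as a system library" model: each client
# registers only the backend(s) it was built with and selects one by an
# explicit triple.  There is no default-triple fallback anywhere.

class Registry:
    def __init__(self):
        self._backends = {}          # arch prefix -> backend name

    def register(self, arch, backend):
        self._backends[arch] = backend

    def lookup(self, triple):
        # The client supplies the triple explicitly.
        arch = triple.split('-')[0]
        return self._backends.get(arch)

# A GPU driver embedding the library registers just its own backend...
registry = Registry()
registry.register('amdgcn', 'AMDGPU')

# ...and initializes it from the triple it detected for the hardware.
print(registry.lookup('amdgcn--amdhsa'))   # -> AMDGPU
print(registry.lookup('x86_64-pc-linux'))  # -> None (not an error path
                                           #    this client can reach)
```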



>>> I expect we can detect that situation in lit.cfg
>>> and exclude tests on that basis, rather than 'native'.  It would solve
>>> the problem for my case (host triple != target triple, although the arch
>>> parts of the triple do match) and the "normal" cross-compiler case (e.g.
>>> host = X86, backend + target triple = ARM).
>>> 
>>> I'm going to play around with that and see what I can do to make it
>>> work.
>>> 
>>>> 
>>>> 
>>>>>> 
>>>>>>> 
>>>>>>> IMO, the problem is in general about tests that are written without
>>>>>>> specifying a triple, that will be executed with the default triple.
>>>>>>>
>>>>>>> Most of these tests were written with X86 (or ARM) in mind, and
>>>>>>> there is no guarantee that they will behave as intended with every
>>>>>>> possible triple.
>>>>>>> The DataLayout for instance has to be the one from the target, and
>>>>>>> is not portable.
>>>>>>> I think a “portable backend test” is pretty rare in general.
>>>>>> 
>>>>>> It depends on what the test is trying to do.  I'm sure it is quite
>>>>>> common for IR tests to behave essentially the same way regardless of
>>>>>> target.
>>>>> 
>>>>> IR tests != backend test (I may miss your point here, it’s late…).
>>> 
>>> Right, sorry, lost focus for a moment there... nevertheless it is still
>>> the case that many tests exercise functionality that is not particularly
>>> target-centric and these should be run for any target that actually
>>> supports that functionality.  For example, the DebugInfo tests should
>>> be run for any target that supports emitting debug info.
>> 
>> I’m not sure that “debug info” support is all or nothing.
>> As an extreme example, I know targets that support debug info but do not
>> support function calls; what if your “debug info” test involves these?
> 
> Then as part of getting the test suite to work for you, you would need to
> disable that particular test for your target.  It sounds like this kind of
> thing is exactly what the Hexagon folks did, and it seems quite reasonable.
> (And in fact I see two DebugInfo tests marked XFAIL: hexagon.)

It seems conceptually wrong to me, for the reasons I already explained.
It should go the other way (a whitelist instead of a blacklist).
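In lit terms, the two styles look like this (illustrative header lines for a hypothetical test; `XFAIL: hexagon` markings and the `object-emission` feature both already exist in the tree):

```llvm
; Blacklist style: the test runs everywhere and is expected to fail on
; the one listed target.
; XFAIL: hexagon

; Whitelist style: the test runs only where lit.cfg advertises the
; named capability.
; REQUIRES: object-emission
```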

> 
>> 
>> Also, I’m not a DebugInfo expert, but when a front-end generates them,
>> aren’t they dependent on the DataLayout? Hence on the target?
> 
> Not really. DebugInfo tests primarily care what the DWARF description
> looks like, not so much what the generated code looks like,

My question is less whether the CHECKs will match than “will the backend be able to generate code with invalid debug information (e.g. pointer size), or will it just crash?”


> and there
> are 100 or so DebugInfo tests that work across lots of targets.
> 
> Many DebugInfo tests are in target-dependent directories as artifacts of
> how the tests are implemented, rather than because they are truly
> target-dependent. The ones I've tripped over tend to be target-specific
> because they generate assembler and the assembler syntax varies too much
> across targets to be worth making the CHECKs tolerant enough.
> 
> There are fine details that would be DataLayout dependent, but 99% of
> the DebugInfo tests aren't checking things that are at such a low level.
> 
>> It means that many DebugInfo tests could fail with a backend that has a
>> significantly different DataLayout and doesn’t expect things the way
>> they are laid down.
>> 
>> 
>>> Whether a
>>> target supports debug info is orthogonal to its native-ness. As written
>>> the top-level DebugInfo tests should work fine as long as the default
>>> triple's backend is included, and that backend supports debug info.
>>> 
>>> If your backend doesn't support debug info, then it's reasonable to
>>> exclude those from your testing; but we can't do that at the cost of
>>> excluding those from testing other backends that *do* support the
>>> feature, even if that testing runs in a cross-compiler environment.
>>> 
>>> In this particular example, we'd be setting things up so that DebugInfo
>>> is excluded for the wrong reason (not based on some more abstract idea
>>> of the feature-set of the target) but as Krzysztof observes, getting
>>> a feature-oriented set of conditions would be a huge task.
>> 
>> Agree: the predicate I wrote is not correct, and I don’t see a trivial
>> “good” solution. This is why any compromise or intermediate solution
>> that is good enough to support whatever use-case people here have,
>> including yours, seems appropriate to implement.
>>
>> Let me know how I can help code something.
> 
> I have experimented with implementing the thing Takumi and I think should
> be a configuration error. :-)  Basically it takes the same kind of approach
> that I did in D12506, except it checks for the existence of the target that
> matches the default triple. If that target exists then 'llc' with no triple
> will succeed, and it looks like the bulk of the tests that you disabled are
> in that category.  I'm not especially happy about this tactic, though.
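For concreteness, the kind of predicate described above might look roughly like this as a helper for lit.cfg (a sketch only; the arch-to-backend mapping is an illustrative subset, and the names are hypothetical, not the actual D12506 code):

```python
def default_triple_is_supported(target_triple, targets_to_build):
    """Return True when the architecture of the default triple matches a
    backend that was actually compiled into this build of LLVM.

    Sketch for lit.cfg; the arch-to-backend map is a small subset."""
    arch = target_triple.split('-')[0]
    arch_to_backend = {
        'i386': 'X86', 'i686': 'X86', 'x86_64': 'X86',
        'arm': 'ARM', 'thumb': 'ARM',
        'amdgcn': 'AMDGPU', 'r600': 'AMDGPU',
        'hexagon': 'Hexagon',
    }
    backend = arch_to_backend.get(arch)
    return backend is not None and backend in targets_to_build

# If the backend for the default triple is absent, triple-less tests
# (plain `llc < %s`) cannot work and would be excluded up front.
print(default_triple_is_supported('x86_64-scei-ps4', ['X86']))   # -> True
print(default_triple_is_supported('amdgcn--amdhsa', ['X86']))    # -> False
```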

Why aren’t you happy about that?

> 
> The Hexagon precedent is interesting; Krzysztof said they set the default
> triple, and didn't have to xfail all that much stuff.  Searching the tree,
> I find exactly 7 individual tests marked XFAIL: hexagon, plus it disables 
> all of ExecutionEngine, and turns off the 'object-emission' feature.
> 
> I'm curious if you would try setting the default triple to match your
> target, and see what /kinds/ of tests fail.  The raw number is much less
> interesting than in the categories.

Failing tests attached, let me know which ones you’d like me to investigate.

— 
Mehdi


