[PATCH] D22840: [InstCombine] Split int64 store into separate int32 stores

Wed Jul 27 09:17:34 PDT 2016

Yes -- this is probably worth a discussion in a different thread.

It is not wrong for FE to emit target dependent IR. The question is
not about that. The problem is that if it is also perfectly legal for
FE to emit the call sequence in a  target independent way, but LLVM
middle/backend fails to lower them properly in target specific way, it
is a bug in LLVM.

thanks,

David

On Wed, Jul 27, 2016 at 12:48 AM, Chandler Carruth <chandlerc at google.com> wrote:
> On Wed, Jul 27, 2016 at 12:39 AM Xinliang David Li <davidxl at google.com>
> wrote:
>>
>> On Wed, Jul 27, 2016 at 12:24 AM, Chandler Carruth <chandlerc at google.com>
>> wrote:
>>>
>>> On Wed, Jul 27, 2016 at 12:15 AM Xinliang David Li <davidxl at google.com>
>>> wrote:
>>>>
>>>> On Tue, Jul 26, 2016 at 11:27 PM, Chandler Carruth
>>>> <chandlerc at google.com> wrote:
>>>>>
>>>>> On Tue, Jul 26, 2016 at 11:19 PM Xinliang David Li <davidxl at google.com>
>>>>> wrote:
>>>>>>
>>>>>> On Tue, Jul 26, 2016 at 11:11 PM, Chandler Carruth
>>>>>> <chandlerc at google.com> wrote:
>>>>>>>
>>>>>>> On Tue, Jul 26, 2016 at 11:08 PM Xinliang David Li via llvm-commits
>>>>>>> <llvm-commits at lists.llvm.org> wrote:
>>>>>>>>
>>>>>>>> Hmm, why does Clang change the return type from from struct to i64?
>>>>>>>> Set aside whether it is always beneficial to do it or not, it certainly
>>>>>>>> feels like the wrong place to make such a change.
>>>>>>>
>>>>>>>
>>>>>>> It's necessary to get the right ABI out of LLVM.
>>>>>>>
>>>>>>
>>>>>> what ABI issue and what can be wrong without using i64? I might have
>>>>>> missed something obvious here.
>>>>>
>>>>>
>>>>> The ABI specifies that a struct such as this is returned in a single
>>>>> 64-bit integer register.
>>>>
>>>>
>>>> yes, at least for system V ABI for x86_64.
>>>>
>>>>>
>>>>>
>>>>> Consider this C++ test case:
>>>>> long long a();
>>>>>
>>>>> struct S { int i; float f; };
>>>>> S b();
>>>>>
>>>>> void f(long long *ptr1, S *ptr2) {
>>>>>   *ptr1 = a();
>>>>>   *ptr2 = b();
>>>>> }
>>>>>
>>>>> It compiles to:
>>>>>         callq   _Z1av
>>>>>         movq    %rax, (%rbx)
>>>>>         callq   _Z1bv
>>>>>         movq    %rax, (%r14)
>>>>>
>>>>> Both need to be in the single register.
>>>>>
>>>>> But if you consider the following IR:
>>>>> declare i64 @a()
>>>>> declare {i32, float} @b()
>>>>>
>>>>> define void @f(i64* %ptr1, {i32, float}* %ptr2) {
>>>>> entry:
>>>>>   %calla = call i64 @a()
>>>>>   store i64 %calla, i64* %ptr1
>>>>>   %callb = call {i32, float} @b()
>>>>>   store {i32, float} %callb, {i32, float}* %ptr2
>>>>>   ret void
>>>>> }
>>>>>
>>>>> The LLVM lowering of the call to @b isn't what you want:
>>>>>         callq   a
>>>>>         movq    %rax, (%r14)
>>>>>         callq   b
>>>>>         movl    %eax, (%rbx)
>>>>>         movss   %xmm0, 4(%rbx)
>>>>>
>>>>> In order to get the ABI's semantics, we have to map the type into
>>>>> something that will actually get lowered in that exact manner.
>>>>
>>>>
>>>> Still, the FE does not seem to be the right place to fix the bug
>>>> (considering multiple different FEs targeting LLVM). It seems to me this bug
>>>> is either due to the call return lowering either happens too early or needs
>>>> to be target dependent.
>>>
>>>
>>> I'm confused... What do you mean by "fix the bug"?
>>>
>>> The performance bug? I don't think anyone has suggested the frontend --
>>> the advice was to fix the performance issue *later* in the pipeline.
>>>
>>> And I don't see a bug with the ABI lowering though, so I don't understand
>>> what you would mean by "fix the bug" in that context...
>>
>>
>> As your example shows, if FE does not do target dependent IR generation,
>> the LLVM lowered code would be wrong -- that is a bug in middle end IMO.
>
>
> LLVM fundamentally does not provide a portable IR. This is one aspect, but
> there are *many* others. This has been discussed on the lists several times,
> and this patch review may not be the best place to re-hash it, but it's
> essentially a non-goal for the IR.
>
> So we're always going to have target dependent IR at least in some respects
> unless the above fundamental philosophy and direction radically shift.
>
> Given that, the only practical problem is that the relationship between how
> i64 and {i32, float} get emitted is somewhat magical, and there are some
> patterns that are even more magical. Ideas to solve this range from
> surfacing the lowering in some way that makes it less surprising to building
> utility libraries to try and abstract away this aspect, and maybe some
> combination. But none of those efforts have made it very far. A big problem
> with abstracting the mechanisms here is that the decision of whether to pass
> a struct in a single integer register or in some other form is dependent on
> pretty language-centric features -- what kinds of constructors or
> destructors a type has for example. It's not necessarily a reasonable thing
> to abstract away.
>
> Anyways, probably all of this is best discussed independently of this
> particular patch....
>
> -Chandler