[PATCH] D22840: [InstCombine] Split int64 store into separate int32 stores

Wed Jul 27 00:48:25 PDT 2016

On Wed, Jul 27, 2016 at 12:39 AM Xinliang David Li <davidxl at google.com>
wrote:

> On Wed, Jul 27, 2016 at 12:24 AM, Chandler Carruth <chandlerc at google.com>
> wrote:
>
>> On Wed, Jul 27, 2016 at 12:15 AM Xinliang David Li <davidxl at google.com>
>> wrote:
>>
>>> On Tue, Jul 26, 2016 at 11:27 PM, Chandler Carruth <chandlerc at google.com
>>> > wrote:
>>>
>>>> On Tue, Jul 26, 2016 at 11:19 PM Xinliang David Li <davidxl at google.com>
>>>> wrote:
>>>>
>>>>> On Tue, Jul 26, 2016 at 11:11 PM, Chandler Carruth <
>>>>> chandlerc at google.com> wrote:
>>>>>
>>>>>> On Tue, Jul 26, 2016 at 11:08 PM Xinliang David Li via llvm-commits <
>>>>>> llvm-commits at lists.llvm.org> wrote:
>>>>>>
>>>>>>> Hmm, why does Clang change the return type from from struct to i64?
>>>>>>> Set aside whether it is always beneficial to do it or not, it certainly
>>>>>>> feels like the wrong place to make such a change.
>>>>>>>
>>>>>>
>>>>>> It's necessary to get the right ABI out of LLVM.
>>>>>>
>>>>>>
>>>>> what ABI issue and what can be wrong without using i64? I might have
>>>>> missed something obvious here.
>>>>>
>>>>
>>>> The ABI specifies that a struct such as this is returned in a single
>>>> 64-bit integer register.
>>>>
>>>
>>> yes, at least for system V ABI for x86_64.
>>>
>>>
>>>>
>>>> Consider this C++ test case:
>>>> long long a();
>>>>
>>>> struct S { int i; float f; };
>>>> S b();
>>>>
>>>> void f(long long *ptr1, S *ptr2) {
>>>>   *ptr1 = a();
>>>>   *ptr2 = b();
>>>> }
>>>>
>>>> It compiles to:
>>>>         callq   _Z1av
>>>>         movq    %rax, (%rbx)
>>>>         callq   _Z1bv
>>>>         movq    %rax, (%r14)
>>>>
>>>> Both need to be in the single register.
>>>>
>>>> But if you consider the following IR:
>>>> declare i64 @a()
>>>> declare {i32, float} @b()
>>>>
>>>> define void @f(i64* %ptr1, {i32, float}* %ptr2) {
>>>> entry:
>>>>   %calla = call i64 @a()
>>>>   store i64 %calla, i64* %ptr1
>>>>   %callb = call {i32, float} @b()
>>>>   store {i32, float} %callb, {i32, float}* %ptr2
>>>>   ret void
>>>> }
>>>>
>>>> The LLVM lowering of the call to @b isn't what you want:
>>>>         callq   a
>>>>         movq    %rax, (%r14)
>>>>         callq   b
>>>>         movl    %eax, (%rbx)
>>>>         movss   %xmm0, 4(%rbx)
>>>>
>>>> In order to get the ABI's semantics, we have to map the type into
>>>> something that will actually get lowered in that exact manner.
>>>>
>>>
>>> Still, the FE does not seem to be the right place to fix the bug
>>> (considering multiple different FEs targeting LLVM). It seems to me this
>>> bug is either due to the call return lowering either happens too early or
>>> needs to be target dependent.
>>>
>>
>> I'm confused... What do you mean by "fix the bug"?
>>
>> The performance bug? I don't think anyone has suggested the frontend --
>> the advice was to fix the performance issue *later* in the pipeline.
>>
>> And I don't see a bug with the ABI lowering though, so I don't understand
>> what you would mean by "fix the bug" in that context...
>>
>
> As your example shows, if FE does not do target dependent IR generation,
> the LLVM lowered code would be wrong -- that is a bug in middle end IMO.
>

LLVM fundamentally does not provide a portable IR. This is one aspect, but
there are *many* others. This has been discussed on the lists several
times, and this patch review may not be the best place to re-hash it, but
it's essentially a non-goal for the IR.

So we're always going to have target dependent IR at least in some respects
unless the above fundamental philosophy and direction radically shift.

Given that, the only practical problem is that the relationship between how
i64 and {i32, float} get emitted is somewhat magical, and there are some
patterns that are even more magical. Ideas to solve this range from
surfacing the lowering in some way that makes it less surprising to
building utility libraries to try and abstract away this aspect, and maybe
some combination. But none of those efforts have made it very far. A big
problem with abstracting the mechanisms here is that the decision of
whether to pass a struct in a single integer register or in some other form
is dependent on pretty language-centric features -- what kinds of
constructors or destructors a type has for example. It's not necessarily a
reasonable thing to abstract away.

Anyways, probably all of this is best discussed independently of this
particular patch....

-Chandler
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.llvm.org/pipermail/llvm-commits/attachments/20160727/5189ac24/attachment.html>