[PATCH] AARCH64_BE load/store rules fix for ARM ABI

Fri Mar 14 05:18:44 PDT 2014

Hi,

To weigh in my two-pennyworth, I agree with Tim. Alignment is a hint that could be changed or removed at any time, and it shouldn't change semantics in any other way than that increasing it could cause an alignment fault.

I think this conversation is iterating upon a decision, finally! :)

Cheers,

James

> -----Original Message-----
> From: Tim Northover [mailto:t.p.northover at gmail.com]
> Sent: 14 March 2014 11:21
> To: Jiangning Liu
> Cc: reviews+d2884+public+b8c9a731d72b8312; James Molloy; Albrecht
> Kadlec; Amara Emerson; llvm-commits
> Subject: Re: [PATCH] AARCH64_BE load/store rules fix for ARM ABI
>
> > For this example, [4 x i16]* implies 2-type alignment, and after bitcasting
> > to <4 x i16>, the alignment will be changed to 8-byte alignment. Since this
> > bitcasting implies alignment change, and semantic of data layout is changing
> > for big-endian, and I would treat it as an incorrect
> > implementation/transformation.
>
> LLVM IR disagrees. It *is* a valid transformation within the rules of
> the LangRef. The example was artificial, and the transformation
> wouldn't be profitable, but run this C through Clang:
>
>     float foo(float * __restrict lhs, float * __restrict rhs) {
>       lhs[0] += rhs[0];
>       lhs[1] += rhs[1];
>       lhs[2] += rhs[2];
>       lhs[3] += rhs[3];
>       return lhs[0];
>     }
>
> And the SLP-vectorizer will happily convert the array accesses to
> vector operations in just the way I did with my artificial example; it
> won't bump the alignment up to 16.
>
> And it has the exact same problem: without tracking the history of all
> values (whether they came from ld1 or ldr) we can't know whether that
> final UMOV should refer to lane 0 or lane 3. Unless we make absolutely
> certain that the in-register layout is the same regardless of where a
> vector comes from.
>
> > If we don't have "
> > %val = load <4 x i16>* bitcast([4 x i16]* @a to <4 x i16>*)", but pass val
> > from argument, will we still change [0] to [3] for big-endian with your
> > solution?
> >
> >     define i16 @foo(<4 x i16> %val) {
> >       %elt = extractelement <4 x i16> %val, i32 0
> >       ret i16 %elt
> >     }
> >
> > If yes, doesn't look strange?
>
> Yes, and I agree it does look strange. But I believe it's a choice
> between good-looking code and efficiency. Initially I thought it was
> horrific too, but in my talks with James and Albrecht they convinced
> me that it would be better in the end.
>
> > Finally, I think our disagreement essentially is "Does alignment change
> > semantic of layout or not?". Your answer is no, but my answer is yes.
>
> I think the fundamental issue is over in-register representations of
> vectors. I think the only way to make the backend work sanely (and
> possibly at all) is to demand just one representation everywhere:
> whether this is the ldr/str representation or the ld1/st1
> representation is less important to me, but we mustn't mix them for
> the sanity of everyone concerned.
>
> We can still use both sets of instructions, but using the
> non-canonical one will involve pre/post conversions to the chosen
> representation.
>
> Cheers.
>
> Tim.

-- IMPORTANT NOTICE: The contents of this email and any attachments are confidential and may also be privileged. If you are not the intended recipient, please notify the sender immediately and do not disclose the contents to any other person, use it for any purpose, or store or copy the information in any medium.  Thank you.

ARM Limited, Registered office 110 Fulbourn Road, Cambridge CB1 9NJ, Registered in England & Wales, Company No:  2557590
ARM Holdings plc, Registered office 110 Fulbourn Road, Cambridge CB1 9NJ, Registered in England & Wales, Company No:  2548782