[PATCH] AARCH64_BE load/store rules fix for ARM ABI

Fri Mar 14 04:21:01 PDT 2014

> For this example, [4 x i16]* implies 2-byte alignment, and after bitcasting
> to <4 x i16>, the alignment will be changed to 8-byte alignment. Since this
> bitcasting implies alignment change, and semantic of data layout is changing
> for big-endian, and I would treat it as an incorrect
> implementation/transformation.

LLVM IR disagrees. It *is* a valid transformation within the rules of
the LangRef. The example was artificial, and the transformation
wouldn't be profitable, but run this C through Clang:

    float foo(float * __restrict lhs, float * __restrict rhs) {
      lhs[0] += rhs[0];
      lhs[1] += rhs[1];
      lhs[2] += rhs[2];
      lhs[3] += rhs[3];
      return lhs[0];
    }

The SLP vectorizer will happily convert the array accesses to vector
operations, just as I did in my artificial example, and it won't bump
the alignment up to 16 bytes.

This has exactly the same problem: without tracking the history of
every value (whether it came from ld1 or ldr), we can't know whether
that final UMOV should refer to lane 0 or lane 3, unless we make
absolutely certain that the in-register layout is the same regardless
of where a vector comes from.
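
To make the lane ambiguity concrete, here is a rough C model of the two
loads on a big-endian target. It is only a sketch: the helper names
(store_be, model_ldr, model_ld1, lane) are made up for illustration,
and a 64-bit vector register is modelled as a uint64_t whose lane i
occupies bits [16*i+15 : 16*i].

```c
#include <stdint.h>

/* Hypothetical model: lane i of a 4 x i16 register image is bits [16*i+15 : 16*i]. */
uint16_t lane(uint64_t reg, unsigned i) {
  return (uint16_t)(reg >> (16 * i));
}

/* Write i16 a[4] to memory the way a big-endian target stores it:
 * element 0 at the lowest address, most significant byte first. */
void store_be(uint8_t mem[8], const uint16_t a[4]) {
  for (unsigned i = 0; i < 4; ++i) {
    mem[2 * i] = (uint8_t)(a[i] >> 8);
    mem[2 * i + 1] = (uint8_t)(a[i] & 0xff);
  }
}

/* Model of "ldr d0": the 8 bytes are read as one big-endian 64-bit
 * integer, so the byte at the lowest address becomes the most
 * significant byte of the register. */
uint64_t model_ldr(const uint8_t mem[8]) {
  uint64_t reg = 0;
  for (unsigned i = 0; i < 8; ++i)
    reg = (reg << 8) | mem[i];
  return reg;
}

/* Model of "ld1 {v0.4h}": each 16-bit element is read big-endian
 * on its own and placed in the lane matching its array index. */
uint64_t model_ld1(const uint8_t mem[8]) {
  uint64_t reg = 0;
  for (unsigned i = 0; i < 4; ++i) {
    uint16_t elt = (uint16_t)((mem[2 * i] << 8) | mem[2 * i + 1]);
    reg |= (uint64_t)elt << (16 * i);
  }
  return reg;
}
```

In this model, storing a = {10, 20, 30, 40} and loading it back puts
a[0] in lane 0 with model_ld1 but in lane 3 with model_ldr, which is
exactly why the UMOV lane can't be chosen without knowing which load
produced the value.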

> If we don't have "
> %val = load <4 x i16>* bitcast([4 x i16]* @a to <4 x i16>*)", but pass val
> from argument, will we still change [0] to [3] for big-endian with your
> solution?
>
>     define i16 @foo(<4 x i16> %val) {
>       %elt = extractelement <4 x i16> %val, i32 0
>       ret i16 %elt
>     }
>
> If yes, doesn't look strange?

Yes, and I agree it does look strange. But I believe it's a choice
between good-looking code and efficiency. Initially I thought it was
horrific too, but in my talks with James and Albrecht they convinced
me that it would be better in the end.

> Finally, I think our disagreement essentially is "Does alignment change
> semantic of layout or not?". Your answer is no, but my answer is yes.

I think the fundamental issue is the in-register representation of
vectors. The only way to make the backend work sanely (and possibly at
all) is to demand just one representation everywhere. Whether that is
the ldr/str representation or the ld1/st1 representation is less
important to me, but we mustn't mix them, for the sanity of everyone
concerned.

We can still use both sets of instructions, but using the
non-canonical one will involve pre/post conversions to the chosen
representation.
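
As a sketch of what such a conversion amounts to for a 64-bit vector of
16-bit elements: under the same simplified model (register as a
uint64_t, lane i at bits [16*i+15 : 16*i]), going between the two
representations is just a lane reversal, which I believe is what REV64
on a .4h operand does. The function name is hypothetical, and wider
vectors or other element sizes would need more than this.

```c
#include <stdint.h>

/* Hypothetical helper: reverse the four 16-bit lanes of a 64-bit
 * register image, leaving the bytes inside each lane untouched. */
uint64_t reverse_lanes_4h(uint64_t reg) {
  uint64_t out = 0;
  for (unsigned i = 0; i < 4; ++i) {
    uint64_t elt = (reg >> (16 * i)) & 0xffffu; /* extract lane i */
    out |= elt << (16 * (3 - i));               /* place it in lane 3 - i */
  }
  return out;
}
```

The reversal is its own inverse, so the same operation would serve as
both the pre- and post-conversion around the non-canonical load or
store.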

Cheers.

Tim.