[PATCH] AARCH64_BE load/store rules fix for ARM ABI

Tue Mar 18 03:21:50 PDT 2014

Hi Tim,

> >   %induction8 = add <8 x i32> %broadcast.splat7, <i32 0, i32 1, i32 2,
> i32 3, i32 4, i32 5, i32 6, i32 7>
>
> > If you want to generate "rev16+ st1", another programmer p2 would see
> FAIL.
>
> The constant vector in the "add" line is probably not what you were
> expecting. Since it's basically saying "put 0 in element 0, 1 in
> element 1, ..." its lanes are reversed like everything else; you might
> write:
>     movz w0, #7
>     umov v0.h[0], w0
>     [...]
>
>
Well, you are using different in-register layout for bit-endian and
little-edian. I don't think this is correct. I think previously we have
been arguing we should have consistent in-register value for little-endian
and big-endian.

For big-endian, in literal pool, we should store that constant vector in
reversed order, then we wouldn't have any issue using ldr.

> >> let Requires = [IsBE] in
> >> def : Pat<(v4i16 (unaligned_load addr:$Rn)), (REV16 (LD1 addr:$Rn))>;
> >
> > Well, what unaligned_load is here? I'm confused again! Isn't just to
> check
> > not total size alignment as I described in my proposal?
>
> Pretty much, yes.
>

OK. I'm happy you agree with this.

>
> > Instead we want to simply use the pattern below,
> >
> > let Requires = [IsBE] in
> > def : Pat<(v4i16 (unaligned_load addr:$Rn)), (LD1 addr:$Rn)>;
>
> Yep. In that world we would have to write
>
> let Requires = [IsBE] in
> def : Pat<(v4i16 (aligned_load addr:$Rn)), (REV16 (LDR addr:$Rn))>;
>
> when we wanted the efficiency gains of LDR.
>

OK. If you say "ldr+rev16" has better performance than ld1, then I'm OK.
But it's not a correctness issue but a performance one.

>
> > Can you also give me a real case in C code, and show me the IR that we
> can't
> > simply use ld1/st1 without rev16?
>
> In what way? When mixed with ldr? It's basically that branching one I
> gave earlier in IR form:
>
> float32_t foo(int test, float32_t *__restrict arr, float32x4_t *
> __restrict vec) {
>   float res;
>   if (test) {
>     arr[0] += arr[0];
>     arr[1] += arr[1];
>     arr[2] += arr[2];
>     arr[3] += arr[3];
>     res = arr[0];
>   } else {
>     *vec += *vec;
>     res = *(float *)vec;
>

This is an invalid C code. vec is a pointer, the data layout pointed by
vec is different for big-endian and little-endian. If you do typecasting to
float*, you can only guarantee little-endian work. Programmer must know the
data layout change, because they are different data types at all.
Programmer should code like,

res = vec[0];

We have known typecasting in C is sometimes unsafe. This is a quite natural
common sense, I think.

>
> > In the file transferred of scenario 1, the char ordering in disk should
> be
> > like 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, ..., 127. (let's ignore the ordering
> > difference for inside each number for big-endian and little-endian,
> because
> > we are not discussing about that). In the file transferred of scenario 2,
> > the char order in disk should be like 7, 6, 5, 4, 3, 2, 1, 0, 15, 14,
> ... ,
> > 127,..., 121, 120.
>
> Not according to what Clang generates presently. Clang treats vec[j]
> as starting from the lowest addressed element too (and the
> initialisation you perform).
>

I don't understand why you say clang matters this. The front-end doesn't
care the address at all, I think. Clang should only think vec[0] means
lane 0 access. That's it.

>
> PPC is working on the semantics of IR as it stands and as I'm
> describing. That's one of the things you seem to be proposing we
> change and will be opposed.
>
>
If this is true for PPC, then the answer should be yes. But I don't see
clang and middle-end are doing bad anything we can't accept for aarch64.
I ever tried PPC, I don't see any lane change for big-endian in LLVM IR. So
I doubt this is true.

Thanks,
-Jiangning
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.llvm.org/pipermail/llvm-commits/attachments/20140318/f48fb16d/attachment.html>