[PATCH] AARCH64_BE load/store rules fix for ARM ABI

Fri Mar 14 03:41:22 PDT 2014

Hi Tim,

>     #include <arm_neon.h>
>     extern int32x2_t var;
>
    void foo(int32x2_t in) {
>        var = in;
>     }
>
> and this with LLVM (or mentally, if Clang doesn't do what you want):
>
>     #include <stdio.h>
>     #include <arm_neon.h>
>
>     int32x2_t var;
>     extern void foo(int32x2_t);
>
>     int main() {
>       var = vset_lane_s32(1, var, 0);
>       var = vset_lane_s32(2, var, 1);
>       foo(var);
>       printf("%d %d\n", vget_lane_s32(var, 0), vget_lane_s32(var, 1));
>     }
>
> I think we can both agree that this should print "1 2", but I think
> you'll find extra REVs are needed for compatibility if you decide
> Clang must use ld1/st1.
>
> 
int32x2_t is a type defined in arm_neon.h, so I think we should use
ldr/str, I didn't mean we should ld1/st1 for it.

Instead, my example is int16_t a[4], which is different from int16x4_t. For
this type, I meant to use ld1/st1.

>  > I would say for this case, to support strict mode, we should have to use
> > ld1, although the address might have been expanded to 8-byte aligned,
> > because "align 2" implies the data is from an array of elements,
>
> From LLVM's perspective the "align 2" is an optimisation hint, with no
> effect on semantics. It *will* be changed by optimisers, if
> 
> they spot
> that a larger alignment can be guaranteed by whatever means they
> choose.
>

Yes, it is a hint, but it affect sementic of data layout, so we should
not change it in compiler at will.

> We cannot change that (and I wouldn't want to even if I could, it
> would introduce a shadow type system and be horrible) and have to make
> codegen work regardless.
>
> >> I don't believe so. Clang will generate array accesses as scalar
> >> operations. The vectorizer may transform them, but only into
> >> well-specifier LLVM IR. How we implement that is our choice, and we
> >> can use either ld1 or ldr. Do you have a counter-example?
> >>
> > Yes, we can apply optimization, but we should change the semantic
> interface
> > crossing functions. My example is in a .h file, if we define,
> >
> > extern int16_t a[4];
> >
> > In function f1 defined in file file1, and function f2 in file file2, we
> > should guarantee to use ld1/st1 to load/store variable a.
>
> We should make no such guarantees. We should be free to use whatever
> instructions we like as long as the semantics of C and LLVM IR are
> preserved.
>
> >  In C, we say (a++ == &a[1]) is true,
> > this should be guaranteed for big-endian as well.
>
> Definitely, but we can still use ldr/str and make this work. It
> involves remapping all lane-referencing instructions during codegen.
> To make the example more concrete, under the ldr/str scheme (assuming
> alignment can be preserved), a function like:
>
>     int16_t foo() { return a[0]; }
>
> Might produce (for some odd reasons, but it's valid IR: we guarantee
> element 0 has the lowest address even in big-endian):
>
> 
>     define i16 @foo() {
>
> 
> %val = load <4 x i16>* bitcast([4 x i16]* @a to <4 x i16>*)
>       %elt = extractelement <4 x i16> %val, i32 0
>       ret i16 %elt
>     }
>
> This could then be assembled (assuming the alignment requirements are met)
> to:
>     foo:
>         adrp, x0, a
>         ldr d0, [x0, :lo12:a]
>         umov w0, v0.h[3]
>
> Note the "[3]", rather than "[0]".
>
>
For this example, [4 x i16]* implies 2-type alignment, and after bitcasting
to <4 x i16>, the alignment will be changed to 8-byte alignment. Since this
bitcasting implies alignment change, and semantic of data layout is
changing for big-endian, and I would treat it as an incorrect
implementation/transformation.

If we don't have "

%val = load <4 x i16>* bitcast([4 x i16]* @a to <4 x i16>*)", but pass val
from argument, will we still change [0] to [3] for big-endian with your
solution?

    define i16 @foo(<4 x i16> %val) {
      %elt = extractelement <4 x i16> %val, i32 0
      ret i16 %elt
    }

If yes, doesn't look strange?

Finally, I think our disagreement essentially is "Does alignment change
semantic of layout or not?". Your answer is no, but my answer is yes.

Thanks,
-Jiangning
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.llvm.org/pipermail/llvm-commits/attachments/20140314/fca2bd3d/attachment.html>