<div dir="ltr"><div class="gmail_extra"><div class="gmail_quote"><div class="gmail_default" style="font-family:arial,helvetica,sans-serif;font-size:small">Hi Tim,</div><div class="gmail_default" style="font-family:arial,helvetica,sans-serif;font-size:small">

</div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left-width:1px;border-left-color:rgb(204,204,204);border-left-style:solid;padding-left:1ex">

    #include <arm_neon.h><br>

    extern int32x2_t var; <br></blockquote><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left-width:1px;border-left-color:rgb(204,204,204);border-left-style:solid;padding-left:1ex">

    void foo(int32x2_t in) {<br>

       var = in;<br>

    }<br>

<br>

and this with LLVM (or mentally, if Clang doesn't do what you want):<br>

<br>

    #include <stdio.h><br>

    #include <arm_neon.h><br>

<br>

    int32x2_t var;<br>

    extern void foo(int32x2_t);<br>

<br>

    int main() {<br>

      var = vset_lane_s32(1, var, 0);<br>

      var = vset_lane_s32(2, var, 1);<br>

      foo(var);<br>

      printf("%d %d\n", vget_lane_s32(var, 0), vget_lane_s32(var, 1));<br>

    }<br>

<br>

I think we can both agree that this should print "1 2", but I think<br>

you'll find extra REVs are needed for compatibility if you decide<br>

Clang must use ld1/st1.<br>

<div><br></div></blockquote><div><div class="gmail_default" style="font-family:arial,helvetica,sans-serif"></div></div><div><div class="gmail_default" style="font-family:arial,helvetica,sans-serif">int32x2_t is a type defined in arm_neon.h, so I think we should use ldr/str for it. I didn't mean that we should use ld1/st1 for it.</div>


<div class="gmail_default" style="font-family:arial,helvetica,sans-serif"><br></div><div class="gmail_default" style="font-family:arial,helvetica,sans-serif">Instead, my example is int16_t a[4], which is different from int16x4_t. For this type, I meant to use ld1/st1.</div>
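<div class="gmail_default" style="font-family:arial,helvetica,sans-serif"><br></div><div class="gmail_default" style="font-family:arial,helvetica,sans-serif">As a minimal sketch of the array side of this distinction, assuming only standard C: for int16_t a[4], element i sits at byte offset i * sizeof(int16_t) from the start of the array, whatever the endianness. The helper names element_offset and check_array_layout are hypothetical, used only for illustration.</div>

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Illustrative array, standing in for the int16_t a[4] discussed above. */
static int16_t a[4] = {10, 20, 30, 40};

/* Hypothetical helper: byte offset of element i from the start of a. */
static ptrdiff_t element_offset(size_t i) {
    return (const char *)&a[i] - (const char *)&a[0];
}

/* Checks the C guarantee: a + i == &a[i], at offset i * sizeof(int16_t),
   so element 0 always has the lowest address. */
static void check_array_layout(void) {
    for (size_t i = 0; i < 4; i++) {
        assert(a + i == &a[i]);
        assert(element_offset(i) == (ptrdiff_t)(i * sizeof(int16_t)));
    }
}
```

<div class="gmail_default" style="font-family:arial,helvetica,sans-serif">This only shows what C promises about the array in memory; it says nothing about which instruction is used to load it.</div>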


</div><div class="gmail_default" style="font-family:arial,helvetica,sans-serif;font-size:small"></div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left-width:1px;border-left-color:rgb(204,204,204);border-left-style:solid;padding-left:1ex">


<div>

> I would say for this case, to support strict mode, we should have to use<br>

> ld1, although the address might have been expanded to 8-byte aligned,<br>

> because "align 2" implies the data is from an array of elements,<br>

<br>

</div>From LLVM's perspective the "align 2" is an optimisation hint, with no<br>

effect on semantics. It *will* be changed by optimisers, if<div class="gmail_default" style="font-family:arial,helvetica,sans-serif;font-size:small;display:inline"></div> they spot<br>

that a larger alignment can be guaranteed by whatever means they<br>

choose.<br></blockquote><div><br></div><div><div class="gmail_default" style="font-family:arial,helvetica,sans-serif;font-size:small">Yes, it is a hint, but it affects the semantics of the data layout, so the compiler should not change it at will.</div>

</div><div> </div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left-width:1px;border-left-color:rgb(204,204,204);border-left-style:solid;padding-left:1ex">

We cannot change that (and I wouldn't want to even if I could, it<br>

would introduce a shadow type system and be horrible) and have to make<br>

codegen work regardless.<br>

<div><br>

>> I don't believe so. Clang will generate array accesses as scalar<br>

>> operations. The vectorizer may transform them, but only into<br>

>> well-specified LLVM IR. How we implement that is our choice, and we<br>

>> can use either ld1 or ldr. Do you have a counter-example?<br>

>><br>

> Yes, we can apply optimization, but we should not change the semantic interface<br>

> crossing functions. My example is in a .h file, if we define,<br>

><br>

> extern int16_t a[4];<br>

><br>

> In function f1 defined in file file1, and function f2 in file file2, we<br>

> should guarantee to use ld1/st1 to load/store variable a.<br>

<br>

</div>We should make no such guarantees. We should be free to use whatever<br>

instructions we like as long as the semantics of C and LLVM IR are<br>

preserved.<br>

<div><br>

>  In C, we say (a++ == &a[1]) is true,<br>

> this should be guaranteed for big-endian as well.<br>

<br>

</div>Definitely, but we can still use ldr/str and make this work. It<br>

involves remapping all lane-referencing instructions during codegen.<br>

To make the example more concrete, under the ldr/str scheme (assuming<br>

alignment can be preserved), a function like:<br>

<br>

    int16_t foo() { return a[0]; }<br>

<br>

Might produce (for some odd reasons, but it's valid IR: we guarantee<br>

element 0 has the lowest address even in big-endian):<br>

<br>

<div class="gmail_default" style="font-family:arial,helvetica,sans-serif;font-size:small;display:inline"></div>    define i16 @foo() {<br>

      <div class="gmail_default" style="font-family:arial,helvetica,sans-serif;font-size:small;display:inline"></div>%val = load <4 x i16>* bitcast([4 x i16]* @a to <4 x i16>*)<br>

      %elt = extractelement <4 x i16> %val, i32 0<br>

      ret i16 %elt<br>

    }<br>

<br>

This could then be assembled (assuming the alignment requirements are met) to:<br>

    foo:<br>

        adrp x0, a<br>

        ldr d0, [x0, :lo12:a]<br>

        umov w0, v0.h[3]<br>

<br>

Note the "[3]", rather than "[0]".<br><br></blockquote><div> <br></div><div class="gmail_default" style="font-family:arial,helvetica,sans-serif;font-size:small">For this example, [4 x i16]* implies 2-byte alignment, and after bitcasting to <4 x i16>* the alignment changes to 8 bytes. Since this bitcast implies an alignment change, and the semantics of the data layout change for big-endian, I would treat it as an incorrect implementation/transformation.</div>

<div class="gmail_default" style="font-family:arial,helvetica,sans-serif;font-size:small"><br></div><div class="gmail_default" style="font-family:arial,helvetica,sans-serif;font-size:small">If we don't have "<div class="gmail_default" style="display:inline">

</div><span style="font-family:arial">%val = load <4 x i16>* bitcast([4 x i16]* @a to <4 x i16>*)", but instead pass val in as an argument, will we still change [0] to [3] for big-endian with your solution?</span></div>


<div> </div><div><div class="gmail_default" style="font-family:arial,helvetica,sans-serif;font-size:small"><div class="gmail_default" style="display:inline"></div><span style="font-family:arial">    define i16 @foo(<4 x i16> %val) {</span></div>

      %elt = extractelement <4 x i16> %val, i32 0<br>      ret i16 %elt<br><div class="gmail_default" style="font-family:arial,helvetica,sans-serif;font-size:small"><span style="font-family:arial">    }</span></div>

<div class="gmail_default" style="font-family:arial,helvetica,sans-serif;font-size:small"><br></div><div class="gmail_default" style="font-family:arial,helvetica,sans-serif;font-size:small">If yes, doesn't that look strange?</div>
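<div class="gmail_default" style="font-family:arial,helvetica,sans-serif;font-size:small"><br></div><div class="gmail_default" style="font-family:arial,helvetica,sans-serif;font-size:small">For reference, the lane numbering in the load-from-memory example can be modelled on any host in plain C. This is only a sketch under stated assumptions: model_ldr treats the 8 bytes as one 64-bit big-endian value, model_ld1 loads four 16-bit big-endian elements with element i going to lane i, and lane k is taken to be bits [16k+15:16k] of the register. All function names are hypothetical; this is not what the backend actually emits.</div>

```c
#include <assert.h>
#include <stdint.h>

enum { LANES = 4 };

/* Lay out a[0..3] as a big-endian target stores them: element 0 at the
   lowest address, most significant byte of each element first. */
static void store_big_endian(const uint16_t a[LANES], uint8_t mem[8]) {
    for (int i = 0; i < LANES; i++) {
        mem[2 * i]     = (uint8_t)(a[i] >> 8);
        mem[2 * i + 1] = (uint8_t)(a[i] & 0xff);
    }
}

/* Model of "ldr d0": a single 64-bit big-endian load. */
static uint64_t model_ldr(const uint8_t mem[8]) {
    uint64_t reg = 0;
    for (int i = 0; i < 8; i++)
        reg = (reg << 8) | mem[i];
    return reg;
}

/* Model of "ld1 {v0.4h}": four 16-bit big-endian loads, element i -> lane i. */
static uint64_t model_ld1(const uint8_t mem[8]) {
    uint64_t reg = 0;
    for (int i = 0; i < LANES; i++) {
        uint64_t elt = ((uint64_t)mem[2 * i] << 8) | mem[2 * i + 1];
        reg |= elt << (16 * i);
    }
    return reg;
}

/* Lane k is modelled as bits [16k+15 : 16k] of the register. */
static uint16_t lane(uint64_t reg, int k) {
    return (uint16_t)(reg >> (16 * k));
}

/* Self-check: after ld1, a[i] is in lane i; after ldr, it is in lane 3 - i. */
static void check_lane_mapping(void) {
    const uint16_t a[LANES] = {0x0102, 0x0304, 0x0506, 0x0708};
    uint8_t mem[8];
    store_big_endian(a, mem);
    for (int i = 0; i < LANES; i++) {
        assert(lane(model_ld1(mem), i) == a[i]);
        assert(lane(model_ldr(mem), LANES - 1 - i) == a[i]);
    }
}
```

<div class="gmail_default" style="font-family:arial,helvetica,sans-serif;font-size:small">In this model a[0] ends up in lane 3 after the ldr-style load and in lane 0 after the ld1-style load, which is where the "[3]" comes from. It does not by itself say what should happen when val arrives as an argument.</div>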

<br></div><div><div class="gmail_default" style="font-family:arial,helvetica,sans-serif;font-size:small">Finally, I think our disagreement is essentially "Does alignment change the semantics of the layout or not?". Your answer is no, but mine is yes.</div>

</div><div class="gmail_default" style="font-family:arial,helvetica,sans-serif;font-size:small"><br></div><div><div class="gmail_default" style="font-family:arial,helvetica,sans-serif;font-size:small">Thanks,</div><div class="gmail_default" style="font-family:arial,helvetica,sans-serif;font-size:small">

-Jiangning</div><br></div></div>

</div></div>