<div dir="ltr"><div class="gmail_extra"><div class="gmail_quote"><div class="gmail_default" style="font-family:arial,helvetica,sans-serif;font-size:small">Hi Tim,</div><div class="gmail_default" style="font-family:arial,helvetica,sans-serif;font-size:small">

</div><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"><div class="">>   %induction8 = add <8 x i32> %broadcast.splat7, <i32 0, i32 1, i32 2, i32 3, i32 4, i32 5, i32 6, i32 7><br>


<br>

</div><div class="">> If you want to generate "rev16+ st1", another programmer p2 would see FAIL.<br>

<br>

</div>The constant vector in the "add" line is probably not what you were<br>

expecting. Since it's basically saying "put 0 in element 0, 1 in<br>

element 1, ..." its lanes are reversed like everything else; you might<br>

write:<br>

    movz w0, #7<br>

    umov v0.h[0], w0<br>

    [...]<br>

<br></blockquote><div><br></div><div><div class="gmail_default" style="font-family:arial,helvetica,sans-serif;font-size:small">Well, you are using different in-register layouts for big-endian and little-endian. I don't think this is correct. I think we have previously been arguing that the in-register value should be consistent between little-endian and big-endian.</div>

<div class="gmail_default" style="font-family:arial,helvetica,sans-serif;font-size:small"><br></div><div class="gmail_default" style="font-family:arial,helvetica,sans-serif;font-size:small">For big-endian, we should store that constant vector in the literal pool in reversed order; then we wouldn't have any issue using ldr.</div>

</div><div> </div><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"><div class="">>> let Requires = [IsBE] in<br>

>> def : Pat<(v4i16 (unaligned_load addr:$Rn)), (REV16 (LD1 addr:$Rn))>;<br>

><br>

> Well, what unaligned_load is here? I'm confused again! Isn't just to check<br>

> not total size alignment as I described in my proposal?<br>

<br>

</div>Pretty much, yes.<br></blockquote><div><br></div><div><div class="gmail_default" style="font-family:arial,helvetica,sans-serif;font-size:small">OK. I'm happy you agree with this.</div></div><div> </div><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">


<div class=""><br>

> Instead we want to simply use the pattern below,<br>

><br>

> let Requires = [IsBE] in<br>

> def : Pat<(v4i16 (unaligned_load addr:$Rn)), (LD1 addr:$Rn)>;<br>

<br>

</div>Yep. In that world we would have to write<br>

<div class=""><br>

let Requires = [IsBE] in<br>

</div>def : Pat<(v4i16 (aligned_load addr:$Rn)), (REV16 (LDR addr:$Rn))>;<br>

<br>

when we wanted the efficiency gains of LDR.<br></blockquote><div><br></div><div><div class="gmail_default" style="font-family:arial,helvetica,sans-serif;font-size:small">OK. If you say "ldr+rev16" performs better than ld1, then I'm fine with it. But that is a performance issue, not a correctness one.</div>

</div><div> </div><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">

<div class=""><br></div><div class="">

> Can you also give me a real case in C code, and show me the IR that we can't<br>

> simply use ld1/st1 without rev16?<br>

<br>

</div>In what way? When mixed with ldr? It's basically that branching one I<br>

gave earlier in IR form:<br>

<br>

float32_t foo(int test, float32_t *__restrict arr, float32x4_t *<br>

__restrict vec) {<br>

  float res;<br>

  if (test) {<br>

    arr[0] += arr[0];<br>

    arr[1] += arr[1];<br>

    arr[2] += arr[2];<br>

    arr[3] += arr[3];<br>

    res = arr[0];<br>

  } else {<br>

    *vec += *vec;<br>

    res = *(float *)vec;<br></blockquote><div><br></div><div><div class="gmail_default" style="font-family:arial,helvetica,sans-serif;font-size:small">This is invalid C code. vec is a pointer, and the layout of the data it points to differs between big-endian and little-endian. If you cast it to float*, you can only guarantee that it works on little-endian. The programmer must be aware of the data layout change, because these are entirely different data types. The programmer should write:</div>

</div><div class="gmail_default" style="font-family:arial,helvetica,sans-serif;font-size:small"><br></div><div class="gmail_default" style="font-family:arial,helvetica,sans-serif;font-size:small">res = (*vec)[0];</div><div class="gmail_default" style="font-family:arial,helvetica,sans-serif;font-size:small">

<br></div><div class="gmail_default" style="font-family:arial,helvetica,sans-serif;font-size:small">We all know that typecasting in C is sometimes unsafe. I think this is just common sense.</div><div class="gmail_default" style="font-family:arial,helvetica,sans-serif;font-size:small">

</div><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">

<div class=""><br>

> In the file transferred of scenario 1, the char ordering in disk should be<br>

> like 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, ..., 127. (let's ignore the ordering<br>

> difference for inside each number for big-endian and little-endian, because<br>

> we are not discussing about that). In the file transferred of scenario 2,<br>

> the char order in disk should be like 7, 6, 5, 4, 3, 2, 1, 0, 15, 14, ... ,<br>

> 127,..., 121, 120.<br>

<br>

</div>Not according to what Clang generates presently. Clang treats vec[j]<br>

as starting from the lowest addressed element too (and the<br>

initialisation you perform).<br></blockquote><div><br></div><div><div class="gmail_default" style="font-family:arial,helvetica,sans-serif;font-size:small">I don't understand why you say Clang matters here. I think the front-end doesn't care about addresses at all. Clang should only treat vec[0] as an access to lane 0. That's it.</div>

</div><div> </div><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"><div class=""><br>

</div>PPC is working on the semantics of IR as it stands and as I'm<br>

describing. That's one of the things you seem to be proposing we<br>

change and will be opposed.<br>

<br></blockquote><div> </div><div class="gmail_default" style="font-family:arial,helvetica,sans-serif;font-size:small">If this is true for PPC, then the answer should be yes. But I don't see Clang or the middle-end doing anything that we can't accept for AArch64. I have tried PPC, and I don't see any lane changes for big-endian in the LLVM IR. So I doubt this is true.</div>

<div><br></div><div><div class="gmail_default" style="font-family:arial,helvetica,sans-serif;font-size:small">Thanks,</div><div class="gmail_default" style="font-family:arial,helvetica,sans-serif;font-size:small">-Jiangning</div>

<br></div></div>

</div></div>