<div dir="ltr"><div class="gmail_default" style="font-family:arial,helvetica,sans-serif;font-size:small">Hi Tim and James,</div><div class="gmail_extra"><br></div><div class="gmail_extra"><div class="gmail_default" style="font-family:arial,helvetica,sans-serif;font-size:small">
After such a long discussion, I'm really hoping I can understand and agree with you, but I still can't be convinced by your solution.<br></div></div><div class="gmail_extra"><div class="gmail_default" style="font-family:arial,helvetica,sans-serif;font-size:small">
<br></div><div class="gmail_default" style="font-family:arial,helvetica,sans-serif;font-size:small">1) For this issue,</div><br><div class="gmail_quote"><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left-width:1px;border-left-color:rgb(204,204,204);border-left-style:solid;padding-left:1ex">
<div>> In C, we say (a++ == &a[1]) is true,<br>
> this should be guaranteed for big-endian as well.<br>
<br>
</div>Definitely, but we can still use ldr/str and make this work. It<br>
involves remapping all lane-referencing instructions during codegen.<br>
To make the example more concrete, under the ldr/str scheme (assuming<br>
alignment can be preserved), a function like:<br>
<br>
int16_t foo() { return a[0]; }<br>
<br>
Might produce (for some odd reasons, but it's valid IR: we guarantee<br>
element 0 has the lowest address even in big-endian):<br>
<br>
<div class="gmail_default" style="font-family:arial,helvetica,sans-serif;font-size:small;display:inline"></div> define i16 @foo() {<br>
%val = load <4 x i16>* bitcast([4 x i16]* @a to <4 x i16>*)<br>
%elt = extractelement <4 x i16> %val, i32 0<br>
ret i16 %elt<br>
}<br>
<br>
This could then be assembled (assuming the alignment requirements are met) to:<br>
foo:<br>
adrp, x0, a<br>
ldr d0, [x0, :lo12:a]<br>
umov w0, v0.h[3]<br>
<br>
Note the "[3]", rather than "[0]".<br><br></blockquote><div><br></div><div class="gmail_default" style="font-family:arial,helvetica,sans-serif;font-size:small">Can you explain how to get address &a[0] for big-endian? It's a C expression, and I think it should be simply </div>
<div class="gmail_default" style="font-family:arial,helvetica,sans-serif;font-size:small"><br></div><div class="gmail_default" style="font-family:arial,helvetica,sans-serif;font-size:small">Suppose the data in a[4] is,</div>
<div class="gmail_default" style="font-family:arial,helvetica,sans-serif;font-size:small">0x00aabb00: 01</div><div class="gmail_default" style="font-family:arial,helvetica,sans-serif;font-size:small">0x00aabb02: 02<br></div>
<div class="gmail_default" style="font-family:arial,helvetica,sans-serif;font-size:small">0x00aabb04: 03<br></div><div class="gmail_default" style="font-family:arial,helvetica,sans-serif;font-size:small">0x00aabb06: 04<br>
</div><div class="gmail_default" style="font-family:arial,helvetica,sans-serif;font-size:small"><br></div><div class="gmail_default" style="font-family:arial,helvetica,sans-serif;font-size:small">I think &a[0] is 0x00aabb00. If you agree, then </div>
<div class="gmail_default" style="font-family:arial,helvetica,sans-serif;font-size:small"><br></div><div class="gmail_default" style="font-family:arial,helvetica,sans-serif;font-size:small">int16_t *p = &a[0]; p += 3; return *p;</div>
<div class="gmail_default" style="font-family:arial,helvetica,sans-serif;font-size:small"><br></div><div class="gmail_default" style="font-family:arial,helvetica,sans-serif;font-size:small">What is the return value for big-endian? I think the value should be 04 rather than 01. Following the assembly code you gave, the result would be 01. I don't think that makes sense, and isn't it a disaster for programmers?</div>
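<div class="gmail_default" style="font-family:arial,helvetica,sans-serif;font-size:small"><br></div><div class="gmail_default" style="font-family:arial,helvetica,sans-serif;font-size:small">As a minimal sketch of the C-level expectation, the snippet can be written out as below. The function names load_via_pointer and byte_offset_of_last are made up for illustration, and the array contents simply mirror the 01..04 example; nothing here depends on the target's endianness.</div>
<pre>
```c
#include <stddef.h>
#include <stdint.h>

/* Same contents as the a[4] example: 01, 02, 03, 04 (illustrative values). */
int16_t a[4] = {1, 2, 3, 4};

/* Mirrors "int16_t *p = &a[0]; p += 3; return *p;". */
int16_t load_via_pointer(void) {
    int16_t *p = &a[0];
    p += 3;      /* advances by 3 * sizeof(int16_t) bytes */
    return *p;   /* this is a[3] on either endianness */
}

/* Byte distance from &a[0] to &a[3], matching 0x00aabb00 -> 0x00aabb06. */
ptrdiff_t byte_offset_of_last(void) {
    return (char *)&a[3] - (char *)&a[0];
}
```
</pre>
<div class="gmail_default" style="font-family:arial,helvetica,sans-serif;font-size:small">C requires load_via_pointer() to return 04 here, whichever vector load the compiler picks underneath.</div>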
<div class="gmail_default" style="font-family:arial,helvetica,sans-serif;font-size:small"><br></div><div class="gmail_default" style="font-family:arial,helvetica,sans-serif;font-size:small">2) For the case you gave,</div>
<div class="gmail_default" style="font-family:arial,helvetica,sans-serif;font-size:small"><div class="gmail_quote" style="font-family:arial"><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left-width:1px;border-left-color:rgb(204,204,204);border-left-style:solid;padding-left:1ex">
<div class="gmail_default" style="font-family:arial,helvetica,sans-serif;display:inline"></div><br>define i16 @foo(i1 %tst, <4 x i16>* %arr_ptr, <4 x i16>* %vec_ptr) {<br> br i1 %tst, label %load_vec, label %load_arr<br>
<br>load_vec:<br> %vec = load <4 x i16>* %vec_ptr, align 8<br> br label %end<br><br>load_arr:<br> %arr = load <4 x i16>* %arr_ptr, align 2<br> br label %end<br><br>end:<br> %val = phi <4 x i16> [%vec, %load_vec], [%arr, %load_arr]<br>
<div> %elt = extractelement <4 x i16> %val, i32 0<br> ret i16 %elt<br>}<br><br></div>Cheers.<br><br>Tim.<br><br>P.S. I argue (modulo bugs):<br><br>foo:<br> cbz w0, .Lload_arr<br>.Lload_vec:<br> ldr d0, [x2]<br>
b .Lend<br>.Lload_arr:<br> ld1 { v0.4h }, [x1]</blockquote><div><br></div><div>I'm confused here! My solution is to use ld1 in big-endian for align 2, and you have been arguing against that in favor of ldr.</div><div><br>
</div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left-width:1px;border-left-color:rgb(204,204,204);border-left-style:solid;padding-left:1ex"> rev16 v0.8b, v0.8b<br></blockquote><div><br></div>
<div>And with my solution, you don't need rev16 at all. If a programmer writes code like this, or a compiler generates code like this, the programmer or compiler should already know that arr_ptr and vec_ptr have different data layouts for big-endian!</div>
<div><br></div><div>It's quite strange to add rev16 here! How and when do you decide to add rev16 in the compiler? Isn't it a disaster for compiler implementation?</div><div> </div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left-width:1px;border-left-color:rgb(204,204,204);border-left-style:solid;padding-left:1ex">
.Lend:<br> umov w0, v0.h[3]<br> ret<br><br>With two possible viable alternatives:<br>1. Drop rev16. Only ever use ld1. umov can use lane [0].<br>2. Attach rev16 to ldr instead of ld1. umov can use lane [0].<br></blockquote>
</div></div><div class="gmail_default"><div style="font-family:arial,helvetica,sans-serif;font-size:small"> </div><div style="font-family:arial,helvetica,sans-serif;font-size:small">3) For the following statement you gave,</div>
<div style="font-family:arial,helvetica,sans-serif;font-size:small"><br></div><blockquote class="gmail_quote" style="font-family:arial;font-size:small;margin:0px 0px 0px 0.8ex;border-left-width:1px;border-left-color:rgb(204,204,204);border-left-style:solid;padding-left:1ex">
<div></div><div class="gmail_default" style="font-family:arial,helvetica,sans-serif;display:inline"></div>I think the fundamental issue is over in-register representations of<br>vectors. I think the only way to make the backend work sanely (and<br>
possibly at all) is to demand just one representation everywhere:<br></blockquote><div style="font-family:arial,helvetica,sans-serif;font-size:small"><br></div><div class="gmail_default" style="font-family:arial,helvetica,sans-serif;font-size:small">
I think this is the point where I totally agree with you. However, I don't think this is the only thing we should guarantee. We should also guarantee semantic correctness at the C programming level.</div><div class="gmail_default" style="font-family:arial,helvetica,sans-serif;font-size:small">
<br></div><div class="gmail_default" style="font-family:arial,helvetica,sans-serif;font-size:small">int16_t a[4];</div><div class="gmail_default" style="font-family:arial,helvetica,sans-serif;font-size:small">int16x4_t b;</div>
<div class="gmail_default" style="font-family:arial,helvetica,sans-serif;font-size:small"><br></div><div class="gmail_default" style="font-family:arial,helvetica,sans-serif;font-size:small">Global variables a and b are different. a requires a fixed element layout, but b has a different element layout for big-endian and little-endian, although for both a and b the data layout within each element differs between the two.</div>
<div class="gmail_default" style="font-family:arial,helvetica,sans-serif;font-size:small"><br></div><div class="gmail_default" style="font-family:arial,helvetica,sans-serif;font-size:small">Programmers or tools can't change this, otherwise it would be a software ecosystem disaster.</div>
<div class="gmail_default" style="font-family:arial,helvetica,sans-serif;font-size:small"><br></div><div class="gmail_default" style="font-family:arial,helvetica,sans-serif;font-size:small">Your solution would introduce a lot of 'strange' code if ldr/str is used for variable a in big-endian mode. That's unacceptable to me.</div>
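<div class="gmail_default" style="font-family:arial,helvetica,sans-serif;font-size:small"><br></div><div class="gmail_default" style="font-family:arial,helvetica,sans-serif;font-size:small">A rough C model of the two big-endian loads may make the layout difference concrete. This is only a sketch under stated assumptions, not compiler or hardware code: the names a_bytes, model_ldr_d, model_ld1_4h and lane_4h are hypothetical, a 64-bit integer stands in for the D register, and lane n is taken to be bits [16n+15:16n].</div>
<pre>
```c
#include <stdint.h>

/* Memory image of "int16_t a[4] = {1, 2, 3, 4}" on a big-endian target:
 * a[0] at the lowest address, most significant byte of each element first. */
const uint8_t a_bytes[8] = {0x00, 0x01, 0x00, 0x02, 0x00, 0x03, 0x00, 0x04};

/* Model of big-endian "ldr d0": one 64-bit load, so the byte at the lowest
 * address becomes the most significant byte of the register. */
uint64_t model_ldr_d(const uint8_t *p) {
    uint64_t reg = 0;
    for (int i = 0; i < 8; i++)
        reg = (reg << 8) | p[i];
    return reg;
}

/* Model of big-endian "ld1 {v0.4h}": four 16-bit loads, element i -> lane i. */
uint64_t model_ld1_4h(const uint8_t *p) {
    uint64_t reg = 0;
    for (int i = 0; i < 4; i++) {
        uint64_t elt = ((uint64_t)p[2 * i] << 8) | p[2 * i + 1];
        reg |= elt << (16 * i);
    }
    return reg;
}

/* Read lane n of a 4 x 16-bit register image. */
uint16_t lane_4h(uint64_t reg, int n) {
    return (uint16_t)(reg >> (16 * n));
}
```
</pre>
<div class="gmail_default" style="font-family:arial,helvetica,sans-serif;font-size:small">Under this model a[0] sits in lane 0 after the ld1-style load but in lane 3 after the ldr-style load, which is where the "[3]" rather than "[0]" comes from.</div>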
<div style="font-family:arial,helvetica,sans-serif;font-size:small"> </div><blockquote class="gmail_quote" style="font-family:arial;font-size:small;margin:0px 0px 0px 0.8ex;border-left-width:1px;border-left-color:rgb(204,204,204);border-left-style:solid;padding-left:1ex">
whether this is the ldr/str representation or the ld1/st1<br>representation is less important to me, </blockquote><div style="font-family:arial,helvetica,sans-serif;font-size:small"><br></div><div style="font-family:arial,helvetica,sans-serif;font-size:small">
I think it does matter because ldr/str and ld1/st1 have different data layout semantics. </div><div style="font-family:arial,helvetica,sans-serif;font-size:small"> </div><blockquote class="gmail_quote" style="font-family:arial;font-size:small;margin:0px 0px 0px 0.8ex;border-left-width:1px;border-left-color:rgb(204,204,204);border-left-style:solid;padding-left:1ex">
but we mustn't mix them for<br>the sanity of everyone concerned.</blockquote><div style="font-family:arial;font-size:small"><br></div><div class="gmail_default" style="font-family:arial,helvetica,sans-serif;font-size:small">
Yes, we should avoid mixing them. The issue is how to guarantee a stable interface across different modules.</div><div class="gmail_default" style="font-family:arial,helvetica,sans-serif;font-size:small">Using only ldr/str for big-endian would introduce a lot of strange code in big-endian binaries. Given that we have ld1/st1, why do we need that strange code?</div>
<div class="gmail_default" style="font-family:arial,helvetica,sans-serif;font-size:small"><br></div><div class="gmail_default" style="font-family:arial,helvetica,sans-serif;font-size:small">4) Can you explain when ld1/st1 would be used in big-endian mode with your solution? What is the compiler's algorithm for generating ld1/st1 for big-endian? Or are you proposing never to use ld1/st1?</div>
<div class="gmail_default" style="font-family:arial,helvetica,sans-serif;font-size:small"><br></div><div class="gmail_default" style="font-family:arial,helvetica,sans-serif;font-size:small">For strict mode, if we see 'align 2' in LLVM IR and you argue for using ldr/str for big-endian, couldn't that potentially raise an exception? If you say the solution for this case is to use ld1/st1, then do we just check the alignment to decide which instruction to use?</div>
<div class="gmail_default" style="font-family:arial,helvetica,sans-serif;font-size:small"><br></div><div class="gmail_default" style="font-family:arial,helvetica,sans-serif;font-size:small">Anyway, I don't think changing [0] to [3] and inserting rev instructions through some 'smart algorithm' in the compiler is a reasonable choice.</div>
<div class="gmail_default" style="font-family:arial,helvetica,sans-serif;font-size:small"><br></div><div class="gmail_default" style="font-family:arial,helvetica,sans-serif;font-size:small">5) James says,</div><div class="gmail_default" style="font-family:arial,helvetica,sans-serif;font-size:small">
<br></div><div class="gmail_default" style="font-family:arial,helvetica,sans-serif"><blockquote class="gmail_quote" style="font-size:small;font-family:arial;margin:0px 0px 0px 0.8ex;border-left-width:1px;border-left-color:rgb(204,204,204);border-left-style:solid;padding-left:1ex">
<div class="gmail_default" style="font-family:arial,helvetica,sans-serif;display:inline"></div>To weigh in my two-pennyworth, I agree with Tim. Alignment is a hint that could be changed or removed at any time, and it shouldn't change semantics in any other way than that increasing it could cause an alignment fault.<br>
</blockquote><div style="font-size:small;font-family:arial"><br></div><div style="font-family:arial"><div class="gmail_default" style="font-size:small;font-family:arial,helvetica,sans-serif">OK. If you think using 'alignment' to distinguish them is not a good implementation in LLVM IR, then Albrecht's proposal of using a new data type would be an option, although I think it would require huge code changes and would not be realistic.</div>
<div class="gmail_default" style="font-size:small;font-family:arial,helvetica,sans-serif"><br></div><div class="gmail_default" style="font-size:small;font-family:arial,helvetica,sans-serif">So far I haven't really seen an example that breaks the solution of using 'alignment' to choose between ldr/str and ld1/st1. Your solution is doing this,</div>
<div class="gmail_default" style="font-size:small;font-family:arial,helvetica,sans-serif"><br></div><div class="gmail_default" style="font-size:small;font-family:arial,helvetica,sans-serif">a) introducing bitcasting because of optimizations</div>
<div class="gmail_default" style="font-size:small;font-family:arial,helvetica,sans-serif">b) introducing element index changes because the layout is explicitly exposed to the compiler</div><div class="gmail_default" style="font-size:small;font-family:arial,helvetica,sans-serif">
c) introducing rev because of mixing the different layouts</div><div class="gmail_default" style="font-size:small;font-family:arial,helvetica,sans-serif"><br></div><div class="gmail_default" style="font-size:small;font-family:arial,helvetica,sans-serif">
None of these sounds like a simple and nice solution to me.</div><div class="gmail_default" style="font-size:small;font-family:arial,helvetica,sans-serif"><br></div><div class="gmail_default" style="font-size:small;font-family:arial,helvetica,sans-serif">
If a bitcast is introduced by the programmer, I think the programmer should guarantee semantic correctness. If a bitcast is introduced by the compiler, then the optimizer should guarantee semantic correctness.</div>
<div class="gmail_default" style="font-size:small;font-family:arial,helvetica,sans-serif"><br></div><div class="gmail_default" style="font-size:small;font-family:arial,helvetica,sans-serif"><div class="gmail_default" style="display:inline">
</div><span style="font-family:arial"> define i16 @foo() {</span><br style="font-family:arial"><span style="font-family:arial"> %val = load <4 x i16>* bitcast([4 x i16]* @a to <4 x i16>*)</span><br style="font-family:arial">
<span style="font-family:arial"> %elt = extractelement <4 x i16> %val, i32 0</span><br style="font-family:arial"><span style="font-family:arial"> ret i16 %elt</span><br style="font-family:arial"><span style="font-family:arial"> }</span><br style="font-family:arial">
</div><div class="gmail_default" style="font-size:small;font-family:arial,helvetica,sans-serif"><span style="font-family:arial"><br></span></div><div class="gmail_default" style="font-size:small;font-family:arial,helvetica,sans-serif">
<span style="font-family:arial">For this case, I assume the optimizer wants to generate it. Then I would say the optimization is generating invalid code for big-endian. Can you tell the difference between [4 x i16] and <4 x i16> in LLVM IR? The LLVM IR doc says,</span></div>
<div class="gmail_default" style="font-size:small;font-family:arial,helvetica,sans-serif"><span style="font-family:arial"><br></span></div><div class="gmail_default" style="font-family:arial,helvetica,sans-serif"><span style="line-height:21.600000381469727px;font-family:'Lucida Grande','Lucida Sans Unicode',Geneva,Verdana,sans-serif">* The array type is a very simple derived type that arranges elements sequentially in memory.</span><span style="font-family:arial"><br>
</span></div><div class="gmail_default" style="font-family:arial,helvetica,sans-serif"><span style="line-height:21.600000381469727px;font-family:'Lucida Grande','Lucida Sans Unicode',Geneva,Verdana,sans-serif">* A vector type is a simple derived type that represents a vector of elements. </span><span style="line-height:21.600000381469727px;font-size:14.399999618530273px;font-family:'Lucida Grande','Lucida Sans Unicode',Geneva,Verdana,sans-serif"><br>
</span></div><div class="gmail_default" style="font-size:small;font-family:arial,helvetica,sans-serif"><br></div><div class="gmail_default" style="font-size:small;font-family:arial,helvetica,sans-serif"><span style="font-family:arial">If you say that (</span><span style="font-family:arial">[4 x i16] *) and (<4 x i16> *) are only pointers and we can always cast between them, then I would say the bitcast changes the semantics of how the data behind the pointer is interpreted, because under the natural alignment rule, [4 x i16] and <4 x i16> have different alignments. Therefore, I don't think this is a correct bitcast for the optimizer to generate for big-endian.</span></div>
<div class="gmail_default" style="font-size:small;font-family:arial,helvetica,sans-serif"><span style="font-family:arial"><br></span></div><div class="gmail_default" style="font-size:small;font-family:arial,helvetica,sans-serif">
<span style="font-family:arial">I understand the auto-vectorizer wants to use this kind of cast to generate code using vector types. To achieve this goal, I think for big-endian we should introduce not only a bitcast but also a "rev" in LLVM IR. Otherwise, the transformation doesn't really preserve the original semantics in LLVM IR. In the end, the result would probably be the same as you described, with some "rev"s introduced in the binary, but the solution is totally different.</span></div>
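<div class="gmail_default" style="font-size:small;font-family:arial,helvetica,sans-serif"><br></div><div class="gmail_default" style="font-size:small;font-family:arial,helvetica,sans-serif">The "bitcast plus rev" idea can be sketched in C. The assumptions are loud ones: a 64-bit integer stands in for the vector register, lane n is bits [16n+15:16n], and a single 64-bit big-endian load of the array {1, 2, 3, 4} is assumed to produce 0x0001000200030004 (a[0] in the top lane). The helper names reverse_lanes_4h and extract_lane are hypothetical and are not LLVM or NEON APIs.</div>
<pre>
```c
#include <stdint.h>

/* Hypothetical model of a 16-bit lane reversal over a 64-bit register:
 * lane i is swapped with lane 3 - i. */
uint64_t reverse_lanes_4h(uint64_t reg) {
    uint64_t out = 0;
    for (int i = 0; i < 4; i++) {
        uint64_t lane = (reg >> (16 * i)) & 0xffff;
        out |= lane << (16 * (3 - i));
    }
    return out;
}

/* Model of extracting element n from a 4 x i16 vector value. */
uint16_t extract_lane(uint64_t reg, int n) {
    return (uint16_t)(reg >> (16 * n));
}
```
</pre>
<div class="gmail_default" style="font-size:small;font-family:arial,helvetica,sans-serif">With the reversal applied after the whole-register load, extracting element 0 yields a[0] again, so the vector code keeps the meaning of the original array access.</div>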
<div class="gmail_default" style="font-size:small;font-family:arial,helvetica,sans-serif"><br></div></div></div><div class="gmail_default" style="font-family:arial,helvetica,sans-serif;font-size:small">Thanks,</div><div class="gmail_default" style="font-family:arial,helvetica,sans-serif;font-size:small">
-Jiangning</div></div></div>
</div></div>