Hi all,<br><br>Sorry for arriving late to the party. First, some context:<div><br></div><div>vld1 is not the same as a pointer dereference. The alignment requirements are different: vld1q_f32 takes a plain const float32_t * and needs only element alignment, whereas dereferencing a vector pointer assumes the vector type's own, larger alignment (which I saw you hacked around in your testcase using __attribute__((aligned(4)))). In big-endian environments they also do totally different things: VLD1 byteswaps each element individually, whereas a pointer dereference byteswaps the entire 128-bit value.</div><div><br></div><div>While a pointer dereference does work just as well as VLD1 (better, given this defect), it is explicitly *not supported*. The ACLE mandates that there are only certain ways to legitimately "create" a vector object: vcreate, vcombine, vreinterpret and vload. NEON intrinsic types don't exist in memory (memory is modelled as a sequence of scalars, as in the C model). For this reason, Renato, I don't think we should advise people to work around the API; who knows what problems that will cause later.</div><div><br></div><div>This is why we map vld1q_f32() to a NEON intrinsic instead of a generic IR load. 
Looking at your testcase, even tip-of-trunk clang generates a redundant round trip through the stack (the vst1.32/vld1.64 pair on sp) before the final store:</div><div><br></div><div><div><span class="Apple-tab-span" style="white-space:pre"> </span>vld1.32<span class="Apple-tab-span" style="white-space:pre"> </span>{d16, d17}, [r1]</div><div><span class="Apple-tab-span" style="white-space:pre"> </span>vld1.32<span class="Apple-tab-span" style="white-space:pre"> </span>{d18, d19}, [r0]</div><div><span class="Apple-tab-span" style="white-space:pre"> </span>mov<span class="Apple-tab-span" style="white-space:pre"> </span>r0, sp</div><div><span class="Apple-tab-span" style="white-space:pre"> </span>vmul.f32<span class="Apple-tab-span" style="white-space:pre"> </span>q8, q9, q8</div><div><span class="Apple-tab-span" style="white-space:pre"> </span>vst1.32<span class="Apple-tab-span" style="white-space:pre"> </span>{d16, d17}, [r0]</div><div><span class="Apple-tab-span" style="white-space:pre"> </span>vld1.64<span class="Apple-tab-span" style="white-space:pre"> </span>{d16, d17}, [r0:128]</div><div><span class="Apple-tab-span" style="white-space:pre"> </span>vst1.32<span class="Apple-tab-span" style="white-space:pre"> </span>{d16, d17}, [r2]</div></div><div><br></div><div>Whereas on AArch64 we don't (nor do we in the chained-multiply case):</div><div><br></div><div><div><span class="Apple-tab-span" style="white-space:pre"> </span>ldr<span class="Apple-tab-span" style="white-space:pre"> </span> q0, [x0]</div><div><span class="Apple-tab-span" style="white-space:pre"> </span>ldr<span class="Apple-tab-span" style="white-space:pre"> </span> q1, [x1]</div><div><span class="Apple-tab-span" style="white-space:pre"> </span>fmul<span class="Apple-tab-span" style="white-space:pre"> </span>v0.4s, v0.4s, v1.4s</div><div><span class="Apple-tab-span" style="white-space:pre"> </span>str<span class="Apple-tab-span" style="white-space:pre"> </span> q0, [x2]</div><div><span class="Apple-tab-span" style="white-space:pre"> 
</span>ret</div></div><div><br></div><div>So the AArch64 backend already handles this, which suggests something is wrong or missing in the AArch32 optimizer. This is a legitimate bug and should be fixed (even if a workaround is needed in the interim!)</div><div><br></div><div>Cheers,</div><div><br></div><div>James</div><br><div class="gmail_quote">On Mon Jan 05 2015 at 10:46:10 AM Renato Golin <<a href="mailto:renato.golin@linaro.org">renato.golin@linaro.org</a>> wrote:<br><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">On 5 January 2015 at 10:14, Simon Taylor <<a href="mailto:simontaylor1@ntlworld.com" target="_blank">simontaylor1@ntlworld.com</a>> wrote:<br>
> I don’t recall seeing anything about pointer dereferencing, but it may have the same issues. I’m a bit hazy on endianness issues with NEON anyway (in terms of element numbering, casts between types, etc) but it seems like all the smartphone platform ABIs are defined to be little-endian so I haven’t spent too much time worrying about it.<br>
<br>
Tim is right, this can be a potential danger, but not more than other<br>
endian or type size issues. If you're writing portable code, I assume<br>
you'll already be mindful of those issues.<br>
<br>
This is why I said it's still a problem, but not a critical one. Maybe<br>
adding a comment to your code explaining the issue will help you in<br>
the future to move it back to NEON loads/stores once this is fixed.<br>
<br>
cheers,<br>
--renato<br>
<br>
_______________________________________________<br>
LLVM Developers mailing list<br>
<a href="mailto:LLVMdev@cs.uiuc.edu" target="_blank">LLVMdev@cs.uiuc.edu</a> <a href="http://llvm.cs.uiuc.edu" target="_blank">http://llvm.cs.uiuc.edu</a><br>
<a href="http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev" target="_blank">http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev</a><br>
</blockquote></div>