<div dir="ltr">> (Right?)<div><br></div><div>Uh no, the register content explicitly does change :( We insert REV instructions (byteswap) on each bitcast. Bitcasts can be merged and elided etc, but conceptually there's a register content change on every bitcast.</div><div><br></div><div>James</div></div><br><div class="gmail_quote"><div dir="ltr">On Wed, 13 Jan 2016 at 18:09 Philip Reames <<a href="mailto:listmail@philipreames.com">listmail@philipreames.com</a>> wrote:<br></div><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"><br>
<br>
On 01/13/2016 08:01 AM, Hal Finkel via llvm-dev wrote:<br>
> ----- Original Message -----<br>
>> From: "James Molloy" <<a href="mailto:james@jamesmolloy.co.uk" target="_blank">james@jamesmolloy.co.uk</a>><br>
>> To: "Hal Finkel" <<a href="mailto:hfinkel@anl.gov" target="_blank">hfinkel@anl.gov</a>><br>
>> Cc: "llvm-dev" <<a href="mailto:llvm-dev@lists.llvm.org" target="_blank">llvm-dev@lists.llvm.org</a>>, "Quentin Colombet" <<a href="mailto:qcolombet@apple.com" target="_blank">qcolombet@apple.com</a>><br>
>> Sent: Wednesday, January 13, 2016 9:54:26 AM<br>
>> Subject: Re: [llvm-dev] [GlobalISel] A Proposal for global instruction selection<br>
>><br>
>><br>
>>> I think that teaching the optimizer about big-Endian lane ordering<br>
>>> would have been better.<br>
>><br>
>> It's certainly arguable. Even in hindsight I'm glad we didn't -<br>
>> that's the approach GCC took and they've been fixing subtle bugs in<br>
>> their vectorizer ever since.<br>
>><br>
>><br>
>>> Inserting the REV after every LDR<br>
>><br>
>> We only do this conceptually. In most cases REVs cancel out, and we<br>
>> have the LD1 instruction which is LDR+REV. With enough peepholes<br>
>> there's really no need for code to run slower.<br>
>><br>
>><br>
>>> Given what's been done, should we update the LangRef?<br>
>><br>
>> Potentially, yes. I hadn't realised quite how strongly worded it was<br>
>> with respect to this.<br>
>><br>
> Please do ;)<br>
I'm not sure changing bitcast is the right place. Since the bitcast is<br>
representing the in-register value (which doesn't change), maybe we<br>
should define it as part of the load/store instead? That's essentially<br>
what's going on; we're converting from a canonical register form to a<br>
variety of memory forms. (Right?)<br>
><br>
> -Hal<br>
><br>
>> James<br>
>><br>
>><br>
>> On Wed, 13 Jan 2016 at 14:39 Hal Finkel < <a href="mailto:hfinkel@anl.gov" target="_blank">hfinkel@anl.gov</a> > wrote:<br>
>><br>
>><br>
>><br>
>><br>
>> [resending so the message is smaller]<br>
>><br>
>> From: "James Molloy via llvm-dev" < <a href="mailto:llvm-dev@lists.llvm.org" target="_blank">llvm-dev@lists.llvm.org</a> ><br>
>> To: "Quentin Colombet" < <a href="mailto:qcolombet@apple.com" target="_blank">qcolombet@apple.com</a> ><br>
>> Cc: "llvm-dev" < <a href="mailto:llvm-dev@lists.llvm.org" target="_blank">llvm-dev@lists.llvm.org</a> ><br>
>> Sent: Wednesday, January 13, 2016 2:35:32 AM<br>
>> Subject: Re: [llvm-dev] [GlobalISel] A Proposal for global<br>
>> instruction selection<br>
>><br>
>> Hi Philip,<br>
>><br>
>><br>
>><br>
>><br>
>><br>
>> store <2 x i64> %1, <2 x i64>* %y<br>
>><br>
>> Yes. The memory pattern differs. This is the first diagram on the<br>
>> right at: <a href="http://llvm.org/docs/BigEndianNEON.html#bitconverts" rel="noreferrer" target="_blank">http://llvm.org/docs/BigEndianNEON.html#bitconverts</a><br>
>><br>
>><br>
>> I think that teaching the optimizer about big-Endian lane ordering<br>
>> would have been better. Inserting the REV after every LDR sounds<br>
>> very similar to what we do for VSX on little-Endian PowerPC systems<br>
>> (PowerPC may have a slight advantage here in that we don't need to<br>
>> do insertelement / extractelement / shufflevector through memory on<br>
>> systems where little-Endian mode is relevant, see<br>
>> <a href="http://llvm.org/devmtg/2014-10/Slides/Schmidt-SupportingVectorProgramming.pdf" rel="noreferrer" target="_blank">http://llvm.org/devmtg/2014-10/Slides/Schmidt-SupportingVectorProgramming.pdf</a><br>
>> ).<br>
>><br>
>> Given what's been done, should we update the LangRef? It currently<br>
>> reads, "The ‘bitcast’ instruction converts value to type ty2. It<br>
>> is always a no-op cast because no bits change with this conversion.<br>
>> The conversion is done as if the value had been stored to memory and<br>
>> read back as type ty2." But this is now, at the least, misleading,<br>
>> because this process of storing the value as one type and reading it<br>
>> back in as another does, in fact, change the bits. We need to make<br>
>> clear that this might change the bits (perhaps specifically by<br>
>> calling out this case of vector bitcasts on big-Endian systems?).<br>
>><br>
>><br>
>><br>
>> Also, regarding this, "Most operating systems however do not run<br>
>> with alignment faults enabled, so this is often not an issue." Are<br>
>> you saying that the processor does the correct thing in this case<br>
>> (if alignment faults are not enabled, then it performs a proper<br>
>> unaligned load), or that the operating-system trap handler emulates<br>
>> the unaligned load should one occur?<br>
>><br>
>> Thanks again,<br>
>> Hal<br>
>><br>
>><br>
>> _______________________________________________<br>
>><br>
>><br>
>> LLVM Developers mailing list<br>
>> <a href="mailto:llvm-dev@lists.llvm.org" target="_blank">llvm-dev@lists.llvm.org</a><br>
>> <a href="http://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev" rel="noreferrer" target="_blank">http://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev</a><br>
>><br>
>><br>
>> --<br>
>> Hal Finkel<br>
>> Assistant Computational Scientist<br>
>> Leadership Computing Facility<br>
>> Argonne National Laboratory<br>
>><br>
<br>
</blockquote></div>
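For readers following the bitcast discussion above, here is a minimal sketch of the LangRef wording that a bitcast behaves "as if the value had been stored to memory and read back as type ty2". It only models that memory round trip with Python's struct module; it is not how the AArch64 backend implements anything, and the helper name bitcast_via_memory and the sample values are made up for illustration. It shows why casting <2 x i64> to <4 x i32> yields a different lane order on a big-endian target than on a little-endian one, which is the difference the REV instructions account for.

```python
import struct


def bitcast_via_memory(values, src_fmt, dst_fmt, byte_order):
    """Model a bitcast as a store of `values` followed by a load as another type.

    byte_order is "<" (little-endian) or ">" (big-endian); src_fmt and dst_fmt
    are struct format strings such as "2Q" (<2 x i64>) or "4I" (<4 x i32>).
    Hypothetical helper, for illustration only.
    """
    raw = struct.pack(byte_order + src_fmt, *values)       # the "store"
    return list(struct.unpack(byte_order + dst_fmt, raw))  # the "load"


# A <2 x i64> value with recognisable bytes (arbitrary sample data).
v = [0x0011223344556677, 0x8899AABBCCDDEEFF]

little = bitcast_via_memory(v, "2Q", "4I", "<")
big = bitcast_via_memory(v, "2Q", "4I", ">")

print([hex(x) for x in little])
print([hex(x) for x in big])
```

Under the store/load definition, the 32-bit halves of each 64-bit element come out in opposite order on the two endiannesses. If the in-register lane layout is kept the same on both, something has to move the bytes on big-endian, which matches James's point that the register content changes on each bitcast.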