<div dir="ltr">> <span style="color:rgb(0,0,0);font-family:arial,helvetica,sans-serif;font-size:13.3333px;line-height:20px">I think that teaching the optimizer about big-Endian lane ordering would have been better.</span><div><span style="color:rgb(0,0,0);font-family:arial,helvetica,sans-serif;font-size:13.3333px;line-height:20px"><br></span></div><div><span style="color:rgb(0,0,0);font-family:arial,helvetica,sans-serif;font-size:13.3333px;line-height:20px">It's certainly arguable. Even in hindsight, I'm glad we didn't; that's the approach GCC took, and they've been fixing subtle bugs in their vectorizer ever since.</span></div><div><span style="color:rgb(0,0,0);font-family:arial,helvetica,sans-serif;font-size:13.3333px;line-height:20px"><br></span></div><div><span style="color:rgb(0,0,0);font-family:arial,helvetica,sans-serif;font-size:13.3333px;line-height:20px">> </span><span style="color:rgb(0,0,0);font-family:arial,helvetica,sans-serif;font-size:13.3333px;line-height:20px">Inserting the REV after every LDR</span></div><div><span style="color:rgb(0,0,0);font-family:arial,helvetica,sans-serif;font-size:13.3333px;line-height:20px"><br></span></div><div><span style="color:rgb(0,0,0);font-family:arial,helvetica,sans-serif;font-size:13.3333px;line-height:20px">We only do this conceptually. In most cases the REVs cancel out, and we have the LD1 instruction, which is effectively LDR+REV. 
With enough peepholes, there's really no need for the code to run slower.</span></div><div><span style="color:rgb(0,0,0);font-family:arial,helvetica,sans-serif;font-size:13.3333px;line-height:20px"><br></span></div><div><span style="color:rgb(0,0,0);font-family:arial,helvetica,sans-serif;font-size:13.3333px;line-height:20px">> </span><span style="color:rgb(0,0,0);font-family:arial,helvetica,sans-serif;font-size:13.3333px;line-height:20px">Given what's been done, should we update the LangRef?</span></div><div><span style="color:rgb(0,0,0);font-family:arial,helvetica,sans-serif;font-size:13.3333px;line-height:20px"><br></span></div><div><span style="color:rgb(0,0,0);font-family:arial,helvetica,sans-serif;font-size:13.3333px;line-height:20px">Potentially, yes. I hadn't realised quite how strongly it was worded on this point.</span></div><div><span style="color:rgb(0,0,0);font-family:arial,helvetica,sans-serif;font-size:13.3333px;line-height:20px"><br></span></div><div><span style="color:rgb(0,0,0);font-family:arial,helvetica,sans-serif;font-size:13.3333px;line-height:20px">James</span></div></div><br><div class="gmail_quote"><div dir="ltr">On Wed, 13 Jan 2016 at 14:39 Hal Finkel <<a href="mailto:hfinkel@anl.gov">hfinkel@anl.gov</a>> wrote:<br></div><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"><div><div style="font-family:arial,helvetica,sans-serif;font-size:10pt;color:#000000">[resending so the message is smaller]</div></div><div><div style="font-family:arial,helvetica,sans-serif;font-size:10pt;color:#000000"><br><br><hr><br><br>From: "James Molloy via llvm-dev" <<a href="mailto:llvm-dev@lists.llvm.org" target="_blank">llvm-dev@lists.llvm.org</a>> <br>To: "Quentin Colombet" <<a href="mailto:qcolombet@apple.com" target="_blank">qcolombet@apple.com</a>> <br>Cc: "llvm-dev" <<a href="mailto:llvm-dev@lists.llvm.org" target="_blank">llvm-dev@lists.llvm.org</a>> <br>Sent: Wednesday, January 13, 2016 2:35:32 AM 
<br>Subject: Re: [llvm-dev] [GlobalISel] A Proposal for global instruction selection <br><br>Hi Philip, <br><br></div></div><div><div style="font-family:arial,helvetica,sans-serif;font-size:10pt;color:#000000"><blockquote>store &lt;2 x i64&gt; %1, &lt;2 x i64&gt;* %y <br><br>Yes. The memory pattern differs. This is the first diagram on the right at <a href="http://llvm.org/docs/BigEndianNEON.html#bitconverts" target="_blank">http://llvm.org/docs/BigEndianNEON.html#bitconverts</a>. </blockquote><br></div></div><div><div style="font-family:arial,helvetica,sans-serif;font-size:10pt;color:#000000">I think that teaching the optimizer about big-Endian lane ordering would have been better. Inserting the REV after every LDR sounds very similar to what we do for VSX on little-Endian PowerPC systems. (PowerPC may have a slight advantage here, in that we don't need to do insertelement/extractelement/shufflevector through memory on systems where little-Endian mode is relevant; see <a href="http://llvm.org/devmtg/2014-10/Slides/Schmidt-SupportingVectorProgramming.pdf" target="_blank">http://llvm.org/devmtg/2014-10/Slides/Schmidt-SupportingVectorProgramming.pdf</a>.) <br><br>Given what's been done, should we update the LangRef? It currently reads: "The ‘bitcast’ instruction converts value to type ty2. It is always a no-op cast because no bits change with this conversion. The conversion is done as if the value had been stored to memory and read back as type ty2." But this is now, at the least, misleading, because this process of storing the value as one type and reading it back as another does, in fact, change the bits. We need to make clear that this might change the bits (perhaps specifically by calling out this case of vector bitcasts on big-Endian systems?). 
<br></div></div><div><div style="font-family:arial,helvetica,sans-serif;font-size:10pt;color:#000000"><br>Also, regarding this: "Most operating systems however do not run with alignment faults enabled, so this is often not an issue." Are you saying that the processor does the correct thing in this case (if alignment faults are not enabled, it performs a proper unaligned load), or that the operating system's trap handler emulates the unaligned load should one occur? <br><br>Thanks again, <br>Hal <br></div></div><div><div style="font-family:arial,helvetica,sans-serif;font-size:10pt;color:#000000">-- <br>Hal Finkel<br>Assistant Computational Scientist<br>Leadership Computing Facility<br>Argonne National Laboratory<br></div></div></blockquote></div>
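A small model may help make the bitcast point concrete. The Python sketch below is hypothetical and not LLVM code. It assumes the convention described in the BigEndianNEON document linked in the thread: on big-endian AArch64, vectors are held in registers in LD1/ST1 (element) order. The helpers ld1, st1, lanes, from_lanes and rev64_4s are illustrative names, not a real API. Under that assumption, following the LangRef definition literally (store as a 2 x i64 vector, reload as 4 x i32) produces a different lane order from reusing the register bits unchanged. A REV64 on the 32-bit lanes reconciles the two, and applying it twice is the identity, which is why adjacent REVs can be folded away.

```python
# Simplified, hypothetical model of big-endian AArch64 NEON lane order (not LLVM code).
# Assumption: vectors live in registers in LD1/ST1 (element) order. A 128-bit Q
# register is modeled as 16 bytes; byte 0 is the least significant, and lane i of
# an element of `size` bytes occupies register bytes [i*size, (i+1)*size).


def ld1(mem: bytes, size: int) -> bytearray:
    """Model LD1: each element is read big-endian from memory into its own lane."""
    reg = bytearray(16)
    for i in range(16 // size):
        elem = int.from_bytes(mem[i * size:(i + 1) * size], "big")
        reg[i * size:(i + 1) * size] = elem.to_bytes(size, "little")
    return reg


def st1(reg: bytes, size: int) -> bytes:
    """Model ST1: each lane is written to memory as a big-endian element."""
    mem = bytearray(16)
    for i in range(16 // size):
        elem = int.from_bytes(reg[i * size:(i + 1) * size], "little")
        mem[i * size:(i + 1) * size] = elem.to_bytes(size, "big")
    return bytes(mem)


def lanes(reg: bytes, size: int) -> list:
    """Read the register as lanes of `size` bytes, lane 0 first."""
    return [int.from_bytes(reg[i * size:(i + 1) * size], "little")
            for i in range(16 // size)]


def from_lanes(values, size: int) -> bytearray:
    """Build a register whose lanes (of `size` bytes) hold `values`."""
    reg = bytearray()
    for v in values:
        reg += v.to_bytes(size, "little")
    return reg


def rev64_4s(reg: bytes) -> bytearray:
    """Model REV64 Vd.4S: swap the two 32-bit lanes inside each 64-bit half."""
    out = bytearray(16)
    for d in (0, 8):
        out[d:d + 4] = reg[d + 4:d + 8]
        out[d + 4:d + 8] = reg[d:d + 4]
    return out


# %1 = <2 x i64> <A, B>, held in the register in element order.
A, B = 0x0000000100000002, 0x0000000300000004
v2i64 = from_lanes([A, B], 8)

# LangRef reading of "bitcast <2 x i64> %1 to <4 x i32>": store as i64 lanes,
# then reload as i32 lanes.
via_memory = lanes(ld1(st1(v2i64, 8), 4), 4)

# Reusing the register bits unchanged gives a different lane order...
no_op = lanes(v2i64, 4)

# ...which REV64 on the 32-bit lanes corrects in this model. Applying it twice
# restores the original register, so back-to-back REVs cancel out.
with_rev = lanes(rev64_4s(v2i64), 4)

print(via_memory, no_op, with_rev)  # prints [1, 2, 3, 4] [2, 1, 4, 3] [1, 2, 3, 4]
```

Storing each lane little-endian inside the modeled register is only an indexing convenience. What matters is that LD1/ST1 move whole elements, so where each 32-bit half ends up in the register depends on the element size used for the access.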