<html><head><meta http-equiv="Content-Type" content="text/html charset=utf-8"></head><body style="word-wrap: break-word; -webkit-nbsp-mode: space; -webkit-line-break: after-white-space;" class=""><br class=""><div><blockquote type="cite" class=""><div class="">On 18 Feb 2015, at 07:17, Chandler Carruth <<a href="mailto:chandlerc@google.com" class="">chandlerc@google.com</a>> wrote:</div><div class=""><div dir="ltr" class=""><div class="gmail_extra">On Tue, Feb 17, 2015 at 7:55 PM, Chandler Carruth <span dir="ltr" class=""><<a href="mailto:chandlerc@google.com" target="_blank" class="">chandlerc@google.com</a>></span> wrote:<br class=""><div class="gmail_quote"><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"><span class="">On Sun, Feb 15, 2015 at 5:19 AM, <span class="il">Simon</span> Pilgrim <span dir="ltr" class=""><<a href="mailto:llvm-dev@redking.me.uk" target="_blank" class="">llvm-dev@redking.me.uk</a>></span> wrote:<br class=""><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left-width:1px;border-left-color:rgb(204,204,204);border-left-style:solid;padding-left:1ex">Author: <span class="il">rksimon</span><br class="">

Date: Sun Feb 15 07:19:52 2015<br class="">

New Revision: 229311<br class="">

<br class="">

URL: <a href="http://llvm.org/viewvc/llvm-project?rev=229311&view=rev" target="_blank" class="">http://llvm.org/viewvc/llvm-project?rev=229311&view=rev</a><br class="">

Log:<br class="">

[X86][AVX2] vpslldq/vpsrldq byte shifts for AVX2<br class="">

<br class="">

This patch refactors the existing lowerVectorShuffleAsByteShift function to add support for 256-bit vectors on AVX2 targets.<br class=""></blockquote><div class=""><br class=""></div></span><div class="">Unfortunately, this patch introduces a nasty miscompile. See below.</div><span class=""><div class=""> </div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left-width:1px;border-left-color:rgb(204,204,204);border-left-style:solid;padding-left:1ex">

<br class="">

It also fixes a tablegen issue that prevented the lowering of vpslldq/vpsrldq vec256 instructions.<br class=""></blockquote><div class=""><br class=""></div></span><div class="">And because of this, it's a bit hard to disentangle. It would be better to keep these separate in the future (and yea, i know, i should have said this in the review...)</div><span class=""><div class=""><br class=""></div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left-width:1px;border-left-color:rgb(204,204,204);border-left-style:solid;padding-left:1ex">+  int NumElts = VT.getVectorNumElements();<br class="">

+  int NumLanes = VT.getSizeInBits() / 128;<br class="">

+  int NumLaneElts = NumElts / NumLanes;<br class="">

+  int Scale = 16 / NumLaneElts;<br class="">

+  MVT ShiftVT = MVT::getVectorVT(MVT::i64, 2 * NumLanes);<br class="">

+<br class="">

+  // PSLLDQ : (little-endian) left byte shift<br class="">

+  // [ zz,  0,  1,  2,  3,  4,  5,  6]<br class="">

+  // [ zz, zz, -1, -1,  2,  3,  4, -1]<br class="">

+  // [ zz, zz, zz, zz, zz, zz, -1,  1]<br class="">

+  // PSRLDQ : (little-endian) right byte shift<br class="">

+  // [  5, 6,  7, zz, zz, zz, zz, zz]<br class="">

+  // [ -1, 5,  6,  7, zz, zz, zz, zz]<br class="">

+  // [  1, 2, -1, -1, -1, -1, zz, zz]<br class="">

+  auto MatchByteShift = [&](int Shift) -> SDValue {<br class="">

+    bool MatchLeft = true, MatchRight = true;<br class="">

+    for (int l = 0; l < NumElts; l += NumLaneElts) {<br class="">

+      for (int i = 0; i < Shift; ++i)<br class="">

+        MatchLeft &= Zeroable[l + i];<br class="">

+      for (int i = NumLaneElts - Shift; i < NumLaneElts; ++i)<br class="">

+        MatchRight &= Zeroable[l + i];<br class="">

     }<br class="">

+    if (!(MatchLeft || MatchRight))<br class="">

+      return SDValue();<br class=""></blockquote><div class=""> </div></span><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left-width:1px;border-left-color:rgb(204,204,204);border-left-style:solid;padding-left:1ex">+    bool MatchV1 = true, MatchV2 = true;<span class=""><br class="">

+    for (int l = 0; l < NumElts; l += NumLaneElts) {<br class="">

+      unsigned Pos = MatchLeft ? Shift + l : l;<br class="">

+      unsigned Low = MatchLeft ? l : Shift + l;<br class=""></span></blockquote><div class=""><br class=""></div><div class="">What happens if both MatchLeft and MatchRight are true? You don't assert that this can't happen, and in fact, it can. We only test one of them here.</div><span class=""><div class=""> </div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left-width:1px;border-left-color:rgb(204,204,204);border-left-style:solid;padding-left:1ex">

+      unsigned Len = NumLaneElts - Shift;<br class="">

+      MatchV1 &= isSequentialOrUndefInRange(Mask, Pos, Len, Low);<br class="">

+      MatchV2 &= isSequentialOrUndefInRange(Mask, Pos, Len, Low + NumElts);<br class="">

     }<br class="">

-  }<br class="">

+    if (!(MatchV1 || MatchV2))<br class="">

+      return SDValue();<br class="">

+<br class="">

+    int ByteShift = Shift * Scale;<br class="">

+    unsigned Op = MatchRight ? X86ISD::VSRLDQ : X86ISD::VSHLDQ;<br class=""></blockquote><div class=""><br class=""></div></span><div class="">And then here, we use MatchRight. Boom.</div><div class=""><br class=""></div><div class="">Here is a test case:</div><div class=""><br class=""></div><div class=""><div class="">define internal fastcc <8 x i16> @test(<8 x i16> %s.0.0, <8 x i16> %s.0.1) noinline nounwind {</div><div class="">entry:</div><div class="">  %s.1.0 = shufflevector <8 x i16> %s.0.0, <8 x i16> %s.0.1, <8 x i32> <i32 15, i32 undef, i32 3, i32 undef, i32 12, i32 12, i32 5, i32 undef></div><div class=""><br class=""></div><div class="">  ret <8 x i16> %s.1.0</div><div class="">}</div></div><div class=""><br class=""></div><div class="">This gets miscompiled at top-of-tree.</div><div class=""><br class=""></div><div class="">I tried reverting this patch, and it's a mess. Doesn't revert cleanly. =[ When I manually tried to fix the revert, the above test case got miscompiled *differently* by shifting every byte out. =[</div><div class=""><br class=""></div><div class="">I'm probably just going to rip this function out until you or I have time to look at re-adding it and ensuring it doesn't have any holes. I don't see any other clean way of "reverting to green" here because the patches don't really back out cleanly.</div></blockquote></div><br class="">Ripping this out was super ugly -- lots of stuff relies on this now.</div><div class="gmail_extra"><br class=""></div><div class="gmail_extra">I took a stab at rewriting it in r229642. This passes my test case and has so far survived my fuzz testing. I'm not finding any other miscompiles either, so things are looking really good now.</div></div>

</div></blockquote><br class=""></div><div>Thanks for looking at this Chandler - AFAICT flipping that MatchRight test to a MatchLeft equivalent would have been enough to get this working correctly again. But the additional changes you’ve been able to commit since to merge the shift tests look fantastic, although I’m sorry that extra work was forced on you.</div><div><br class=""></div><div>I’m going to start using the shuffle fuzz test script where I can - I’m curious whether this could be used to identify poor code gen (perf) as well as bad (incorrect) code gen?</div><br class=""></body></html>