[llvm] r229311 - [X86][AVX2] vpslldq/vpsrldq byte shifts for AVX2

Simon Pilgrim llvm-dev at redking.me.uk
Wed Feb 18 12:36:34 PST 2015


> On 18 Feb 2015, at 07:17, Chandler Carruth <chandlerc at google.com> wrote:
> On Tue, Feb 17, 2015 at 7:55 PM, Chandler Carruth <chandlerc at google.com> wrote:
> On Sun, Feb 15, 2015 at 5:19 AM, Simon Pilgrim <llvm-dev at redking.me.uk> wrote:
> Author: rksimon
> Date: Sun Feb 15 07:19:52 2015
> New Revision: 229311
> 
> URL: http://llvm.org/viewvc/llvm-project?rev=229311&view=rev
> Log:
> [X86][AVX2] vpslldq/vpsrldq byte shifts for AVX2
> 
> This patch refactors the existing lowerVectorShuffleAsByteShift function to add support for 256-bit vectors on AVX2 targets.
> 
> Unfortunately, this patch introduces a nasty miscompile. See below.
>  
> 
> It also fixes a tablegen issue that prevented the lowering of vpslldq/vpsrldq vec256 instructions.
> 
> And because of this, it's a bit hard to disentangle. It would be better to keep these separate in the future (and yea, i know, i should have said this in the review...)
> 
> +  int NumElts = VT.getVectorNumElements();
> +  int NumLanes = VT.getSizeInBits() / 128;
> +  int NumLaneElts = NumElts / NumLanes;
> +  int Scale = 16 / NumLaneElts;
> +  MVT ShiftVT = MVT::getVectorVT(MVT::i64, 2 * NumLanes);
> +
> +  // PSLLDQ : (little-endian) left byte shift
> +  // [ zz,  0,  1,  2,  3,  4,  5,  6]
> +  // [ zz, zz, -1, -1,  2,  3,  4, -1]
> +  // [ zz, zz, zz, zz, zz, zz, -1,  1]
> +  // PSRLDQ : (little-endian) right byte shift
> +  // [  5, 6,  7, zz, zz, zz, zz, zz]
> +  // [ -1, 5,  6,  7, zz, zz, zz, zz]
> +  // [  1, 2, -1, -1, -1, -1, zz, zz]
> +  auto MatchByteShift = [&](int Shift) -> SDValue {
> +    bool MatchLeft = true, MatchRight = true;
> +    for (int l = 0; l < NumElts; l += NumLaneElts) {
> +      for (int i = 0; i < Shift; ++i)
> +        MatchLeft &= Zeroable[l + i];
> +      for (int i = NumLaneElts - Shift; i < NumLaneElts; ++i)
> +        MatchRight &= Zeroable[l + i];
>      }
> +    if (!(MatchLeft || MatchRight))
> +      return SDValue();
>  
> +    bool MatchV1 = true, MatchV2 = true;
> +    for (int l = 0; l < NumElts; l += NumLaneElts) {
> +      unsigned Pos = MatchLeft ? Shift + l : l;
> +      unsigned Low = MatchLeft ? l : Shift + l;
> 
> What happens if both MatchLeft and MatchRight are true? You don't assert that this can't happen, and in fact, it can. We only test one of them here.
>  
> +      unsigned Len = NumLaneElts - Shift;
> +      MatchV1 &= isSequentialOrUndefInRange(Mask, Pos, Len, Low);
> +      MatchV2 &= isSequentialOrUndefInRange(Mask, Pos, Len, Low + NumElts);
>      }
> -  }
> +    if (!(MatchV1 || MatchV2))
> +      return SDValue();
> +
> +    int ByteShift = Shift * Scale;
> +    unsigned Op = MatchRight ? X86ISD::VSRLDQ : X86ISD::VSHLDQ;
> 
> And then here, we use MatchRight. Boom.
> 
> Here is a test case:
> 
> define internal fastcc <8 x i16> @test(<8 x i16> %s.0.0, <8 x i16> %s.0.1) noinline nounwind {
> entry:
>   %s.1.0 = shufflevector <8 x i16> %s.0.0, <8 x i16> %s.0.1, <8 x i32> <i32 15, i32 undef, i32 3, i32 undef, i32 12, i32 12, i32 5, i32 undef>
> 
>   ret <8 x i16> %s.1.0
> }
> 
> This gets miscompiled at top-of-tree.
> 
> I tried reverting this patch, and it's a mess. Doesn't revert cleanly. =[ When I manually tried to fix the revert, the above test case got miscompiled *differently* by shifting every byte out. =[
> 
> I'm probably just going to rip this function out until you or I have time to look at re-adding it and ensuring it doesn't have any holes. I don't see any other clean way of "reverting to green" here because the patches don't really back out cleanly.
> 
> Ripping this out was super ugly -- lots of stuff relies on this now.
> 
> I took a stab at rewriting it in r229642. This passes my test case and has so far survived my fuzz testing. I'm not finding any other miscompiles either, so things are looking really good now.

Thanks for looking at this Chandler - AFAICT flipping that MatchRight test to a MatchLeft equivalent would have been enough to get this working correctly again. The additional changes you’ve since been able to commit to merge the shift tests look fantastic, although I’m sorry that the extra work was forced on you.
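
To illustrate what I mean, here is a minimal standalone sketch (toy code, not the actual LLVM sources - the simplified isSequentialOrUndefInRange and the VSHLDQ/VSRLDQ names are just stand-ins for the real helper and opcodes) of a mask where MatchLeft and MatchRight are both true, and why the opcode has to be selected from the same flag that Pos/Low were derived from:

#include <cstdio>
#include <vector>

// Toy stand-ins for the real X86ISD opcodes.
enum Op { VSHLDQ, VSRLDQ };

// Simplified version of the helper used in the patch: every element in
// [Pos, Pos+Len) must be undef (-1) or equal to Low plus its offset.
static bool isSequentialOrUndefInRange(const std::vector<int> &Mask,
                                       int Pos, int Len, int Low) {
  for (int i = 0; i < Len; ++i)
    if (Mask[Pos + i] != -1 && Mask[Pos + i] != Low + i)
      return false;
  return true;
}

int main() {
  // Single 128-bit lane of 8 x i16. Undef elements count as zeroable, so
  // for Shift = 2 this mask satisfies MatchLeft *and* MatchRight.
  const int NumLaneElts = 8;
  std::vector<int> Mask = {-1, -1, 0, 1, 2, 3, -1, -1};
  std::vector<bool> Zeroable = {true, true, false, false,
                                false, false, true, true};
  const int Shift = 2;

  bool MatchLeft = true, MatchRight = true;
  for (int i = 0; i < Shift; ++i)
    MatchLeft = MatchLeft && Zeroable[i];
  for (int i = NumLaneElts - Shift; i < NumLaneElts; ++i)
    MatchRight = MatchRight && Zeroable[i];

  // Pos/Low are derived from MatchLeft, exactly as in the original lambda.
  int Pos = MatchLeft ? Shift : 0;
  int Low = MatchLeft ? 0 : Shift;
  int Len = NumLaneElts - Shift;
  bool MatchV1 = isSequentialOrUndefInRange(Mask, Pos, Len, Low);

  // Buggy selection: direction taken from MatchRight, even though the
  // element positions were checked against the MatchLeft layout.
  Op Buggy = MatchRight ? VSRLDQ : VSHLDQ;
  // Fixed selection: take the direction from the same flag as Pos/Low.
  Op Fixed = MatchLeft ? VSHLDQ : VSRLDQ;

  printf("MatchLeft=%d MatchRight=%d MatchV1=%d\n", MatchLeft, MatchRight,
         MatchV1);
  printf("buggy op = %s, fixed op = %s\n",
         Buggy == VSHLDQ ? "VSHLDQ (left)" : "VSRLDQ (right)",
         Fixed == VSHLDQ ? "VSHLDQ (left)" : "VSRLDQ (right)");
  return 0;
}

With that mask the element check matches the left-shift layout, but the original selection would have emitted the right-shift opcode; taking the direction from MatchLeft (or asserting that only one of the two flags survives the element check) keeps everything consistent.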

I’m going to start using the shuffle fuzz test script where I can - I’m curious whether it could also be used to identify poor (perf) code gen as well as bad (incorrect) code gen?



More information about the llvm-commits mailing list