[llvm] r229311 - [X86][AVX2] vpslldq/vpsrldq byte shifts for AVX2

Chandler Carruth chandlerc at google.com
Tue Feb 17 19:55:49 PST 2015


On Sun, Feb 15, 2015 at 5:19 AM, Simon Pilgrim <llvm-dev at redking.me.uk>
wrote:

> Author: rksimon
> Date: Sun Feb 15 07:19:52 2015
> New Revision: 229311
>
> URL: http://llvm.org/viewvc/llvm-project?rev=229311&view=rev
> Log:
> [X86][AVX2] vpslldq/vpsrldq byte shifts for AVX2
>
> This patch refactors the existing lowerVectorShuffleAsByteShift function
> to add support for 256-bit vectors on AVX2 targets.
>

Unfortunately, this patch introduces a nasty miscompile. See below.


>
> It also fixes a tablegen issue that prevented the lowering of
> vpslldq/vpsrldq vec256 instructions.
>

And because of this, it's a bit hard to disentangle. It would be better to
keep these separate in the future (and yeah, I know, I should have said this
in the review...)

> +  int NumElts = VT.getVectorNumElements();
> +  int NumLanes = VT.getSizeInBits() / 128;
> +  int NumLaneElts = NumElts / NumLanes;
> +  int Scale = 16 / NumLaneElts;
> +  MVT ShiftVT = MVT::getVectorVT(MVT::i64, 2 * NumLanes);
> +
> +  // PSLLDQ : (little-endian) left byte shift
> +  // [ zz,  0,  1,  2,  3,  4,  5,  6]
> +  // [ zz, zz, -1, -1,  2,  3,  4, -1]
> +  // [ zz, zz, zz, zz, zz, zz, -1,  1]
> +  // PSRLDQ : (little-endian) right byte shift
> +  // [  5, 6,  7, zz, zz, zz, zz, zz]
> +  // [ -1, 5,  6,  7, zz, zz, zz, zz]
> +  // [  1, 2, -1, -1, -1, -1, zz, zz]
> +  auto MatchByteShift = [&](int Shift) -> SDValue {
> +    bool MatchLeft = true, MatchRight = true;
> +    for (int l = 0; l < NumElts; l += NumLaneElts) {
> +      for (int i = 0; i < Shift; ++i)
> +        MatchLeft &= Zeroable[l + i];
> +      for (int i = NumLaneElts - Shift; i < NumLaneElts; ++i)
> +        MatchRight &= Zeroable[l + i];
>      }
> +    if (!(MatchLeft || MatchRight))
> +      return SDValue();
>


> +    bool MatchV1 = true, MatchV2 = true;
> +    for (int l = 0; l < NumElts; l += NumLaneElts) {
> +      unsigned Pos = MatchLeft ? Shift + l : l;
> +      unsigned Low = MatchLeft ? l : Shift + l;
>

What happens if both MatchLeft and MatchRight are true? You don't assert
that this can't happen, and in fact, it can. We only test one of them here.


> +      unsigned Len = NumLaneElts - Shift;
> +      MatchV1 &= isSequentialOrUndefInRange(Mask, Pos, Len, Low);
> +      MatchV2 &= isSequentialOrUndefInRange(Mask, Pos, Len, Low + NumElts);
>      }
> -  }
> +    if (!(MatchV1 || MatchV2))
> +      return SDValue();
> +
> +    int ByteShift = Shift * Scale;
> +    unsigned Op = MatchRight ? X86ISD::VSRLDQ : X86ISD::VSHLDQ;
>

And then here, we use MatchRight. Boom.
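
To make the failure mode concrete, here is a small standalone model of the
match logic for a single 8-element lane (plain C++, not the LLVM code and not
the actual fix; matchByteShift, isSequentialOrUndef, and the example mask are
all made up for illustration). The point is that the direction has to be
resolved once and then used for *both* the expected element positions and the
opcode, whereas the committed code derives Pos/Low from MatchLeft but the
opcode from MatchRight:

// Standalone sketch of MatchByteShift for one 8-element lane. Names and the
// example mask are illustrative only; this is not the LLVM code or the fix.
#include <array>
#include <cstdio>

enum class ByteShiftOp { None, SHL, SRL };

constexpr int NumLaneElts = 8;

// -1 in the mask means "undef" and is treated as matching anything.
static bool isSequentialOrUndef(const std::array<int, NumLaneElts> &Mask,
                                int Pos, int Len, int Low) {
  for (int i = 0; i < Len; ++i)
    if (Mask[Pos + i] != -1 && Mask[Pos + i] != Low + i)
      return false;
  return true;
}

static ByteShiftOp matchByteShift(const std::array<int, NumLaneElts> &Mask,
                                  const std::array<bool, NumLaneElts> &Zeroable,
                                  int Shift) {
  bool MatchLeft = true, MatchRight = true;
  for (int i = 0; i < Shift; ++i)
    MatchLeft &= Zeroable[i];
  for (int i = NumLaneElts - Shift; i < NumLaneElts; ++i)
    MatchRight &= Zeroable[i];
  if (!(MatchLeft || MatchRight))
    return ByteShiftOp::None;

  // The crucial part: pick one direction and use that choice for both the
  // expected element positions and the opcode. The committed code computes
  // Pos/Low from MatchLeft but the opcode from MatchRight, and those disagree
  // whenever both ends of the lane are zeroable.
  bool UseLeft = MatchLeft; // arbitrary preference when both match
  int Pos = UseLeft ? Shift : 0;
  int Low = UseLeft ? 0 : Shift;
  int Len = NumLaneElts - Shift;
  if (!isSequentialOrUndef(Mask, Pos, Len, Low))
    return ByteShiftOp::None;
  return UseLeft ? ByteShiftOp::SHL : ByteShiftOp::SRL;
}

int main() {
  // Elements 0 and 7 are zeroable, so MatchLeft and MatchRight are both true
  // for Shift == 1. The element pattern only fits a left shift, and because
  // the direction is resolved once we report SHL (1) rather than SRL (2).
  std::array<int, NumLaneElts> Mask = {-1, 0, 1, 2, 3, 4, 5, -1};
  std::array<bool, NumLaneElts> Zeroable = {true, false, false, false,
                                            false, false, false, true};
  std::printf("op = %d\n", static_cast<int>(matchByteShift(Mask, Zeroable, 1)));
  return 0;
}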

Here is a test case:

define internal fastcc <8 x i16> @test(<8 x i16> %s.0.0, <8 x i16> %s.0.1) noinline nounwind {
entry:
  %s.1.0 = shufflevector <8 x i16> %s.0.0, <8 x i16> %s.0.1, <8 x i32> <i32 15, i32 undef, i32 3, i32 undef, i32 12, i32 12, i32 5, i32 undef>

  ret <8 x i16> %s.1.0
}

This gets miscompiled at top-of-tree.

I tried reverting this patch, and it's a mess. Doesn't revert cleanly. =[
When I manually tried to fix the revert, the above test case got
miscompiled *differently* by shifting every byte out. =[

I'm probably just going to rip this function out until you or I have time
to look at re-adding it and ensuring it doesn't have any holes. I don't see
any other clean way of "reverting to green" here because the patches don't
really back out cleanly.

Sorry for the trouble, but miscompiled shuffles are *really* hard to track
down so I want to get back to a green baseline.

If you want to test stuff yourself, my shuffle fuzz tester is committed to
LLVM. Here is how I run it:

% seq 1 48 | xargs -P 48 -n 1 zsh -c 'repeat 10000 (python2.7 utils/shuffle_fuzz.py -v --fixed-bit-width 128 --fixed-element-type i16 --triple x86_64-unknown-unknown --max-shuffle-height 1 | llc -mcpu=x86-64 -mattr=+sse2 | clang -x assembler - -o fuzz.$0; ./fuzz.$0 && echo "Success!") >>fuzz.$0.log 2>&1'

You can then 'grep FAIL fuzz.*.log' to find failures. You can reproduce
them by taking the UUID in the FAIL line and passing it as the --seed
option to shuffle_fuzz.py (along with the exact options you pass it above).
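
For example, reproducing a single failure should look something like this (just a sketch reusing the options from the command above; the --seed value is whatever UUID your FAIL line reports, and fuzz.repro is a placeholder output name):

% python2.7 utils/shuffle_fuzz.py -v --fixed-bit-width 128 --fixed-element-type i16 --triple x86_64-unknown-unknown --max-shuffle-height 1 --seed <UUID-from-FAIL-line> | llc -mcpu=x86-64 -mattr=+sse2 | clang -x assembler - -o fuzz.repro && ./fuzz.repro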

You can change "48" in the two places above to the number of cores on your
machine. And you can increase the total number of fuzz tests by growing the
argument to "repeat". The above 480000 runs found the test case I gave in
this email and one other failure that I haven't investigated. I'm going to
back out the code for byte shift lowering now and then start fuzz testing
again.

-Chandler