[PATCH][AVX512] Fix miscompile for unpack

Tue Sep 2 10:55:10 PDT 2014

Ping?

On Aug 13, 2014, at 11:48 PM, Adam Nemet <anemet at apple.com> wrote:

> Hi Elena,
> 
> r189189 implemented AVX512 unpack by essentially performing a 256-bit unpack
> between the low and the high 256 bits of src1 into the low part of the
> destination and another unpack of the low and high 256 bits of src2 into the
> high part of the destination.
> 
> I don't think that's how unpack works.  AVX512 unpack simply has more 128-bit
> lanes, but other than that it works the same way as AVX.  So in each 128-bit
> lane, we're always interleaving certain parts of both operands rather than
> different parts of one of the operands.
> 
> E.g. for this:
> __v16sf a = { 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15 };
> __v16sf b = { 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31 };
> __v16sf c = __builtin_shufflevector(a, b, 0, 8, 1, 9, 4, 12, 5, 13, 16,
>                                             24, 17, 25, 20, 28, 21, 29);
> 
> we generated vunpcklps (notice how the elements of a and b are not interleaved
> in the shuffle).  In turn, c was set to this:
> 
>  0 16 1 17 4 20 5 21 8 24 9 25 12 28 13 29
> 
> Obviously, since a and b hold the values 0..31 in order, c should simply have
> been equal to the shuffle mask itself:
> 
>  0 8 1 9 4 12 5 13 16 24 17 25 20 28 21 29
> 
> I mostly reverted this change and made sure the original AVX code worked
> for 512-bit vectors as well.
> 
> I also updated the tests because they matched the old, incorrect logic.
> 
> Please let me know if this looks good.
> 
> Thanks,
> Adam
> 
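For anyone who wants to double-check the lane semantics described in the quoted message, here is a minimal scalar sketch in plain C. It is only a model, not the code from the patch: the helper names (unpack_lo_ps_model, is_unpack_lo_mask) are hypothetical, and it assumes that "unpack low" on 32-bit elements interleaves the two low elements of each operand independently within every 128-bit lane.

```c
#include <stddef.h>

/* Assumption: 32-bit elements, so one 128-bit lane holds 4 of them. */
enum { ELEMS_PER_LANE = 4 };

/* Scalar model of "unpack low" applied independently to each 128-bit lane:
   dst lane = { a[0], b[0], a[1], b[1] } of the same lane.
   n is the total element count and must be a multiple of 4. */
void unpack_lo_ps_model(const float *a, const float *b, float *dst, size_t n)
{
    for (size_t lane = 0; lane < n; lane += ELEMS_PER_LANE) {
        dst[lane + 0] = a[lane + 0];
        dst[lane + 1] = b[lane + 0];
        dst[lane + 2] = a[lane + 1];
        dst[lane + 3] = b[lane + 1];
    }
}

/* Returns 1 if mask (shufflevector convention: 0..n-1 select from the first
   operand, n..2n-1 from the second) matches the per-lane unpack-low pattern,
   0 otherwise. */
int is_unpack_lo_mask(const int *mask, size_t n)
{
    for (size_t i = 0; i < n; ++i) {
        size_t pos = i % ELEMS_PER_LANE;   /* position within the lane */
        size_t lane_base = i - pos;        /* first element of the lane */
        /* Even positions come from a, odd positions from b (offset by n). */
        size_t expected = lane_base + pos / 2 + ((pos % 2) ? n : 0);
        if (mask[i] < 0 || (size_t)mask[i] != expected)
            return 0;
    }
    return 1;
}
```

Under this model, the only 16-element mask that should be matched as unpack low is 0 16 1 17 4 20 5 21 8 24 9 25 12 28 13 29; the mask from the example above (0, 8, 1, 9, ...) fails the check at its second element.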
-------------- next part --------------
A non-text attachment was scrubbed...
Name: AVX512-Fix-miscompile-for-unpack.patch
Type: application/octet-stream
Size: 7925 bytes
Desc: not available
URL: <http://lists.llvm.org/pipermail/llvm-commits/attachments/20140902/db3e2f21/attachment.obj>