[PATCH] D36058: [X86][LLVM]Expanding Supports lowerInterleavedStore() in X86InterleavedAccess (VF8 stride 4).

Sun Jul 30 05:22:43 PDT 2017

m_zuckerman created this revision.

This patch expands the support of lowerInterleavedStore to 8x8i stride 4.

LLVM generates suboptimal shuffle code for AVX2. Overall, this patch is a specific fix for the pattern (Stride=4, VF=8), and we plan to include more patterns in the future.

The goal of the patch is to optimize the following sequence.
At the end of the computation, we have ymm2, ymm0, ymm12 and ymm3, each
holding 16 chars:

c0, c1, ..., c7
m0, m1, ..., m7
y0, y1, ..., y7
k0, k1, ..., k7

And these need to be transposed/interleaved and stored like so:

c0 m0 y0 k0 c1 m1 y1 k1 c2 m2 y2 k2 c3 m3 y3 k3 ....
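
The layout above can be sketched as a scalar reference model in plain C++. This is only an illustration of the target layout (VF=8, Stride=4), not the shuffle sequence the patch emits. The names Vec8, Vec32, interleaveStride4, unpackBytes, unpackWords and interleaveViaUnpacks are hypothetical and do not come from the patch. The second variant models the common two-stage "unpack" idea (bytes first, then 16-bit pairs) on flat arrays. Real 256-bit AVX2 unpack instructions work within each 128-bit lane, so actual codegen needs extra lane handling that this sketch ignores.

```cpp
#include <array>
#include <cstddef>
#include <cstdint>

constexpr std::size_t VF = 8;      // elements per input vector (assumed from the patch title)
constexpr std::size_t Stride = 4;  // number of vectors being interleaved

using Vec8 = std::array<std::uint8_t, VF>;
using Vec16 = std::array<std::uint8_t, 2 * VF>;
using Vec32 = std::array<std::uint8_t, Stride * VF>;

// Reference model: out[Stride * i + j] = in[j][i],
// i.e. c0 m0 y0 k0 c1 m1 y1 k1 ...
Vec32 interleaveStride4(const Vec8 &c, const Vec8 &m, const Vec8 &y,
                        const Vec8 &k) {
  const Vec8 *in[Stride] = {&c, &m, &y, &k};
  Vec32 out{};
  for (std::size_t i = 0; i < VF; ++i)
    for (std::size_t j = 0; j < Stride; ++j)
      out[Stride * i + j] = (*in[j])[i];
  return out;
}

// Stage 1 (hypothetical helper): byte-wise unpack, a0 b0 a1 b1 ... a7 b7.
Vec16 unpackBytes(const Vec8 &a, const Vec8 &b) {
  Vec16 r{};
  for (std::size_t i = 0; i < VF; ++i) {
    r[2 * i] = a[i];
    r[2 * i + 1] = b[i];
  }
  return r;
}

// Stage 2 (hypothetical helper): unpack 16-bit pairs, alternating a and b.
Vec32 unpackWords(const Vec16 &a, const Vec16 &b) {
  Vec32 r{};
  for (std::size_t i = 0; i < VF; ++i) {
    r[4 * i] = a[2 * i];
    r[4 * i + 1] = a[2 * i + 1];
    r[4 * i + 2] = b[2 * i];
    r[4 * i + 3] = b[2 * i + 1];
  }
  return r;
}

// Same result as interleaveStride4, built from the two unpack stages.
Vec32 interleaveViaUnpacks(const Vec8 &c, const Vec8 &m, const Vec8 &y,
                           const Vec8 &k) {
  return unpackWords(unpackBytes(c, m), unpackBytes(y, k));
}
```

Both functions produce the same 32-byte result, so the reference model can serve as a check when experimenting with other shuffle decompositions.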

https://reviews.llvm.org/D36058

Files:
  lib/Target/X86/X86InterleavedAccess.cpp
  test/CodeGen/X86/x86-interleaved-access.ll
  test/Transforms/InterleavedAccess/X86/interleavedStore.ll

-------------- next part --------------
A non-text attachment was scrubbed...
Name: D36058.108828.patch
Type: text/x-patch
Size: 33724 bytes
Desc: not available
URL: <http://lists.llvm.org/pipermail/llvm-commits/attachments/20170730/ca474195/attachment.bin>