[PATCH] D34601: [X86][LLVM]Expanding Supports lowerInterleavedStore() in X86InterleavedAccess.

Sun Jun 25 00:38:04 PDT 2017

m_zuckerman created this revision.

This patch extends lowerInterleavedStore to support interleaved stores of 32 x i8 vectors with stride 4.

LLVM currently generates suboptimal shuffle code for this case on AVX2. This patch is a targeted fix for one pattern (Stride=4, VF=32); we plan to add more patterns in the future. To make that easier, the patch adds two mask creators. The first creates a shuffle mask equivalent to the unpacklo/unpackhi instructions. The second creates a mask equivalent to a concatenation of two half vectors (high/low).
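
As a rough illustration of the two mask creators described above, here is a minimal standalone sketch. It is not the code from the patch: the names makeUnpackMask and makeConcatHalvesMask, their parameters, and the use of std::vector are assumptions made for this example. It also assumes the usual shuffle-mask convention in which indices below NumElts select from the first operand and indices from NumElts upward select from the second, and that the concat mask joins the same half (low or high) of both operands.

```cpp
#include <vector>

// Hypothetical helper (illustration only, not the patch's implementation).
// Builds a two-operand shuffle mask that mimics unpacklo/unpackhi on a
// vector of NumElts elements split into 128-bit lanes of EltsPerLane
// elements each. Within every lane, elements from the low (or high) half
// of operand 0 and operand 1 are interleaved.
std::vector<int> makeUnpackMask(int NumElts, int EltsPerLane, bool Lo) {
  std::vector<int> Mask;
  Mask.reserve(NumElts);
  for (int i = 0; i < NumElts; ++i) {
    int LaneStart = (i / EltsPerLane) * EltsPerLane; // first index of this lane
    int Pos = LaneStart + (i % EltsPerLane) / 2;     // source element in the lane
    if (!Lo)
      Pos += EltsPerLane / 2;                        // unpackhi reads the upper half
    if (i % 2)
      Pos += NumElts;                                // odd slots come from operand 1
    Mask.push_back(Pos);
  }
  return Mask;
}

// Hypothetical helper (illustration only). Builds a two-operand shuffle mask
// that concatenates the low halves (Low == true) or the high halves
// (Low == false) of the two operands.
std::vector<int> makeConcatHalvesMask(int NumElts, bool Low) {
  std::vector<int> Mask;
  Mask.reserve(NumElts);
  int Half = NumElts / 2;
  int Start = Low ? 0 : Half;
  for (int i = 0; i < Half; ++i)
    Mask.push_back(Start + i);           // half of operand 0
  for (int i = 0; i < Half; ++i)
    Mask.push_back(NumElts + Start + i); // same half of operand 1
  return Mask;
}
```

For a 32 x i8 vector, makeUnpackMask(32, 16, true) begins 0, 32, 1, 33 and its second lane begins 16, 48, which is the per-lane interleaving that a byte unpacklo performs on a 256-bit register.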

The goal of the patch is to optimize the following sequence.
At the end of the computation, ymm2, ymm0, ymm12 and ymm3 each hold
32 chars:

c0, c1, ..., c31
m0, m1, ..., m31
y0, y1, ..., y31
k0, k1, ..., k31

And these need to be transposed/interleaved and stored like so:

c0 m0 y0 k0 c1 m1 y1 k1 c2 m2 y2 k2 c3 m3 y3 k3 ....
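
The target layout can be stated as a scalar reference: output byte 4*i + j is element i of input vector j. The sketch below is only an illustration of that layout, not the lowering the patch emits; the function name interleave4 and the use of std::array are assumptions made for this example. A vectorized lowering could be checked against it.

```cpp
#include <array>
#include <cstddef>
#include <cstdint>

// Hypothetical scalar reference (illustration only): interleaves four
// 32-byte vectors with stride 4, so that Out[4*i + j] holds element i of
// the j-th input (C, M, Y, K in that order).
std::array<std::uint8_t, 128> interleave4(const std::array<std::uint8_t, 32> &C,
                                          const std::array<std::uint8_t, 32> &M,
                                          const std::array<std::uint8_t, 32> &Y,
                                          const std::array<std::uint8_t, 32> &K) {
  std::array<std::uint8_t, 128> Out{};
  for (std::size_t i = 0; i < 32; ++i) {
    Out[4 * i + 0] = C[i];
    Out[4 * i + 1] = M[i];
    Out[4 * i + 2] = Y[i];
    Out[4 * i + 3] = K[i];
  }
  return Out;
}
```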

https://reviews.llvm.org/D34601

Files:
  lib/Target/X86/X86InterleavedAccess.cpp
  test/CodeGen/X86/x86-interleaved-access.ll
  test/Transforms/InterleavedAccess/X86/interleavedStore.ll

-------------- next part --------------
A non-text attachment was scrubbed...
Name: D34601.103862.patch
Type: text/x-patch
Size: 24433 bytes
Desc: not available
URL: <http://lists.llvm.org/pipermail/llvm-commits/attachments/20170625/5c89c679/attachment.bin>