[PATCH] D35829: [X86][LLVM]Expanding Supports lowerInterleavedStore() in X86InterleavedAccess.
michael zuckerman via Phabricator via llvm-commits
llvm-commits at lists.llvm.org
Tue Jul 25 03:24:44 PDT 2017
m_zuckerman created this revision.
This patch expands the support of lowerInterleavedStore to 16 x i8 with stride 4.
LLVM currently generates suboptimal shuffle code for AVX2 in this case. Overall, this patch is a specific fix for one pattern (Stride=4, VF=16), and we plan to include more patterns in the future.
The goal of the patch is to optimize the following sequence:
At the end of the computation, we have ymm2, ymm0, ymm12 and ymm3, each
holding 16 chars:
c0, c1, ..., c15
m0, m1, ..., m15
y0, y1, ..., y15
k0, k1, ..., k15
And these need to be transposed/interleaved and stored like so:
c0 m0 y0 k0 c1 m1 y1 k1 c2 m2 y2 k2 c3 m3 y3 k3 ....
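
For reference, the intended memory layout can be written as a scalar loop. This is only a sketch of the semantics of the stride-4 store, not the code in the patch; the function name interleave4 and the use of std::array are assumptions made for illustration.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>

// Scalar reference for a stride-4 interleaved store with 16 elements per
// channel (VF = 16). The name interleave4 is hypothetical and does not
// appear in the patch; it only describes the expected output order.
std::array<std::uint8_t, 64>
interleave4(const std::array<std::uint8_t, 16> &c,
            const std::array<std::uint8_t, 16> &m,
            const std::array<std::uint8_t, 16> &y,
            const std::array<std::uint8_t, 16> &k) {
  std::array<std::uint8_t, 64> out{};
  for (std::size_t i = 0; i < 16; ++i) {
    // Element i of each channel lands in one group of four adjacent bytes.
    out[4 * i + 0] = c[i];
    out[4 * i + 1] = m[i];
    out[4 * i + 2] = y[i];
    out[4 * i + 3] = k[i];
  }
  return out;
}
```

Any vectorized lowering of this pattern should produce the same 64 bytes as this loop, so it can serve as a check when comparing shuffle sequences.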