[llvm-bugs] [Bug 42305] New: [X86][AVX] Regression in subvector scatter due to splitting ymm loads
via llvm-bugs
llvm-bugs at lists.llvm.org
Tue Jun 18 06:30:58 PDT 2019
https://bugs.llvm.org/show_bug.cgi?id=42305
Bug ID: 42305
Summary: [X86][AVX] Regression in subvector scatter due to splitting ymm loads
Product: libraries
Version: trunk
Hardware: PC
OS: Windows NT
Status: NEW
Severity: enhancement
Priority: P
Component: Backend: X86
Assignee: unassignedbugs at nondot.org
Reporter: llvm-dev at redking.me.uk
CC: andrea.dibiagio at gmail.com, craig.topper at gmail.com,
greg.bedwell at sony.com, llvm-bugs at lists.llvm.org,
llvm-dev at redking.me.uk, spatel+llvm at rotateright.com
https://godbolt.org/z/8Cjr7X
#include <x86intrin.h>

void scatter_subvectors( int count, const float* src, const int* idx, float* dst )
{
    for( int i = 0; i != count; ++i )
    {
        // Two unaligned 256-bit (ymm) loads covering 16 consecutive floats.
        __m256 j01 = _mm256_loadu_ps((const float*)(src + 0));
        __m256 j23 = _mm256_loadu_ps((const float*)(src + 8));
        src += 16;
        // Store each 128-bit half to its own destination offset.
        _mm_storeu_ps(dst + idx[0], _mm256_extractf128_ps(j01, 0));
        _mm_storeu_ps(dst + idx[1], _mm256_extractf128_ps(j01, 1));
        _mm_storeu_ps(dst + idx[2], _mm256_extractf128_ps(j23, 0));
        _mm_storeu_ps(dst + idx[3], _mm256_extractf128_ps(j23, 1));
        idx += 4;
    }
}
We're seeing a regression: each ymm load is being split into two xmm loads
because the subvectors are extracted and stored independently. Since
vextractf128 can store a 128-bit half directly to memory, this split
needlessly increases the instruction count.
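A minimal portable sketch of what the loop computes may help show why both code sequences are legal: every 128-bit store takes four consecutive floats from src, so the backend can either issue one 16-byte load per chunk (the split it now produces) or keep the 32-byte loads and store each half directly with vextractf128. The name scatter_subvectors_ref is hypothetical and not from the report; it uses plain memcpy instead of intrinsics and assumes every idx[k] is a valid offset into dst.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical scalar reference for scatter_subvectors (not from the bug
 * report). Each iteration reads 16 floats from src and writes 4-float
 * (128-bit) chunk k to dst + idx[k]. The floats are copied into a local
 * buffer first so that, as in the intrinsic version, all loads of an
 * iteration happen before any of its stores. */
void scatter_subvectors_ref(int count, const float *src, const int *idx,
                            float *dst)
{
    assert(count >= 0); /* the original loop uses i != count */
    for (int i = 0; i != count; ++i) {
        float tmp[16];
        memcpy(tmp, src, sizeof tmp);   /* models the two 256-bit loads */
        src += 16;
        for (int k = 0; k < 4; ++k)     /* models the four 128-bit stores */
            memcpy(dst + idx[k], tmp + 4 * k, 4 * sizeof(float));
        idx += 4;
    }
}
```

For example, with count = 1, src holding 0 to 15 and idx = {12, 8, 4, 0}, dst receives the four chunks in reverse order: 12..15, 8..11, 4..7, 0..3. Because each chunk is independent, nothing in the semantics forces the 256-bit loads to be split.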