[LLVMbugs] [Bug 23116] New: Missed optimisation - horizontal max for vectors is not optimised.
bugzilla-daemon at llvm.org
Fri Apr 3 10:34:43 PDT 2015
https://llvm.org/bugs/show_bug.cgi?id=23116
Bug ID: 23116
Summary: Missed optimisation - horizontal max for vectors is not optimised.
Product: new-bugs
Version: trunk
Hardware: PC
OS: Windows NT
Status: NEW
Severity: normal
Priority: P
Component: new bugs
Assignee: unassignedbugs at nondot.org
Reporter: nick at indigorenderer.com
CC: llvmbugs at cs.uiuc.edu
Classification: Unclassified
Computing the horizontal max (or min, etc.) of a vector is not optimised into
faster code.
For example, the C code:
#include <immintrin.h>

inline float max(float a, float b)
{
    return a > b ? a : b;
}

float findMax(__m256 v)
{
    return max(max(max(max(max(max(max(v[0], v[1]), v[2]), v[3]), v[4]),
                       v[5]), v[6]), v[7]);
}
This is compiled by Clang 3.7/trunk to:
findMax(float __vector(8)):             # @findMax(float __vector(8))
    vmovshdup %xmm0, %xmm1              # xmm1 = xmm0[1,1,3,3]
    vmaxss %xmm1, %xmm0, %xmm1
    vpermilpd $1, %xmm0, %xmm2          # xmm2 = xmm0[1,0]
    vmaxss %xmm2, %xmm1, %xmm1
    vpermilps $231, %xmm0, %xmm2        # xmm2 = xmm0[3,1,2,3]
    vmaxss %xmm2, %xmm1, %xmm1
    vextractf128 $1, %ymm0, %xmm0
    vmaxss %xmm0, %xmm1, %xmm1
    vmovshdup %xmm0, %xmm2              # xmm2 = xmm0[1,1,3,3]
    vmaxss %xmm2, %xmm1, %xmm1
    vpermilpd $1, %xmm0, %xmm2          # xmm2 = xmm0[1,0]
    vmaxss %xmm2, %xmm1, %xmm1
    vpermilps $231, %xmm0, %xmm0        # xmm0 = xmm0[3,1,2,3]
    vmaxss %xmm0, %xmm1, %xmm0
    vzeroupper
    retq
This is essentially seven vmaxss instructions in a serial dependency chain,
which is slow.
It could be optimised to something like:
float findMax(__m256 v)
{
    __m128 a = _mm256_extractf128_ps(v, 0);
    __m128 b = _mm256_extractf128_ps(v, 1);
    __m128 c = _mm_max_ps(a, b);
    return max(max(c[0], c[1]), max(c[2], c[3]));
}
Which compiles to:
findMax(float __vector(8)):             # @findMax(float __vector(8))
    vextractf128 $1, %ymm0, %xmm1
    vmaxps %xmm1, %xmm0, %xmm0
    vmovshdup %xmm0, %xmm1              # xmm1 = xmm0[1,1,3,3]
    vmaxss %xmm1, %xmm0, %xmm1
    vpermilpd $1, %xmm0, %xmm2          # xmm2 = xmm0[1,0]
    vpermilps $231, %xmm0, %xmm0        # xmm0 = xmm0[3,1,2,3]
    vmaxss %xmm0, %xmm2, %xmm0
    vmaxss %xmm0, %xmm1, %xmm0
    vzeroupper
    retq
See http://goo.gl/jM3KNz for the code.
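To make the difference in dependency depth concrete without needing AVX, here
is a minimal scalar sketch of the two reduction shapes from this report. It
assumes a plain float[8] as a stand-in for the eight lanes of the __m256, and
the names max_f, find_max_serial and find_max_tree are hypothetical, chosen
only for this illustration.

```c
/* Same comparison as the max() in the report; renamed to avoid clashes. */
static inline float max_f(float a, float b)
{
    return a > b ? a : b;
}

/* Serial shape: each max depends on the previous one (7 dependent steps). */
float find_max_serial(const float v[8])
{
    float m = v[0];
    for (int i = 1; i < 8; ++i)
        m = max_f(m, v[i]);
    return m;
}

/* Tree shape: lanewise max of the low and high halves (the scalar analogue
 * of the single vmaxps), then a two-level reduction of the four results.
 * The longest dependency chain is 3 steps instead of 7. */
float find_max_tree(const float v[8])
{
    float c[4];
    for (int i = 0; i < 4; ++i)
        c[i] = max_f(v[i], v[i + 4]);
    return max_f(max_f(c[0], c[1]), max_f(c[2], c[3]));
}
```

Both functions return the same value for inputs that contain no NaNs. With
NaNs, the result of a > b ? a : b depends on operand order, so the two shapes
can differ, which is likely why a compiler would need relaxed floating-point
semantics to perform this reassociation on its own.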