[LLVMbugs] [Bug 23116] New: Missed optimisation - horizontal max for vectors is not optimised.

Fri Apr 3 10:34:43 PDT 2015

https://llvm.org/bugs/show_bug.cgi?id=23116

            Bug ID: 23116
           Summary: Missed optimisation - horizontal max for vectors is
                    not optimised.
           Product: new-bugs
           Version: trunk
          Hardware: PC
                OS: Windows NT
            Status: NEW
          Severity: normal
          Priority: P
         Component: new bugs
          Assignee: unassignedbugs at nondot.org
          Reporter: nick at indigorenderer.com
                CC: llvmbugs at cs.uiuc.edu
    Classification: Unclassified

Computing the horizontal max (or min, etc.) of a vector's elements is not
optimised into faster code.
For example, the C code:

#include <immintrin.h>

inline float max(float a, float b)
{
    return a > b ? a : b;
}

float findMax(__m256 v)
{
    return max(max(max(max(max(max(max(v[0], v[1]), v[2]), v[3]), v[4]), v[5]), v[6]), v[7]);
}

is compiled by Clang 3.7/trunk to:

findMax(float __vector(8)):                        # @findMax(float __vector(8))
    vmovshdup    %xmm0, %xmm1    # xmm1 = xmm0[1,1,3,3]
    vmaxss    %xmm1, %xmm0, %xmm1
    vpermilpd    $1, %xmm0, %xmm2 # xmm2 = xmm0[1,0]
    vmaxss    %xmm2, %xmm1, %xmm1
    vpermilps    $231, %xmm0, %xmm2 # xmm2 = xmm0[3,1,2,3]
    vmaxss    %xmm2, %xmm1, %xmm1
    vextractf128    $1, %ymm0, %xmm0
    vmaxss    %xmm0, %xmm1, %xmm1
    vmovshdup    %xmm0, %xmm2    # xmm2 = xmm0[1,1,3,3]
    vmaxss    %xmm2, %xmm1, %xmm1
    vpermilpd    $1, %xmm0, %xmm2 # xmm2 = xmm0[1,0]
    vmaxss    %xmm2, %xmm1, %xmm1
    vpermilps    $231, %xmm0, %xmm0 # xmm0 = xmm0[3,1,2,3]
    vmaxss    %xmm0, %xmm1, %xmm0
    vzeroupper
    retq

This is essentially seven vmaxss instructions in series, each depending on the
result of the previous one (slow).

It could be optimised to something like:

float findMax(__m256 v)
{
    __m128 a = _mm256_extractf128_ps(v, 0);
    __m128 b = _mm256_extractf128_ps(v, 1);

    __m128 c = _mm_max_ps(a, b);

    return max(max(c[0], c[1]), max(c[2], c[3]));
}

Which compiles to:

findMax(float __vector(8)):                        # @findMax(float __vector(8))
    vextractf128    $1, %ymm0, %xmm1
    vmaxps    %xmm1, %xmm0, %xmm0
    vmovshdup    %xmm0, %xmm1    # xmm1 = xmm0[1,1,3,3]
    vmaxss    %xmm1, %xmm0, %xmm1
    vpermilpd    $1, %xmm0, %xmm2 # xmm2 = xmm0[1,0]
    vpermilps    $231, %xmm0, %xmm0 # xmm0 = xmm0[3,1,2,3]
    vmaxss    %xmm0, %xmm2, %xmm0
    vmaxss    %xmm0, %xmm1, %xmm0
    vzeroupper
    retq

See http://goo.gl/jM3KNz for the code.
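
For readers without an AVX target to hand, the two reduction shapes can be
sketched in portable scalar C. This is only an illustration of the reordering,
not what the compiler emits: the names find_max_serial and find_max_tree are
made up for this sketch, and a plain float[8] stands in for __m256. The serial
form has a dependency chain of 7 comparisons, while the tree form has a depth
of 3 (the first level corresponds to the single vmaxps above). Note that with
max defined as a > b ? a : b the two orderings can give different results when
the inputs contain NaNs, so an automatic transformation would presumably need
relaxed floating-point settings.

```c
/* Same max as in the report. */
static inline float max(float a, float b)
{
    return a > b ? a : b;
}

/* Serial shape: each max depends on the previous one (chain of 7). */
static float find_max_serial(const float v[8])
{
    float m = v[0];
    for (int i = 1; i < 8; ++i)
        m = max(m, v[i]);
    return m;
}

/* Tree shape: 4 independent maxes (low half vs. high half, like vmaxps),
 * then 2 independent maxes, then 1 final max (depth 3). */
static float find_max_tree(const float v[8])
{
    float c[4];
    for (int i = 0; i < 4; ++i)
        c[i] = max(v[i], v[i + 4]);
    return max(max(c[0], c[1]), max(c[2], c[3]));
}
```

For NaN-free input both functions return the same value; only the length of
the dependency chain differs.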
