[llvm-bugs] [Bug 40265] New: autovectorization of repeated calls to vectorizable functions fails

Wed Jan 9 00:57:54 PST 2019

https://bugs.llvm.org/show_bug.cgi?id=40265

            Bug ID: 40265
           Summary: autovectorization of repeated calls to vectorizable
                    functions fails
           Product: clang
           Version: 6.0
          Hardware: PC
                OS: Linux
            Status: NEW
          Severity: enhancement
          Priority: P
         Component: C++
          Assignee: unassignedclangbugs at nondot.org
          Reporter: kfjahnke at gmail.com
                CC: blitzrakete at gmail.com, dgregor at apple.com,
                    erik.pilkington at gmail.com, llvm-bugs at lists.llvm.org,
                    richard-llvm at metafoo.co.uk

After consulting the documentation at

https://llvm.org/docs/Vectorizers.html

I tried to trigger autovectorization of loops with vectorizable functions. The
documentation gives a list of functions which are meant to be vectorized in the
section headed 'Vectorization of function calls', namely

pow     exp     exp2
sin     cos     sqrt
log     log2    log10
fabs    floor   ceil
fma     trunc   nearbyint
                fmuladd

I found that most of the listed functions are not autovectorized. Since some of
them (e.g. floor, ceil, trunc) are autovectorized, I was able to patch the
resulting assembly code, replacing the vector opcodes (vroundps with vsqrtps,
adapting the operands accordingly), and found that the resulting binary was
significantly faster and worked as intended (I did this for 'sqrt' as an
example on my AVX2 system and got about 400% speedup). So my guess is that the
autovectorization opportunity is simply missed: the machinery to produce
vector code for the given loop pattern is evidently there and functioning. The
compiler does indeed report that it is unable to vectorize the loop. I was
using this test code:

#include <cmath>

extern float data [ 32768 ] ;

extern void vf1()
{
  #pragma vectorize enable 
  for ( int i = 0 ; i < 32768 ; i++ )
    data [ i ] = std::sqrt ( data [ i ] ) ;
}

and this compiler call:

clang++ -fvectorize  -Rpass=loop-vectorize -Rpass-analysis=loop-vectorize
-std=c++11 -O3 -mavx2 -S -o sqrt.s sqrt_gcc.cc

resulting in these diagnostic messages:

/usr/bin/../lib/gcc/x86_64-linux-gnu/8/../../../../include/c++/8/cmath:464:12:
remark: loop not vectorized: call instruction cannot be vectorized
[-Rpass-analysis=loop-vectorize]
  { return __builtin_sqrtf(__x); }
           ^
sqrt_gcc.cc:14:3: remark: loop not vectorized: read with atomic ordering or
volatile read [-Rpass-analysis=loop-vectorize]
  for ( int i = 0 ; i < 32768 ; i++ )

Using e.g. 'trunc' instead of 'sqrt', the loop vectorizes correctly.
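
A variant that may be worth trying is sketched below. It is not part of the
original test, and it rests on an assumption this report does not confirm:
std::sqrt may set errno for negative inputs, which could keep the compiler
from treating the call as a vectorizable intrinsic, whereas floor, ceil and
trunc never set errno. The sketch puts the sqrt loop and a trunc control loop
side by side so their remarks can be compared with and without
-fno-math-errno. It uses the pragma spelling given in the Clang documentation,
'#pragma clang loop vectorize(enable)'. The names N, vf_sqrt and vf_trunc are
illustrative, and 'data' is defined locally (instead of declared extern) only
so that the file stands on its own.

```cpp
#include <cmath>
#include <cstddef>

// Illustrative stand-in for the report's 'extern float data [ 32768 ]',
// defined here so that this file is self-contained.
constexpr std::size_t N = 32768;
float data[N];

// The loop from the report, using the loop hint spelling from the Clang docs.
// Assumption to verify: with math-errno semantics in effect, this call may
// not be vectorized; with -fno-math-errno it may be.
void vf_sqrt()
{
  #pragma clang loop vectorize(enable)
  for (std::size_t i = 0; i < N; i++)
    data[i] = std::sqrt(data[i]);
}

// Control case: the report states that 'trunc' vectorizes.
void vf_trunc()
{
  #pragma clang loop vectorize(enable)
  for (std::size_t i = 0; i < N; i++)
    data[i] = std::trunc(data[i]);
}
```

Compiling this once with the flags from the report and once with
-fno-math-errno added (the file name sqrt_variant.cc is hypothetical), e.g.

clang++ -std=c++11 -O3 -mavx2 -fno-math-errno -Rpass=loop-vectorize
-Rpass-analysis=loop-vectorize -S -o sqrt.s sqrt_variant.cc

should show whether the errno semantics are what blocks the sqrt loop.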

I did find an old thread here complaining about this behaviour:

http://clang-developers.42468.n3.nabble.com/Bug-with-vectorization-of-transcendental-functions-tc4041229.html#a4041291

but it seems that there was no conclusion, so I am submitting this bug report,
hoping to revive the topic.

With regards
Kay F. Jahnke
