[llvm-bugs] [Bug 40265] New: autovectorization of repeated calls to vectorizable functions fails
via llvm-bugs
llvm-bugs at lists.llvm.org
Wed Jan 9 00:57:54 PST 2019
https://bugs.llvm.org/show_bug.cgi?id=40265
Bug ID: 40265
Summary: autovectorization of repeated calls to vectorizable functions fails
Product: clang
Version: 6.0
Hardware: PC
OS: Linux
Status: NEW
Severity: enhancement
Priority: P
Component: C++
Assignee: unassignedclangbugs at nondot.org
Reporter: kfjahnke at gmail.com
CC: blitzrakete at gmail.com, dgregor at apple.com,
erik.pilkington at gmail.com, llvm-bugs at lists.llvm.org,
richard-llvm at metafoo.co.uk
After consulting the documentation at
https://llvm.org/docs/Vectorizers.html
I tried to trigger autovectorization of loops with vectorizable functions. The
documentation gives a list of functions which are meant to be vectorized in the
section headed 'Vectorization of function calls', namely
pow exp exp2
sin cos sqrt
log log2 log10
fabs floor ceil
fma trunc nearbyint
fmuladd
I found that most of the listed functions are not autovectorized. Some of them
(e.g. floor, ceil, trunc) are, so I patched the resulting assembly code by
hand, replacing the vector opcode (vroundps with vsqrtps) and adapting the
operands. The resulting binary worked as intended and was significantly
faster: for 'sqrt' on my AVX2 system I measured a speedup of about 400%. My
guess is therefore that the autovectorization opportunity is simply missed,
since the machinery to generate vector code for this loop pattern is evidently
there and working. The compiler does report that it is unable to vectorize. I
was using this test code:
#include <cmath>
extern float data [ 32768 ] ;
extern void vf1()
{
#pragma vectorize enable
for ( int i = 0 ; i < 32768 ; i++ )
data [ i ] = std::sqrt ( data [ i ] ) ;
}
and this compiler call:
clang++ -fvectorize -Rpass=loop-vectorize -Rpass-analysis=loop-vectorize
-std=c++11 -O3 -mavx2 -S -o sqrt.s sqrt_gcc.cc
resulting in these diagnostic messages:
/usr/bin/../lib/gcc/x86_64-linux-gnu/8/../../../../include/c++/8/cmath:464:12:
remark: loop not vectorized: call instruction cannot be vectorized
[-Rpass-analysis=loop-vectorize]
{ return __builtin_sqrtf(__x); }
^
sqrt_gcc.cc:14:3: remark: loop not vectorized: read with atomic ordering or
volatile read [-Rpass-analysis=loop-vectorize]
for ( int i = 0 ; i < 32768 ; i++ )
Using e.g. 'trunc' instead of 'sqrt', the same loop vectorizes correctly.
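To make the comparison easier to reproduce, here is a minimal sketch that puts
the sqrt loop and the trunc loop side by side in one translation unit. The
names samples, kCount, apply_sqrt, apply_trunc and self_check are hypothetical
and not part of my original test file. The self_check function is only an
assumption about how one might validate a patched or vectorized build.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>

// Hypothetical names for illustration; same array size as in the report.
constexpr std::size_t kCount = 32768;
float samples[kCount];

// Same loop shape as the reported test case, calling sqrt.
void apply_sqrt()
{
  for (std::size_t i = 0; i < kCount; i++)
    samples[i] = std::sqrt(samples[i]);
}

// Identical loop calling trunc, which is reported to vectorize.
void apply_trunc()
{
  for (std::size_t i = 0; i < kCount; i++)
    samples[i] = std::trunc(samples[i]);
}

// Fills the array with a value whose square root is exactly representable,
// runs both loops in sequence, and checks every element after each step.
void self_check()
{
  for (std::size_t i = 0; i < kCount; i++)
    samples[i] = 6.25f;
  apply_sqrt();
  for (std::size_t i = 0; i < kCount; i++)
    assert(samples[i] == 2.5f);
  apply_trunc();
  for (std::size_t i = 0; i < kCount; i++)
    assert(samples[i] == 2.0f);
}
```

Compiled with the same -Rpass=loop-vectorize and -Rpass-analysis=loop-vectorize
options as above, the remarks should show separately which of the two loops is
vectorized. The exact remarks will depend on the compiler version and the
standard library headers. One variation that I have not tested in this report
is adding -fno-math-errno, on the assumption that sqrt may set errno while
trunc does not.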
I did find an old thread here complaining about this behaviour:
http://clang-developers.42468.n3.nabble.com/Bug-with-vectorization-of-transcendental-functions-tc4041229.html#a4041291
but it seems that there was no conclusion, so I am submitting this bug report,
hoping to revive the topic.
With regards
Kay F. Jahnke