[cfe-dev] food for optimizer developers
Robert Purves
listrp at gmail.com
Wed Aug 11 03:18:20 PDT 2010
Douglas Gregor wrote:
>>> I wrote a Fortran to C++ conversion program that I used to convert selected
>>> LAPACK sources. Comparing runtimes with different compilers I get:
>>>
>>>                               absolute  relative
>>> ifort 11.1.072                  1.790s      1.00
>>> gfortran 4.4.4                  2.470s      1.38
>>> g++ 4.4.4                       2.922s      1.63
>>> clang++ 2.8 (trunk 108205)      6.487s      3.62
>>
>>> - Why is the code generated by clang++ so much slower than the g++ code?
>>
>> A "hot spot" in your benchmark dsyev_test.cpp is this loop in dlasr()
>>
>>   FEM_DO(i, 1, m) {
>>     temp = a(i, j + 1);
>>     a(i, j + 1) = ctemp * temp - stemp * a(i, j);
>>     a(i, j) = stemp * temp + ctemp * a(i, j);
>>   }
>>
>> For the loop body, g++ (4.2) emits unsurprising code.
>>
>> clang++ (2.8) misses major optimizations accessing the 'a' array, and makes no less than 3 laborious address calculations.
>>
>> Presumably clang++, in its present state of development, is not smart enough to notice the underlying simple sequential access pattern, when the array is declared
>> arr_ref<double, 2> a
>
> This would make a *wonderful* bug report against the LLVM optimizer... http://llvm.org/bugs/ :)
I believe that would require the cooperation of the OP, because it is his Fortran -> C++ converter. Are you interested, Ralf?
I've started the ball rolling with a much reduced test case.
$ cat test.cpp
/*
Background:
<http://lists.cs.uiuc.edu/pipermail/cfe-dev/2010-August/010258.html>
Relevant files, including benchmark dsyev_test.cpp:
<http://cci.lbl.gov/lapack_fem/>
This file (test.cpp) is a reduced case of dsyev_test.cpp.
It sheds light on the performance issue with clang++.
$ clang++ -c -I. -O3 test.cpp -save-temps
Examine test.s, in which the two inner loops of interest
are easily identified by their 'subsd' instruction.
Contrary to expectation, assembly code for loops A and B
is different. Loop B contains laborious and redundant
address calculations.
$ clang --version
clang version 2.8 (trunk 110653)
By contrast, g++ (4.2) emits identical assembly code for loops A and B.
*/
#include <fem/major_types.hpp>

namespace lapack_dsyev_fem {

using namespace fem::major_types;

void
test(
  int x,
  int const& m,
  int const& n,
  arr_cref<double> c,
  arr_cref<double> s,
  arr_ref<double, 2> a,
  int const& lda)
{
  c(dimension(star));
  s(dimension(star));
  a(dimension(lda, star));
  int i, j;
  double ctemp, stemp, temp;
  if ( x ) {
    for ( j = m - 1; j >= 1; j-- ) {
      ctemp = c(j);
      stemp = s(j);
      // loop A, identical with loop B below
      for ( i = 1; i <= n; i++ ) {
        temp = a(j + 1, i);
        a(j + 1, i) = ctemp * temp - stemp * a(j, i);
        a(j, i) = stemp * temp + ctemp * a(j, i);
      }
    }
  }
  else {
    for ( j = m - 1; j >= 1; j-- ) {
      ctemp = c(j);
      stemp = s(j);
      // loop B, identical with loop A above
      for ( i = 1; i <= n; i++ ) {
        temp = a(j + 1, i);
        a(j + 1, i) = ctemp * temp - stemp * a(j, i);
        a(j, i) = stemp * temp + ctemp * a(j, i);
      }
    }
  }
}

} // namespace lapack_dsyev_fem
Robert P.