[cfe-dev] food for optimizer developers
Robert Purves
listrp at gmail.com
Tue Aug 10 03:59:07 PDT 2010
> I wrote a Fortran to C++ conversion program that I used to convert selected
> LAPACK sources. Comparing runtimes with different compilers I get:
>
>                             absolute  relative
> ifort 11.1.072                1.790s      1.00
> gfortran 4.4.4                2.470s      1.38
> g++ 4.4.4                     2.922s      1.63
> clang++ 2.8 (trunk 108205)    6.487s      3.62
> - Why is the code generated by clang++ so much slower than the g++ code?
One "hot spot" in your benchmark, dsyev_test.cpp, is this loop in dlasr():
    FEM_DO(i, 1, m) {
        temp = a(i, j + 1);
        a(i, j + 1) = ctemp * temp - stemp * a(i, j);
        a(i, j) = stemp * temp + ctemp * a(i, j);
    }
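For the loop body, g++ (4.2) emits unsurprising code: both elements are addressed through pointers (%rcx and %rax) that simply advance by 8 bytes, one double, per iteration.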
    loop:
        movsd   (%rcx), %xmm2
        movapd  %xmm3, %xmm0
        mulsd   %xmm2, %xmm0
        movapd  %xmm4, %xmm1
        mulsd   (%rax), %xmm1
        subsd   %xmm1, %xmm0
        movsd   %xmm0, (%rcx)
        movapd  %xmm3, %xmm0
        mulsd   (%rax), %xmm0
        mulsd   %xmm4, %xmm2
        addsd   %xmm2, %xmm0
        movsd   %xmm0, (%rax)
        incl    %esi
        addq    $8, %rcx
        addq    $8, %rax
        cmpl    %esi, +0(%r13)
        jge     loop
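clang++ (2.8) misses major optimizations when accessing the 'a' array. It makes no fewer than three laborious address calculations per iteration (the address of a(i, j) is computed twice), and it reloads the array's base pointer and the fields used for indexing from memory through %r13 inside the loop.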
    loop:
        movq    %rax, %rdi
        subq    %rdx, %rdi
        imulq   %r14, %rdi
        subq    %rcx, %rdi
        addq    %rsi, %rdi
        movq    +0(%r13), %r8
        movsd   (%r8, %rdi, 8), %xmm3
        mulsd   %xmm1, %xmm3
        movq    %rbx, %rdi
        subq    %rdx, %rdi
        imulq   %r14, %rdi
        subq    %rcx, %rdi
        addq    %rsi, %rdi
        movsd   (%r8, %rdi, 8), %xmm4
        movapd  %xmm2, %xmm5
        mulsd   %xmm4, %xmm5
        subsd   %xmm3, %xmm5
        movsd   %xmm5, (%r8, %rdi, 8)
        movq    +32(%r13), %rdx
        movq    %rax, %rdi
        subq    %rdx, %rdi
        movq    +0(%r13), %r8
        movq    +8(%r13), %r14
        imulq   %r14, %rdi
        movq    +24(%r13), %rcx
        subq    %rcx, %rdi
        addq    %rsi, %rdi
        movsd   (%r8, %rdi, 8), %xmm3
        mulsd   %xmm2, %xmm3
        mulsd   %xmm1, %xmm4
        addsd   %xmm3, %xmm4
        movsd   %xmm4, (%r8, %rdi, 8)
        incq    %rsi
        cmpl    (%r15), %esi
        jle     loop
Presumably clang++, in its present state of development, is not smart enough to notice the underlying simple sequential access pattern when the array is declared as

    arr_ref<double, 2> a
I think clang has no trouble optimizing properly for a plain C array like this:

    double a[800][800];
Robert P.