[LLVMbugs] [Bug 5501] New: Useless memory accesses removal from loops
bugzilla-daemon at cs.uiuc.edu
Sun Nov 15 10:49:42 PST 2009
http://llvm.org/bugs/show_bug.cgi?id=5501
Summary: Useless memory accesses removal from loops
Product: new-bugs
Version: 2.6
Platform: PC
OS/Version: Windows XP
Status: NEW
Keywords: code-quality
Severity: normal
Priority: P2
Component: new bugs
AssignedTo: unassignedbugs at nondot.org
ReportedBy: bearophile at mailas.com
CC: llvmbugs at cs.uiuc.edu
While testing the LDC compiler, I saw that the SciMark2 benchmark
(http://math.nist.gov/scimark2/) shows two performance problems compared to
Java (Java is about 20-30% faster). I have reduced one of them to this almost
minimal C code:
#include <stdlib.h>
// Reduced Scimark2 SOR benchmark
void test(int N, double omega, double** G) {
    double omega_over_four = omega * 0.25;
    double one_minus_omega = 1.0 - omega;
    int j, i = 1;
    double* Gi = G[i];
    double* Gim1 = G[i - 1];
    double* Gip1 = G[i + 1];
    for (j = 1; j < N - 1; j++)
        Gi[j] = omega_over_four * (Gim1[j] + Gip1[j] + Gi[j - 1] + Gi[j + 1]) +
                one_minus_omega * Gi[j];
}
int main() {
    int N = 100;
    double** mat = (double**)malloc(sizeof(double*) * N);
    int i;
    for (i = 0; i < N; i++)
        mat[i] = (double*)malloc(sizeof(double) * N);
    test(N, 1.25, mat);
    return 0;
}
Inner loop of test(), using llvm-gcc 2.6 (32 bit, on Windows):
llvm-gcc -Wall -O3 -fomit-frame-pointer -msse3 -march=native -ffast-math
LBB1_2:
movapd %xmm1, %xmm2
mulsd 8(%ecx,%edi,8), %xmm2
movsd 8(%esi,%edi,8), %xmm3
addsd 8(%edx,%edi,8), %xmm3
addsd (%ecx,%edi,8), %xmm3
addsd 16(%ecx,%edi,8), %xmm3
mulsd %xmm0, %xmm3
addsd %xmm2, %xmm3
movsd %xmm3, 8(%ecx,%edi,8)
incl %edi
cmpl %eax, %edi
jne LBB1_2
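To make the count of six memory accesses easier to see, here is a rough sketch of one loop iteration with every memory operand of the listing written as its own C statement. The function name sor_step_explicit is made up for illustration. The assignment of %esi and %edx to Gim1 and Gip1 is an assumption, since the listing does not show which register holds which row.

```c
#include <assert.h>

/* One iteration of the inner loop, with each memory operand from the
   assembly listing spelled out. Hypothetical helper, not part of the
   original benchmark. */
void sor_step_explicit(double *Gi, const double *Gim1, const double *Gip1,
                       int j, double omega_over_four, double one_minus_omega)
{
    assert(j >= 1);                /* the loop starts at j = 1 */

    double center = Gi[j];         /* load 1: mulsd 8(%ecx,%edi,8)            */
    double sum = Gim1[j];          /* load 2: movsd 8(%esi,%edi,8) (assumed)  */
    sum += Gip1[j];                /* load 3: addsd 8(%edx,%edi,8) (assumed)  */
    sum += Gi[j - 1];              /* load 4: addsd (%ecx,%edi,8)             */
    sum += Gi[j + 1];              /* load 5: addsd 16(%ecx,%edi,8)           */

    /* store 6: movsd %xmm3, 8(%ecx,%edi,8) */
    Gi[j] = omega_over_four * sum + one_minus_omega * center;
}
```

Loads 1 and 4 read values that the previous iteration already had in registers (its Gi[j + 1] load and the value it stored), so those are the two accesses that could in principle be dropped.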
I think the performance difference arises because the 32-bit Java server JIT
reduces the number of memory accesses in the inner loop from 6 to 4, as in
code like this (not tested):
void test(int N, double omega, double** G) { // Scimark2 SOR reduced
    double omega_over_four = omega * 0.25;
    double one_minus_omega = 1.0 - omega;
    int j, i = 1;
    double* Gi = G[i];
    double* Gim1 = G[i - 1];
    double* Gip1 = G[i + 1];
    double pred = Gi[0]; // value of Gi[j - 1], kept in a register
    double curr = Gi[1]; // value of Gi[j], kept in a register
    for (j = 1; j < N - 1; j++) {
        double succ = Gi[j + 1]; // the only load from Gi per iteration
        pred = omega_over_four * (Gim1[j] + Gip1[j] + pred + succ) +
               one_minus_omega * curr;
        Gi[j] = pred;
        curr = succ;
    }
}
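A minimal sketch for checking that carrying Gi[j - 1] and Gi[j] in local variables gives the same row as the straightforward loop. The function names and the fill pattern for the grid are made up for illustration. The look-ahead load of Gi[j + 1] is placed at the top of the loop body so that it never reads past the end of the row.

```c
#include <assert.h>
#include <stdlib.h>

/* Straightforward loop from the report: six memory accesses per iteration. */
void sor_row_reference(int N, double omega, double **G)
{
    double omega_over_four = omega * 0.25;
    double one_minus_omega = 1.0 - omega;
    double *Gi = G[1], *Gim1 = G[0], *Gip1 = G[2];
    for (int j = 1; j < N - 1; j++)
        Gi[j] = omega_over_four * (Gim1[j] + Gip1[j] + Gi[j - 1] + Gi[j + 1])
              + one_minus_omega * Gi[j];
}

/* Variant with Gi[j - 1] and Gi[j] carried in locals: four memory
   accesses per iteration (three loads, one store). */
void sor_row_scalar(int N, double omega, double **G)
{
    if (N < 3)
        return;                      /* loop body would never run */
    double omega_over_four = omega * 0.25;
    double one_minus_omega = 1.0 - omega;
    double *Gi = G[1], *Gim1 = G[0], *Gip1 = G[2];
    double pred = Gi[0];
    double curr = Gi[1];
    for (int j = 1; j < N - 1; j++) {
        double succ = Gi[j + 1];     /* only load from Gi in the loop */
        pred = omega_over_four * (Gim1[j] + Gip1[j] + pred + succ)
             + one_minus_omega * curr;
        Gi[j] = pred;
        curr = succ;
    }
}

/* Hypothetical helper: run both variants on identical 3 x N grids and
   return the largest absolute difference found in the updated row. */
double max_row_difference(int N, double omega)
{
    double *a[3], *b[3];
    for (int r = 0; r < 3; r++) {
        a[r] = malloc(sizeof(double) * N);
        b[r] = malloc(sizeof(double) * N);
        assert(a[r] != NULL && b[r] != NULL);
        for (int c = 0; c < N; c++)  /* arbitrary deterministic fill */
            a[r][c] = b[r][c] = (double)(r * N + c) / (double)(3 * N);
    }
    sor_row_reference(N, omega, a);
    sor_row_scalar(N, omega, b);
    double worst = 0.0;
    for (int c = 0; c < N; c++) {
        double d = a[1][c] - b[1][c];
        if (d < 0.0)
            d = -d;
        if (d > worst)
            worst = d;
    }
    for (int r = 0; r < 3; r++) {
        free(a[r]);
        free(b[r]);
    }
    return worst;
}
```

Calling max_row_difference(100, 1.25) should give a value at or very near zero. Gim1 and Gip1 are only read and are reloaded on every iteration, and the values kept in locals come from Gi elements at indices other than the one being stored, so the rewrite does not seem to need any extra aliasing information.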
On IRC <nicholas> has said:
hm! there's a number of loads that are trivially provably consecutive in
memory, but i don't think we have a pass that even tries to fold them