[LLVMbugs] [Bug 4637] New: Some FP optimization opportunities missed
bugzilla-daemon at cs.uiuc.edu
Tue Jul 28 00:19:48 PDT 2009
http://llvm.org/bugs/show_bug.cgi?id=4637
Summary: Some FP optimization opportunities missed
Product: new-bugs
Version: unspecified
Platform: PC
OS/Version: Linux
Status: NEW
Severity: enhancement
Priority: P2
Component: new bugs
AssignedTo: unassignedbugs at nondot.org
ReportedBy: edwintorok at gmail.com
CC: llvmbugs at cs.uiuc.edu
The following portion of code from Blender's fluid simulation takes up ~20-25%
of fluid simulation time (with both gcc/llvm). It looks like we are missing
some optimization opportunities here:
void bar(float *y);
void foo(float rho, float uy, float ux, float usqr, float uz) {
    float lcsmeq[19];
    lcsmeq[1]  = ( (1.0/18.0)*(rho + uy*(4.5*uy + 3.0) - usqr));
    lcsmeq[2]  = ( (1.0/18.0)*(rho + uy*(4.5*uy - 3.0) - usqr));
    lcsmeq[3]  = ( (1.0/18.0)*(rho + ux*(4.5*ux + 3.0) - usqr));
    lcsmeq[4]  = ( (1.0/18.0)*(rho + ux*(4.5*ux - 3.0) - usqr));
    lcsmeq[5]  = ( (1.0/18.0)*(rho + uz*(4.5*uz + 3.0) - usqr));
    lcsmeq[6]  = ( (1.0/18.0)*(rho + uz*(4.5*uz - 3.0) - usqr));
    lcsmeq[7]  = ( (1.0/36.0)*(rho + (+ux+uy)*(4.5*(+ux+uy) + 3.0) - usqr));
    lcsmeq[8]  = ( (1.0/36.0)*(rho + (-ux+uy)*(4.5*(-ux+uy) + 3.0) - usqr));
    lcsmeq[9]  = ( (1.0/36.0)*(rho + (+ux-uy)*(4.5*(+ux-uy) + 3.0) - usqr));
    lcsmeq[10] = ( (1.0/36.0)*(rho + (-ux-uy)*(4.5*(-ux-uy) + 3.0) - usqr));
    lcsmeq[11] = ( (1.0/36.0)*(rho + (+uy+uz)*(4.5*(+uy+uz) + 3.0) - usqr));
    lcsmeq[12] = ( (1.0/36.0)*(rho + (+uy-uz)*(4.5*(+uy-uz) + 3.0) - usqr));
    lcsmeq[13] = ( (1.0/36.0)*(rho + (-uy+uz)*(4.5*(-uy+uz) + 3.0) - usqr));
    lcsmeq[14] = ( (1.0/36.0)*(rho + (-uy-uz)*(4.5*(-uy-uz) + 3.0) - usqr));
    lcsmeq[15] = ( (1.0/36.0)*(rho + (+ux+uz)*(4.5*(+ux+uz) + 3.0) - usqr));
    lcsmeq[16] = ( (1.0/36.0)*(rho + (+ux-uz)*(4.5*(+ux-uz) + 3.0) - usqr));
    lcsmeq[17] = ( (1.0/36.0)*(rho + (-ux+uz)*(4.5*(-ux+uz) + 3.0) - usqr));
    lcsmeq[18] = ( (1.0/36.0)*(rho + (-ux-uz)*(4.5*(-ux-uz) + 3.0) - usqr));
    bar(lcsmeq);
}
The .bc file we produce is attached; the generated assembly contains sequences
like this:
cvtss2sd %xmm1, %xmm5
movsd .LCPI1_0, %xmm6
movapd %xmm5, %xmm7
mulsd %xmm6, %xmm7
movsd .LCPI1_1, %xmm8
movapd %xmm7, %xmm9
addsd %xmm8, %xmm9
mulsd %xmm5, %xmm9
cvtss2sd %xmm0, %xmm0
addsd %xmm0, %xmm9
cvtss2sd %xmm3, %xmm3
subsd %xmm3, %xmm9
movsd .LCPI1_2, %xmm10
mulsd %xmm10, %xmm9
cvtsd2ss %xmm9, %xmm9
movss %xmm9, 16(%rsp)
movsd .LCPI1_3, %xmm9
addsd %xmm9, %xmm7
mulsd %xmm5, %xmm7
addsd %xmm0, %xmm7
subsd %xmm3, %xmm7
mulsd %xmm10, %xmm7
...
I see several opportunities to optimize here:
1) rho-usqr is computed each time:
%13 = fadd double %0, %12 ; <double> [#uses=1]
%14 = fsub double %13, %6 ; <double> [#uses=1]
%15 = fmul double %14, 0x3FAC71C71C71C71C ; <double> [#uses=1]
%16 = fptrunc double %15 to float ; <float> [#uses=1]
%22 = fadd double %0, %21 ; <double> [#uses=1]
%23 = fsub double %22, %6 ; <double> [#uses=1]
%24 = fmul double %23, 0x3FAC71C71C71C71C ; <double> [#uses=1]
%25 = fptrunc double %24 to float
Instead, a temporary could be introduced that stores the result of (rho-usqr).
FADD/FSUB are not associative, so this transform is not safe in general (unless
maybe there is unsafe-fpmath?), but in this case the result is truncated to a
float, so I think the following is sufficient to guarantee the same results:
Calculate the rounding error introduced by applying associativity to fadd/fsub
(it can only be in the last bit of the mantissa), then apply any operations that
are done on the result (in this case a multiply by 1/18), and convert to float.
If the error is zero after that conversion, then we can apply associativity,
since it only changes bits of the double's mantissa that get truncated anyway.
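To make point 1 concrete, here is a minimal C sketch of the two forms for the lcsmeq[1] entry. The function names (entry_original, rho_minus_usqr, entry_hoisted) are hypothetical and only for illustration. The sketch shows what would be hoisted; it does not prove that the two forms always round to the same float, which is exactly the open question above.

```c
/* Original evaluation order: (rho + t) - usqr in double, then
   truncated to float, as in the IR shown above. */
float entry_original(float rho, float uy, float usqr) {
    double t = uy * (4.5 * uy + 3.0);
    return (float)((1.0 / 18.0) * (rho + t - usqr));
}

/* Hypothetical hoisted temporary: computed once, reused by all 18 entries. */
double rho_minus_usqr(float rho, float usqr) {
    return (double)rho - (double)usqr;
}

/* Reassociated form: (rho - usqr) + t. The double result may differ
   from the original in its low mantissa bits; whether the float
   result can differ is what would need to be checked. */
float entry_hoisted(double rmu, float uy) {
    double t = uy * (4.5 * uy + 3.0);
    return (float)((1.0 / 18.0) * (rmu + t));
}
```

Comparing entry_original(rho, uy, usqr) against entry_hoisted(rho_minus_usqr(rho, usqr), uy) over a range of inputs would be one way to look for counterexamples before relying on the transform.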
2) There are lots of floating point extensions of the fsub:
%60 = fsub float %uy, %ux ; <float> [#uses=1]
%61 = fpext float %60 to double ; <double> [#uses=2]
But uy and ux are already available in extended form:
%1 = fpext float %uy to double ; <double> [#uses=3]
%18 = fpext float %ux to double ; <double> [#uses=3]
%61 could be calculated as:
%61 = fsub double %1, %18
Or would that violate some IEEE FP rules?
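As a data point for the question in 2), the sketch below compares the two orders in C. The function names are hypothetical, and the volatile store is only there to force rounding to float on targets that might otherwise keep excess precision. With round-to-nearest, the float subtraction rounds its result to 24 bits of mantissa, while the double subtraction keeps more bits, so the two can give different doubles.

```c
/* fsub float, then fpext: the difference is rounded to float first. */
double sub_then_extend(float uy, float ux) {
    volatile float d = uy - ux; /* force rounding to float precision */
    return (double)d;
}

/* fpext both operands, then fsub double: no intermediate float rounding. */
double extend_then_sub(float uy, float ux) {
    return (double)uy - (double)ux;
}
```

For example, with uy = 1.0f and ux = 1e-8f the float difference rounds back to 1.0, while the double difference is slightly below 1.0. For inputs such as 3.0f and 1.0f, where the difference is exactly representable as a float, both forms agree. So the rewrite looks like it changes results in general and would need a relaxed FP mode.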
3) This sort of code is a very good candidate for vectorization, since it
applies exactly the same operations to different operands.
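To illustrate the "same operations, different operands" shape from 3), here is a sketch that rewrites the 18 assignments as a loop over a table of direction coefficients. The table dir, the function foo_loop, and the out parameter are hypothetical and not part of Blender's code; the -3.0 entries are expressed by negating the velocity component, which is algebraically the same expression. Whether a given compiler actually vectorizes this loop depends on the compiler and flags.

```c
/* Hypothetical coefficient table (cx, cy, cz) for entries 1..18, in the
   same order as foo() in the report. Entry 0 is unused there. */
static const float dir[19][3] = {
    { 0,  0,  0},
    { 0,  1,  0}, { 0, -1,  0},
    { 1,  0,  0}, {-1,  0,  0},
    { 0,  0,  1}, { 0,  0, -1},
    { 1,  1,  0}, {-1,  1,  0}, { 1, -1,  0}, {-1, -1,  0},
    { 0,  1,  1}, { 0,  1, -1}, { 0, -1,  1}, { 0, -1, -1},
    { 1,  0,  1}, { 1,  0, -1}, {-1,  0,  1}, {-1,  0, -1},
};

void foo_loop(float rho, float uy, float ux, float usqr, float uz,
              float out[19]) {
    out[0] = 0.0f; /* assumption: the original leaves entry 0 unset */
    for (int i = 1; i < 19; ++i) {
        /* Entries 1..6 use weight 1/18, entries 7..18 use 1/36. */
        double w = (i <= 6) ? 1.0 / 18.0 : 1.0 / 36.0;
        /* Signed velocity combination, formed in float as in the original. */
        float d = dir[i][0] * ux + dir[i][1] * uy + dir[i][2] * uz;
        out[i] = (float)(w * (rho + d * (4.5 * d + 3.0) - usqr));
    }
}
```

Every iteration performs the same multiply/add sequence on a different d, which is the pattern a vectorizer would need to recognize in the straight-line original.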
If any of the optimizations above are unsafe in general, then maybe we should
at least do them under -ffast-math, or under -ffinite-math-only
-fno-trapping-math -fno-signaling-nans.