[llvm-commits] [llvm] r153031 - /llvm/trunk/lib/Target/X86/README-SSE.txt

Sun Mar 18 17:43:35 PDT 2012

Author: d0k
Date: Sun Mar 18 19:43:34 2012
New Revision: 153031

URL: http://llvm.org/viewvc/llvm-project?rev=153031&view=rev
Log:
Add a note for -ffast-math optimization of vector norm.

Modified:
    llvm/trunk/lib/Target/X86/README-SSE.txt

Modified: llvm/trunk/lib/Target/X86/README-SSE.txt
URL: http://llvm.org/viewvc/llvm-project/llvm/trunk/lib/Target/X86/README-SSE.txt?rev=153031&r1=153030&r2=153031&view=diff
==============================================================================

--- llvm/trunk/lib/Target/X86/README-SSE.txt (original)
+++ llvm/trunk/lib/Target/X86/README-SSE.txt Sun Mar 18 19:43:34 2012
@@ -922,3 +922,22 @@
 The insertps's of $0 are pointless complex copies.
 
 //===---------------------------------------------------------------------===//
+
+[UNSAFE FP]
+
+void foo(double, double, double);
+void norm(double x, double y, double z) {
+  double scale = __builtin_sqrt(x*x + y*y + z*z);
+  foo(x/scale, y/scale, z/scale);
+}
+
+We currently generate an sqrtsd and 3 divsd instructions. This is bad, fp div is
+slow and not pipelined. In -ffast-math mode we could compute "1.0/scale" first
+and emit 3 mulsd in place of the divs. This can be done as a target-independent
+transform.
+
+If we're dealing with floats instead of doubles we could even replace the sqrtss
+and inversion with an rsqrtss instruction, which computes 1/sqrt faster at the
+cost of reduced accuracy.
+
+//===---------------------------------------------------------------------===//