[PATCH] Fast-math fold: x / (y * sqrt(z)) -> x * rsqrt(z) / y

Sanjay Patel spatel at rotateright.com
Mon Oct 6 11:11:16 PDT 2014


Hi hfinkel, wschmidt, willschm,

This patch only affects PPC at the moment because no other target has enabled reciprocal sqrt estimate or reciprocal estimate optimizations yet.

The motivation is to recognize code such as this from /llvm/projects/test-suite/SingleSource/Benchmarks/BenchmarkGame/n-body.c:
   float distance = sqrt(dx * dx + dy * dy + dz * dz);
   float mag = dt / (distance * distance * distance);

Without this patch, we don't match the sqrt as a reciprocal sqrt, so for PPC the new testcase in this patch produces:
      addis 3, 2, .LCPI4_2@toc@ha
      lfs 4, .LCPI4_2@toc@l(3)
      addis 3, 2, .LCPI4_1@toc@ha
      lfs 0, .LCPI4_1@toc@l(3)
      fcmpu 0, 1, 4
      beq 0, .LBB4_2
   # BB#1:
      frsqrtes 4, 1
      addis 3, 2, .LCPI4_0@toc@ha
      lfs 5, .LCPI4_0@toc@l(3)
      fnmsubs 13, 1, 5, 1
      fmuls 6, 4, 4
      fmadds 1, 13, 6, 5
      fmuls 1, 4, 1
      fres 4, 1                <--- reciprocal of reciprocal square root
      fnmsubs 1, 1, 4, 0
      fmadds 4, 4, 1, 4
   .LBB4_2:
      fmuls 1, 4, 2
      fres 2, 1
      fnmsubs 0, 1, 2, 0
      fmadds 0, 2, 0, 2
      fmuls 1, 3, 0
      blr

After the patch, this simplifies to:
      frsqrtes 0, 1
      addis 3, 2, .LCPI4_1@toc@ha
      fres 5, 2
      lfs 4, .LCPI4_1@toc@l(3)
      addis 3, 2, .LCPI4_0@toc@ha
      lfs 7, .LCPI4_0@toc@l(3)
      fnmsubs 13, 1, 4, 1
      fmuls 6, 0, 0
      fnmsubs 2, 2, 5, 7
      fmadds 1, 13, 6, 4
      fmadds 2, 5, 2, 5
      fmuls 0, 0, 1
      fmuls 0, 0, 2
      fmuls 1, 3, 0
      blr

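I don't have any PPC hardware to measure this patch on (still no reply from the GCC Compile Farm), but I expect it to be noticeably faster based on the number of flops saved: the new sequence no longer takes the reciprocal of the reciprocal square root, and the compare and branch are gone.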

There should be a measurable perf win using the n-body program from test-suite or here:
http://benchmarksgame.alioth.debian.org/u32/performance.php?test=nbody
or using the test loop/program from:
http://llvm.org/bugs/show_bug.cgi?id=20900

http://reviews.llvm.org/D5628

Files:
  lib/CodeGen/SelectionDAG/DAGCombiner.cpp
  test/CodeGen/PowerPC/recipest.ll
-------------- next part --------------
A non-text attachment was scrubbed...
Name: D5628.14462.patch
Type: text/x-patch
Size: 2387 bytes
Desc: not available
URL: <http://lists.llvm.org/pipermail/llvm-commits/attachments/20141006/48a8ef76/attachment.bin>

