[PATCH] Use rsqrt (X86) to speed up reciprocal square root calcs (PR20900)

Sanjay Patel spatel at rotateright.com
Fri Oct 10 15:59:04 PDT 2014


Any other suggestions/improvements?

Using this n-body benchmark program:
https://github.com/tycho/nbody

...on a btver2 (AMD Jaguar) system, I see single-threaded speedups of roughly 1.5x to 4.8x, depending on the kernel.

Before:

  Running simulation with 16384 particles, crosscheck enabled, CPU enabled, 1 threads
  CPU_SOA:         2.10 GFLOPS
  CPU_SOA_tiled:   1.12 GFLOPS
  CPU_AOS:         0.64 GFLOPS
  CPU_AOS_tiled:   1.04 GFLOPS

After:

  Running simulation with 16384 particles, crosscheck enabled, CPU enabled, 1 threads
  CPU_SOA:         5.19 GFLOPS
  CPU_SOA_tiled:   5.34 GFLOPS
  CPU_AOS:         1.27 GFLOPS
  CPU_AOS_tiled:   1.59 GFLOPS
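
For anyone unfamiliar with the technique, here is a minimal sketch in C of an estimate-plus-refinement reciprocal square root. It is an illustration only, not the code generated by the patch: the helper names rsqrt_estimate and rsqrt_refined are hypothetical, and using exactly one Newton-Raphson step is an assumption on my part. On targets without SSE the sketch falls back to a plain 1.0f / sqrtf(x) so that it still compiles.

```c
#include <math.h>
#if defined(__SSE__)
#include <xmmintrin.h>
#endif

/* Hypothetical helper: cheap initial estimate of 1/sqrt(x).
 * With SSE this uses the RSQRTSS instruction via _mm_rsqrt_ss, which
 * returns an approximation with roughly 12 bits of precision. */
static float rsqrt_estimate(float x) {
#if defined(__SSE__)
    return _mm_cvtss_f32(_mm_rsqrt_ss(_mm_set_ss(x)));
#else
    /* Portable fallback (assumption: no hardware estimate available). */
    return 1.0f / sqrtf(x);
#endif
}

/* Hypothetical helper: refine the estimate with one Newton-Raphson step
 * for f(e) = 1/(e*e) - x, i.e. e' = e * (1.5 - 0.5 * x * e * e).
 * Each step roughly doubles the number of correct bits. */
static float rsqrt_refined(float x) {
    float e = rsqrt_estimate(x);
    return e * (1.5f - 0.5f * x * e * e);
}
```

Calling rsqrt_refined(4.0f) should give a value very close to 0.5f. The point of the trade is that the estimate plus a few multiplies and a subtract can be much cheaper than a full-precision square root followed by a divide, at the cost of a slightly less accurate result.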

http://reviews.llvm.org/D5658





