[PATCH] Use rsqrt (X86) to speed up reciprocal square root calcs (PR20900)
Sanjay Patel
spatel at rotateright.com
Fri Oct 10 15:59:04 PDT 2014
Any other suggestions/improvements?
Using this n-body benchmark program:
https://github.com/tycho/nbody
...on a btver2 (AMD Jaguar) system, I see excellent performance improvements, roughly 1.5x to 4.8x depending on the kernel.
Before:
Running simulation with 16384 particles, crosscheck enabled, CPU enabled, 1 threads
CPU_SOA: 2.10 GFLOPS
CPU_SOA_tiled: 1.12 GFLOPS
CPU_AOS: 0.64 GFLOPS
CPU_AOS_tiled: 1.04 GFLOPS
After:
Running simulation with 16384 particles, crosscheck enabled, CPU enabled, 1 threads
CPU_SOA: 5.19 GFLOPS
CPU_SOA_tiled: 5.34 GFLOPS
CPU_AOS: 1.27 GFLOPS
CPU_AOS_tiled: 1.59 GFLOPS
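
For context, the general technique named in the subject line can be sketched as follows. This is only an illustration, not the code in the patch: the helper names rsqrt_estimate and fast_rsqrt are hypothetical, and the choice of exactly one Newton-Raphson refinement step is an assumption made here. The idea is to replace a full-precision sqrt followed by a divide with the cheap hardware estimate (rsqrtss on x86) and then refine that estimate to recover most of the lost precision.

```c
#include <assert.h>
#include <math.h>
#if defined(__SSE__)
#include <xmmintrin.h>
#endif

/* Hypothetical helper: low-precision estimate of 1/sqrt(x).
 * On x86 with SSE this maps to the rsqrtss instruction; elsewhere it
 * falls back to the exact computation so the sketch stays portable. */
static float rsqrt_estimate(float x) {
#if defined(__SSE__)
    return _mm_cvtss_f32(_mm_rsqrt_ss(_mm_set_ss(x)));
#else
    return 1.0f / sqrtf(x);
#endif
}

/* Hypothetical helper: estimate plus one Newton-Raphson step.
 * For f(y) = 1/y^2 - x the update is y' = y * (1.5 - 0.5 * x * y * y).
 * One step is an assumption of this sketch, not taken from the patch. */
static float fast_rsqrt(float x) {
    assert(x > 0.0f); /* the sketch ignores zero, negative, and NaN inputs */
    float y = rsqrt_estimate(x);
    return y * (1.5f - 0.5f * x * y * y);
}
```

A caller would use fast_rsqrt(r2) wherever it previously wrote 1.0f / sqrtf(r2), for example on the squared distance in an n-body inner loop. The result is close to, but not bit-identical with, the exact computation, which is why a compiler can only make this substitution when relaxed floating-point semantics are allowed.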
http://reviews.llvm.org/D5658