[PATCH] Use rcpss/rcpps (X86) to speed up reciprocal calcs (PR21385)

Sanjay Patel spatel at rotateright.com
Fri Nov 7 12:06:17 PST 2014


Hi hfinkel, andreadb, nadav,

This is a first step toward generating SSE rcp instructions (rcpss/rcpps) for reciprocal calculations when fast-math allows it. It is very similar to the rsqrt optimization enabled in D5658 ( http://reviews.llvm.org/rL220570 ).

For now, be conservative and enable this only for AMD btver2, where both latency and throughput improve significantly.

We may never enable this codegen for Intel Core* chips because their divider circuits are just too fast. On SandyBridge, divss can complete in as few as 10 cycles, versus a 21-cycle critical path for the rcp + mul + sub + mul + add estimate.
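
For readers unfamiliar with the sequence, here is a minimal C++ sketch of the rcp + mul + sub + mul + add estimate, i.e. one Newton-Raphson step est + est * (1 - a * est). It is an illustration only, not the code in the patch: the function names rcp_estimate and fast_recip are hypothetical, and the non-SSE fallback merely simulates a roughly 12-bit estimate by truncating the mantissa of an exact reciprocal.

```cpp
#include <cmath>
#include <cstdint>
#include <cstring>
#if defined(__SSE__)
#include <xmmintrin.h>
#endif

// Hypothetical helper: coarse reciprocal estimate (about 12 bits).
inline float rcp_estimate(float a) {
#if defined(__SSE__)
  // rcpss via the standard SSE intrinsic.
  return _mm_cvtss_f32(_mm_rcp_ss(_mm_set_ss(a)));
#else
  // Assumption for non-SSE builds: imitate a low-precision estimate by
  // keeping only the top 12 mantissa bits of the exact reciprocal.
  float r = 1.0f / a;
  std::uint32_t bits;
  std::memcpy(&bits, &r, sizeof bits);
  bits &= 0xFFFFF800u;
  std::memcpy(&r, &bits, sizeof bits);
  return r;
#endif
}

// Hypothetical helper: estimate refined by one Newton-Raphson step.
inline float fast_recip(float a) {
  float est = rcp_estimate(a); // rcp
  float t = a * est;           // mul
  t = 1.0f - t;                // sub
  t = t * est;                 // mul
  return est + t;              // add
}
```

Each operation depends on the one before it, which is why the whole chain sits on the critical path. The refined result is close to, but not guaranteed bit-identical to, a true divide, hence the fast-math requirement.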

Follow-on patches may allow configuration of the number of Newton-Raphson refinement steps, add AVX512 support, and enable the optimization for more chips.

More background here: http://llvm.org/bugs/show_bug.cgi?id=21385

http://reviews.llvm.org/D6175

Files:
  lib/Target/X86/X86.td
  lib/Target/X86/X86ISelLowering.cpp
  lib/Target/X86/X86ISelLowering.h
  lib/Target/X86/X86Subtarget.h
  test/CodeGen/X86/recip-fastmath.ll
-------------- next part --------------
A non-text attachment was scrubbed...
Name: D6175.15935.patch
Type: text/x-patch
Size: 6765 bytes
Desc: not available
URL: <http://lists.llvm.org/pipermail/llvm-commits/attachments/20141107/d78c4c7f/attachment.bin>

