[PATCH] SSE reciprocal square root instruction latencies

Tue Sep 16 10:46:25 PDT 2014

Hi spatel, andreadb, rob.lougher, qcolombet,

The SSE rsqrt instruction is a fast reciprocal square root estimate (typically <5 cycles), but it is currently grouped in the same IIC_SSE_SQRT* scheduling classes as the accurate (but very slow) SSE sqrt instruction (often >20 cycles). For code that uses rsqrt (possibly followed by Newton-Raphson iterations), this poor scheduling hurts performance.

This patch splits the rsqrt instructions off from the sqrt scheduling classes and creates new IIC_SSE_RSQRT* classes with latency values based on Agner Fog's instruction tables. For the supported x86 targets, the latencies/pipelines end up being the same as for the rcp(ss,ps) instructions, but I've kept the classes separate.

There is a proposal for a fast-math optimization to use rsqrt + Newton-Raphson (NR) refinement (http://llvm.org/bugs/show_bug.cgi?id=20900), which would also benefit from this.
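For readers unfamiliar with the rsqrt + NR pattern, here is a minimal C++ sketch of the arithmetic using the SSE intrinsics from <xmmintrin.h>. It assumes an x86 target with SSE and positive, finite inputs; the helper names rsqrt_estimate and rsqrt_nr are hypothetical and are not taken from this patch or from the PR20900 proposal.

```cpp
#include <cassert>
#include <cmath>
#include <xmmintrin.h>  // SSE intrinsics: _mm_rsqrt_ss, _mm_set_ss, _mm_cvtss_f32

// Raw hardware estimate from RSQRTSS. Intel documents a relative error
// of at most 1.5 * 2^-12 for this instruction.
inline float rsqrt_estimate(float x) {
    return _mm_cvtss_f32(_mm_rsqrt_ss(_mm_set_ss(x)));
}

// Hypothetical helper: the estimate refined by one Newton-Raphson step
// on f(y) = 1/y^2 - x, which gives
//   y1 = y0 * (1.5 - 0.5 * x * y0 * y0)
// Each step roughly squares the relative error, so one step brings the
// ~12-bit estimate close to full single precision.
inline float rsqrt_nr(float x) {
    assert(x > 0.0f && std::isfinite(x) && "sketch assumes positive, finite input");
    const float y0 = rsqrt_estimate(x);
    return y0 * (1.5f - 0.5f * x * y0 * y0);
}
```

The refinement is a short dependent chain of multiplies and a subtract that starts from the rsqrt result, so the rsqrt latency sits at the head of that chain and is what the scheduler needs to model correctly.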

Note - for the Haswell scheduler I've updated the base model but not altered any of the exceptions/overrides.

http://reviews.llvm.org/D5370

Files:
  lib/Target/X86/X86InstrSSE.td
  lib/Target/X86/X86SchedHaswell.td
  lib/Target/X86/X86SchedSandyBridge.td
  lib/Target/X86/X86Schedule.td
  lib/Target/X86/X86ScheduleAtom.td
  lib/Target/X86/X86ScheduleBtVer2.td
  lib/Target/X86/X86ScheduleSLM.td
-------------- next part --------------
A non-text attachment was scrubbed...
Name: D5370.13758.patch
Type: text/x-patch
Size: 6874 bytes
Desc: not available
URL: <http://lists.llvm.org/pipermail/llvm-commits/attachments/20140916/722e6189/attachment.bin>