[PATCH] SSE reciprocal square root instruction latencies
Simon Pilgrim
llvm-dev at redking.me.uk
Tue Sep 16 10:46:25 PDT 2014
Hi spatel, andreadb, rob.lougher, qcolombet,
The SSE rsqrt instruction is a fast reciprocal square root estimate (typically <5 cycles) but is currently grouped in the same IIC_SSE_SQRT* scheduling class as the accurate (but very slow) SSE sqrt instruction (often >20 cycles). For code which uses rsqrt (possibly with Newton-Raphson iterations) this poor scheduling is affecting performance.
This patch splits off the rsqrt instruction from the sqrt instruction scheduling classes and creates new IIC_SSE_RSQRT* classes with latency values based on Agner Fog's instruction tables. The latencies/pipelines for supported x86 targets end up being the same as for the rcp(ss,ps) instructions but I've kept them separate.
There is a proposal for a fast-math optimization to use rsqrt + NR (Newton-Raphson) refinement (http://llvm.org/bugs/show_bug.cgi?id=20900) which would benefit from this as well.
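For readers unfamiliar with the rsqrt + Newton-Raphson pattern mentioned above, here is a minimal sketch in C. It is only an illustration and is not code from the patch or from the linked proposal; the helper names rsqrt_estimate and rsqrt_nr are hypothetical. It assumes an x86 target with SSE and otherwise falls back to a deliberately perturbed scalar estimate, so the refinement step can still be seen at work.

```c
#include <math.h>

#if defined(__SSE__) || defined(_M_X64)
#include <xmmintrin.h>
#define HAVE_SSE_RSQRT 1
#endif

/* Hypothetical helper: low-precision estimate of 1/sqrt(x).
   With SSE this maps to the rsqrtss instruction (roughly 12 bits
   of precision). The fallback is an assumption for non-x86 builds:
   it perturbs the exact result to mimic a coarse estimate. */
static float rsqrt_estimate(float x)
{
#ifdef HAVE_SSE_RSQRT
    return _mm_cvtss_f32(_mm_rsqrt_ss(_mm_set_ss(x)));
#else
    return (1.0f / sqrtf(x)) * (1.0f + 1.0f / 4096.0f);
#endif
}

/* Hypothetical helper: one Newton-Raphson step on f(y) = 1/y^2 - x,
   i.e. y' = y * (1.5 - 0.5 * x * y * y). Each step roughly doubles
   the number of correct bits of the estimate. */
static float rsqrt_nr(float x)
{
    float y = rsqrt_estimate(x);
    return y * (1.5f - 0.5f * x * y * y);
}
```

Calling rsqrt_nr(4.0f) should return a value very close to 0.5. The refinement multiplies and subtracts consume the rsqrt result directly, which is why the latency the scheduler assumes for rsqrt matters for this kind of sequence.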
Note - for the Haswell scheduler I've updated the base model but not altered any of the exceptions/overrides.
http://reviews.llvm.org/D5370
Files:
lib/Target/X86/X86InstrSSE.td
lib/Target/X86/X86SchedHaswell.td
lib/Target/X86/X86SchedSandyBridge.td
lib/Target/X86/X86Schedule.td
lib/Target/X86/X86ScheduleAtom.td
lib/Target/X86/X86ScheduleBtVer2.td
lib/Target/X86/X86ScheduleSLM.td
-------------- next part --------------
A non-text attachment was scrubbed...
Name: D5370.13758.patch
Type: text/x-patch
Size: 6874 bytes
Desc: not available
URL: <http://lists.llvm.org/pipermail/llvm-commits/attachments/20140916/722e6189/attachment.bin>