[PATCH] SSE reciprocal square root instruction latencies

Thu Sep 25 06:00:42 PDT 2014

Hi Simon,
Sorry for the late reply.

The patch looks good to me.
The changes to the instruction latencies and the new instruction itineraries look OK to me (I can see how the latencies are based on Agner's table). However, I think it would be better to get final approval from somebody more familiar with the Intel scheduling models. For example, the change to X86ScheduleAtom.td should probably be reviewed by others.

As a side note: I have run some benchmarks using the compiler with and without your patch. Unfortunately, I haven't seen any particular difference in the codegen. It turns out that most of the benchmarks I tried don't have a good mix of sqrt/rsqrt. Also, as you said, under fast-math we lack a rule for converting `sqrt+div` to `rsqrt+mul` (PR20900). I am interested to see how this patch will improve things once PR20900 is fixed.

Thanks,
-Andrea

http://reviews.llvm.org/D5370