[PATCH] SSE reciprocal square root instruction latencies

Andrea Di Biagio Andrea_DiBiagio at sn.scee.net
Thu Sep 25 06:00:42 PDT 2014


Hi Simon,
Sorry for the late reply.

The patch looks good to me.
The changes to instruction latencies and the new instruction itineraries look OK to me (I can see how the latencies are based on Agner's table). However, I think it would be better to get final approval from somebody more familiar with the Intel scheduling models. For example, the change to X86ScheduleAtom.td should probably be reviewed by others.

As a side note: I ran some benchmarks using the compiler with and without your patch. Unfortunately, I haven't seen any particular difference in the codegen. It turns out that most of the benchmarks I tried don't have a good mix of sqrt/rsqrt. Also, as you said, under fast-math we lack a rule for converting `sqrt+div` to `rsqrt+mul` (PR20900). I am interested to see how this patch will improve things once PR20900 is fixed.

Thanks,
-Andrea

http://reviews.llvm.org/D5370





