[PATCH] D21379: [X86] Heuristic to selectively build Newton-Raphson SQRT estimation

Sanjay Patel via llvm-commits llvm-commits at lists.llvm.org
Wed Jun 15 13:17:00 PDT 2016


spatel added a comment.

In http://reviews.llvm.org/D21379#458695, @n.bozhenov wrote:

> Below are some figures to justify the change.
>  Experimental Newton-Raphson efficiency for latency-bound code:
>
>   |      |  IVB |  HSW |  BDW |  SKL |
>   |------+------+------+------+------|
>   | x32  | -41% | -40% | -21% | -40% |
>   | x128 | -32% | -32% | -17% | -35% |
>   
>
> Experimental Newton-Raphson efficiency for throughput-bound code:
>
>   |      |  IVB |  HSW |  BDW |  SKL |
>   |------+------+------+------+------|
>   | x32  | +18% | +21% | -17% | -40% |
>   | x128 | +10% | +14% | +28% | -50% |
>   | x256 |      | +68% | +85% |  +3% |
>   




1. Shouldn't HSW show a latency improvement over IVB from using FMA?
2. How many N-R steps are included in your measurements?
3. Do the measurements include the change from http://reviews.llvm.org/D21127?
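
For context on question 2, here is a minimal sketch of what one N-R refinement step does to a reciprocal square root estimate. It is an illustration only, not the code LLVM emits: `rsqrt_estimate` is a hypothetical stand-in for the hardware estimate instruction, assumed here to be accurate to roughly 12 bits, and all function names are made up for this example.

```python
import math


def rsqrt_estimate(x: float, bits: int = 12) -> float:
    """Hypothetical stand-in for a hardware rsqrt estimate.

    Returns 1/sqrt(x) with an injected relative error of 2**-bits
    (an assumption for illustration, not a model of any specific CPU).
    """
    return (1.0 / math.sqrt(x)) * (1.0 + 2.0 ** -bits)


def nr_rsqrt_step(x: float, y: float) -> float:
    """One Newton-Raphson step for f(y) = 1/y**2 - x.

    If y has relative error e, the result has relative error of about 1.5 * e**2.
    """
    return y * (1.5 - 0.5 * x * y * y)


def sqrt_via_nr(x: float, steps: int = 1) -> float:
    """Approximate sqrt(x) as x * rsqrt(x), refining the estimate `steps` times."""
    y = rsqrt_estimate(x)
    for _ in range(steps):
        y = nr_rsqrt_step(x, y)
    return x * y


def relative_error(x: float, steps: int) -> float:
    exact = math.sqrt(x)
    return abs(sqrt_via_nr(x, steps) - exact) / exact


if __name__ == "__main__":
    for steps in range(3):
        print(steps, relative_error(2.0, steps))
```

Each extra step adds a dependent chain of multiplies (or FMAs) on top of the estimate, which is why the step count matters when comparing latency against a plain sqrt instruction.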

When we enabled the estimate generation code (https://llvm.org/bugs/show_bug.cgi?id=21385#c32), we knew it had higher latency on SNB/IVB/HSW, but we reasoned that most real-world FP code would care more about throughput. This patch proposes to change that behavior for those targets (i.e., favor latency at the expense of throughput). Do you have any benchmark numbers (test-suite, SPEC, etc.) for those CPUs that show a difference?

For the test file, please add RUN lines that specify the new attributes directly rather than specifying a CPU. That way we'll have coverage for the expected behavior independent of any individual CPU.


http://reviews.llvm.org/D21379




