[PATCH] D21127: Remove redundant FMUL in Newton-Raphson SQRT code

Sanjay Patel via llvm-commits llvm-commits at lists.llvm.org
Sat Jun 11 15:03:38 PDT 2016


spatel added subscribers: MatzeB, qcolombet.
spatel added a comment.

In http://reviews.llvm.org/D21127#455050, @n.bozhenov wrote:

> As for tests, I can easily add to `x86/sqrt-fastmath.ll` another RUN
>  with `-mattr=fma -recip=all:2` to check both FMA and two steps
>  refinement, but I'm not sure if it is a good idea. I believe
>  `sqrt-fastmath.ll` is overly fragile now. It can be easily broken by
>  perfectly valid variations in register allocation, code ordering or code
>  reassociation. And I'm afraid that additional tests for 2 steps
>  refinement will make the test even more fragile.
>
> So, I wonder if it would be better to split SQRT testing logically in
>  two parts:
>
> 1. check that NR is applied iff necessary (lit-tests)
> 2. check that NR code calculates roots with expected accuracy (execution test)
>
>   In other words, my suggestion is to convert the accuracy testers from PR21385 into new tests for LLVM test-suite instead of adding more fragile lit-tests here. What do you think?


I think that adding an accuracy test to test-suite is a great idea. And I agree that these tests are susceptible to RA/scheduler changes, but the benefit of exact testing has usually outweighed the maintenance cost in my experience.

So I have another, hopefully better, test suggestion. :)

How about a MIR test? (cc'ing Matthias and Quentin for any MIR test suggestions)

The RUN line would include something like:
$ llc -o - -stop-after machine-scheduler rsqrt.ll  -mattr=avx2,fma -recip=sqrtf

And then we can check the output closely for:

  %1 = VRSQRTSSr undef %2, %0
  %4 = VMULSSrr %1, %1
  %4 = VFMADDSSr213m %4, %0, %rip, 1, _, %const.0, _ :: (load 4 from constant-pool)
  %5 = VMULSSrm %1, %rip, 1, _, %const.1, _ :: (load 4 from constant-pool)
  %11 = VMULSSrr %5, %0
  %7 = VMULSSrr %4, %11
  %8 = FsFLD0SS
  %9 = VCMPSSrr %0, %8, 0
  %10 = VFsANDNPSrr %9, %7

This will be immune to RA and other machine-level variations.


http://reviews.llvm.org/D21127





More information about the llvm-commits mailing list