[PATCH] D53927: [AArch64] Enable libm vectorized functions via SLEEF

Francesco Petrogalli via Phabricator via llvm-commits llvm-commits at lists.llvm.org
Fri Nov 9 20:56:06 PST 2018


fpetrogalli added a comment.

Hi @steleman,

sorry for the delay in getting back to you. I have a couple of observations for you and @shibatch.

1. @steleman, I don't understand some of the values in your benchmarks. In particular, sin and cos should have similar timings rather than differing as much as they do in your report. I wonder whether the choice of CLOCK_PROCESS_CPUTIME_ID might have caused this. I think that CLOCK_PROCESS_CPUTIME_ID might translate into a syscall and therefore add a lot of overhead to the measurement. I'd rather use CLOCK_MONOTONIC. Also, to make sure you are measuring just the function latency, I think you should run the benchmark on arrays of smaller size, and invoke the call a couple of times before actually starting the time measurement, to reduce the noise caused by warm-up effects.

2. @shibatch, do you know for sure that we shouldn't use the 3.5 ULP version of the function? I have just run the following example with both -O3 and -Ofast, and both produce the same SVML calls. As far as I know, SVML guarantees 4 ULP precision, so I think we are fine using SLEEF here. @shibatch, do you have any advice here?

  // compile with "clang -fveclib=SVML ~/test.c -o - -c -S -emit-llvm [-Ofast|-O3]"
  #include <math.h>
  
  void f(double * restrict x, double * restrict y, unsigned n) {
    unsigned i;  /* same type as n, avoids a signed/unsigned comparison */
    for (i = 0; i < n; ++i) {
      x[i]=sin(y[i]);
    }
  }
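
The measurement setup suggested in point 1 could look roughly like the sketch below: a small array, a few untimed warm-up calls, and CLOCK_MONOTONIC around the timed loop. The names bench_ns_per_elem, elapsed_ns, square_kernel and BENCH_MAX_N are made up for illustration and are not taken from the benchmark under discussion. square_kernel is only a stand-in so that the sketch builds without libm; any loop with the same signature as f above (for example f itself) can be passed instead.

```c
#define _POSIX_C_SOURCE 199309L /* exposes clock_gettime under strict -std modes */
#include <assert.h>
#include <stdint.h>
#include <time.h>

enum { BENCH_MAX_N = 1024 }; /* assumed small enough to stay in cache */

/* Loop under test; same shape as f(double *restrict, double *restrict, unsigned). */
typedef void (*kernel_fn)(double *restrict out, double *restrict in, unsigned n);

/* Hypothetical stand-in kernel: swap in a loop calling sin() or cos(). */
void square_kernel(double *restrict out, double *restrict in, unsigned n) {
  for (unsigned i = 0; i < n; ++i)
    out[i] = in[i] * in[i];
}

/* Nanoseconds elapsed from *start to *end. */
int64_t elapsed_ns(const struct timespec *start, const struct timespec *end) {
  return (int64_t)(end->tv_sec - start->tv_sec) * 1000000000LL
       + (end->tv_nsec - start->tv_nsec);
}

/* Average nanoseconds per array element over `reps` timed calls,
   after `warmup` untimed calls. */
double bench_ns_per_elem(kernel_fn kernel, unsigned n,
                         unsigned warmup, unsigned reps) {
  static double in[BENCH_MAX_N], out[BENCH_MAX_N];
  static volatile double sink;
  struct timespec t0, t1;

  assert(n > 0 && n <= BENCH_MAX_N && reps > 0);
  for (unsigned i = 0; i < n; ++i)
    in[i] = 0.001 * i;

  /* Untimed calls: warm caches and resolve any lazy symbol binding. */
  for (unsigned r = 0; r < warmup; ++r)
    kernel(out, in, n);

  /* Two clock reads around the whole loop, so timer cost is amortized. */
  clock_gettime(CLOCK_MONOTONIC, &t0);
  for (unsigned r = 0; r < reps; ++r)
    kernel(out, in, n);
  clock_gettime(CLOCK_MONOTONIC, &t1);

  sink = out[n - 1]; /* keep the results observable */
  return (double)elapsed_ns(&t0, &t1) / ((double)reps * n);
}
```

For example, bench_ns_per_elem(f, 256, 3, 1000) would time the sin loop above (linking with -lm). At high optimization levels an extra compiler barrier may be needed to keep the repeated calls from being optimized away.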

On a final note, please excuse further delays in my replies in the coming week, because of SC18. If you will be there too, it would be great to meet.

Kind regards,

Francesco


https://reviews.llvm.org/D53927




