[PATCH] D121410: Have cpu-specific variants set 'tune-cpu' as an optimization hint

Andy Kaylor via Phabricator via cfe-commits cfe-commits at lists.llvm.org
Thu Mar 10 14:53:07 PST 2022


andrew.w.kaylor added a comment.

This example illustrates the problem this patch intends to fix: https://godbolt.org/z/j445sxPMc

For Intel microarchitectures before Skylake, the LLVM cost model treats vector fsqrt as slow. With fast-math enabled, we therefore use an approximation instead of the vsqrtps instruction when vectorizing a call to sqrtf(). If the code is compiled with -march=skylake or -mtune=skylake, we choose the vsqrtps instruction. With any earlier base target, we choose the approximation even inside a function that is marked cpu_specific(skylake) in the source code.

For example,

  #include <math.h>

  float x[8], y[8];  /* globals referenced by the listings below */

  __attribute__((cpu_specific(skylake))) void foo(void) {
    for (int i = 0; i < 8; ++i)
      x[i] = sqrtf(y[i]);
  }

compiles to

  foo.b:
          vmovaps ymm0, ymmword ptr [rip + y]
          vrsqrtps        ymm1, ymm0
          vmulps  ymm2, ymm0, ymm1
          vbroadcastss    ymm3, dword ptr [rip + .LCPI2_0] # ymm3 = [-3.0E+0,-3.0E+0,-3.0E+0,-3.0E+0,-3.0E+0,-3.0E+0,-3.0E+0,-3.0E+0]
          vfmadd231ps     ymm3, ymm2, ymm1        # ymm3 = (ymm2 * ymm1) + ymm3
          vbroadcastss    ymm1, dword ptr [rip + .LCPI2_1] # ymm1 = [-5.0E-1,-5.0E-1,-5.0E-1,-5.0E-1,-5.0E-1,-5.0E-1,-5.0E-1,-5.0E-1]
          vmulps  ymm1, ymm2, ymm1
          vmulps  ymm1, ymm1, ymm3
          vbroadcastss    ymm2, dword ptr [rip + .LCPI2_2] # ymm2 = [NaN,NaN,NaN,NaN,NaN,NaN,NaN,NaN]
          vandps  ymm0, ymm0, ymm2
          vbroadcastss    ymm2, dword ptr [rip + .LCPI2_3] # ymm2 = [1.17549435E-38,1.17549435E-38,1.17549435E-38,1.17549435E-38,1.17549435E-38,1.17549435E-38,1.17549435E-38,1.17549435E-38]
          vcmpleps        ymm0, ymm2, ymm0
          vandps  ymm0, ymm0, ymm1
          vmovaps ymmword ptr [rip + x], ymm0
          vzeroupper
          ret

but it should compile to

  foo.b:
          vsqrtps ymm0, ymmword ptr [rip + y]
          vmovaps ymmword ptr [rip + x], ymm0
          vzeroupper
          ret



https://reviews.llvm.org/D121410


