[all-commits] [llvm/llvm-project] 35553d: [mlir] Add polynomial approximation for vectorized...

Sat Oct 23 04:56:26 PDT 2021

  Branch: refs/heads/main
  Home:   https://github.com/llvm/llvm-project
  Commit: 35553d452b32e9356352df8536fa0485207a9274
      https://github.com/llvm/llvm-project/commit/35553d452b32e9356352df8536fa0485207a9274
  Author: Emilio Cota <ecg at google.com>
  Date:   2021-10-23 (Sat, 23 Oct 2021)

  Changed paths:
    M mlir/include/mlir/Dialect/Math/Transforms/Passes.h
    M mlir/lib/Dialect/Math/Transforms/CMakeLists.txt
    M mlir/lib/Dialect/Math/Transforms/PolynomialApproximation.cpp
    M mlir/test/Dialect/Math/polynomial-approximation.mlir
    M mlir/test/lib/Dialect/Math/CMakeLists.txt
    M mlir/test/lib/Dialect/Math/TestPolynomialApproximation.cpp
    A mlir/test/mlir-cpu-runner/X86Vector/lit.local.cfg
    A mlir/test/mlir-cpu-runner/X86Vector/math_polynomial_approx_avx2.mlir
    M utils/bazel/llvm-project-overlay/mlir/BUILD.bazel
    M utils/bazel/llvm-project-overlay/mlir/test/BUILD.bazel

  Log Message:
  -----------
  [mlir] Add polynomial approximation for vectorized math::Rsqrt

This patch adds a polynomial approximation that matches the
approximation in Eigen.

Note that the approximation only applies to vectorized inputs;
the scalar rsqrt is left unmodified.

The approximation is protected with a flag since it emits an AVX2
intrinsic (generated via the X86Vector). This is the only reasonably
clean way that I could find to generate the exact approximation that
I wanted (i.e. an identical one to Eigen's).

I considered two alternatives:

1. Introduce a Rsqrt intrinsic in LLVM, which doesn't exist yet.
   I believe this is because there is no definition of Rsqrt that
   all backends could agree on, since hardware instructions that
   implement it have widely varying degrees of precision.
   This is something that the standard could mandate, but Rsqrt is
   not part of IEEE754, so I don't think this option is feasible.

2. Emit fdiv(1.0, sqrt) with fast math flags to allow reciprocal
   transformations. Although portable, this doesn't allow us
   to generate exactly the code we want; it is the LLVM backend,
   and not MLIR, who controls what code is generated based on the
   target CPU.

Reviewed By: ezhulenev

Differential Revision: https://reviews.llvm.org/D112192