[all-commits] [llvm/llvm-project] db0d6e: [mlir][arith] Support wide integer multiplication ...

Fri Sep 16 09:04:26 PDT 2022

  Branch: refs/heads/main
  Home:   https://github.com/llvm/llvm-project
  Commit: db0d6e567df3d34584be349347e357123246759d
      https://github.com/llvm/llvm-project/commit/db0d6e567df3d34584be349347e357123246759d
  Author: Jakub Kuderski <kubak at google.com>
  Date:   2022-09-16 (Fri, 16 Sep 2022)

  Changed paths:
    M mlir/lib/Dialect/Arithmetic/Transforms/EmulateWideInt.cpp
    A mlir/test/Dialect/Arithmetic/emulate-wide-int-very-wide.mlir
    M mlir/test/Dialect/Arithmetic/emulate-wide-int.mlir
    A mlir/test/Integration/Dialect/Arithmetic/CPU/test-wide-int-emulation-muli-i16.mlir

  Log Message:
  -----------
  [mlir][arith] Support wide integer multiplication emulation

Emulate multiplication by splitting each input element of type i2N into 4
digits, each stored in type iN but holding only N/2 significant bits. This
ensures that the intermediate multiplications and additions do not overflow.
We extract these N/2-bit digits from the iN vector elements by masking (low
digit) and shifting right (high digit).
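A minimal Python sketch of the digit split, assuming unsigned values and an even N. Python integers stand in for iN values, and the helper names `split_into_digits` and `join_digits` are hypothetical; they are not taken from EmulateWideInt.cpp. The final check shows why N/2-bit digits leave enough headroom in iN: the largest digit product plus two of the largest digits is exactly 2^N - 1.

```python
def split_into_digits(value: int, n: int) -> list[int]:
    """Split an unsigned 2N-bit value into 4 digits of N/2 bits, least significant first.

    Hypothetical helper: the i2N value is viewed as two iN words (low, high),
    and each word yields a low digit (by masking) and a high digit (by shifting right).
    """
    half = n // 2
    word_mask = (1 << n) - 1
    digit_mask = (1 << half) - 1
    words = [value & word_mask, (value >> n) & word_mask]
    digits = []
    for word in words:
        digits.append(word & digit_mask)  # low digit: masking
        digits.append(word >> half)       # high digit: shifting right
    return digits


def join_digits(digits: list[int], n: int) -> int:
    """Inverse of split_into_digits: digit k carries weight 2^(k * N/2)."""
    half = n // 2
    return sum(d << (half * k) for k, d in enumerate(digits))


# Emulating i16 with i8 (N = 8): every digit holds 4 significant bits.
N = 8
print([hex(d) for d in split_into_digits(0xBEEF, N)])  # ['0xf', '0xe', '0xe', '0xb']

# Headroom check: the largest digit product plus two of the largest digits
# is exactly 2^N - 1, so the sum still fits in an iN value without overflow.
max_digit = (1 << (N // 2)) - 1
assert max_digit * max_digit + 2 * max_digit == (1 << N) - 1  # 225 + 30 == 255
```

The sketch only illustrates the available headroom; it does not claim that the pass groups its additions exactly this way.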

The multiplication algorithm used is standard (long) multiplication.
Multiplying two i2N integers produces (at most) an i4N result, but because
only the low i2N half of the product is returned, we omit the calculation
of the top i2N half.
In total, this implementation performs 10 intermediate multiplications
and 16 additions. The number of multiplications could be reduced by
switching to a more efficient algorithm such as Karatsuba. This would,
however, require being able to perform (intermediate) wide additions and
subtractions, so it is not clear that such an implementation would be more
efficient.
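A minimal Python sketch of long multiplication that keeps only the low i2N bits, assuming unsigned operands and an even N >= 8 (the overflow asserts are only meant to hold for such N). Python integers model iN values; `split_into_digits` and `emulated_mul` are hypothetical names, and the order in which the partial results are summed is this sketch's own choice, not necessarily the one used in EmulateWideInt.cpp. Only digit products with i + j < 4 are formed, because the rest land entirely in the discarded top half; that leaves 4 + 3 + 2 + 1 = 10 multiplications, matching the count above.

```python
def split_into_digits(value: int, n: int) -> list[int]:
    """Hypothetical helper: 4 digits of N/2 bits from an unsigned 2N-bit value, LSB first."""
    half = n // 2
    word_mask, digit_mask = (1 << n) - 1, (1 << half) - 1
    words = [value & word_mask, (value >> n) & word_mask]
    return [d for w in words for d in (w & digit_mask, w >> half)]


def emulated_mul(a: int, b: int, n: int) -> tuple[int, int]:
    """Return (a * b mod 2^(2N), number of digit multiplications performed)."""
    half = n // 2
    digit_mask = (1 << half) - 1
    a_d, b_d = split_into_digits(a, n), split_into_digits(b, n)
    # columns[k] collects N/2-bit pieces whose weight is 2^(k * N/2).
    columns = [[] for _ in range(4)]
    num_muls = 0
    for i in range(4):
        for j in range(4 - i):  # skip i + j >= 4: those only affect the top i2N bits
            p = a_d[i] * b_d[j]
            num_muls += 1
            assert p < (1 << n)  # a digit product always fits in iN
            columns[i + j].append(p & digit_mask)  # low digit of the product
            if i + j + 1 < 4:  # the high digit of column 3 would land in the top half
                columns[i + j + 1].append(p >> half)
    result, carry = 0, 0
    for k in range(4):
        acc = carry
        for piece in columns[k]:
            acc += piece
            assert acc < (1 << n)  # column sums stay within iN (for N >= 8)
        result |= (acc & digit_mask) << (half * k)
        carry = acc >> half  # the carry out of column 3 is discarded
    return result, num_muls


# Emulating i16 with i8 (N = 8): 0xFFFF * 0xFFFF = 0xFFFE0001, low 16 bits = 0x0001.
print(emulated_mul(0xFFFF, 0xFFFF, 8))  # (1, 10)
```

Computing the full i4N product would take all 16 digit products; dropping the six with i + j >= 4 is where the saving described above comes from.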

I tested this on all pairs of 16-bit inputs when emulating i16 with i8.

Reviewed By: Mogball

Differential Revision: https://reviews.llvm.org/D133629