[all-commits] [llvm/llvm-project] 2de936: [mlir][vector] Fix emulation of "narrow" type `vec...

Thu Apr 24 10:06:03 PDT 2025

  Branch: refs/heads/main
  Home:   https://github.com/llvm/llvm-project
  Commit: 2de936b6eb38e7a37224a97c2a22aa79b9dfb9dc
      https://github.com/llvm/llvm-project/commit/2de936b6eb38e7a37224a97c2a22aa79b9dfb9dc
  Author: Andrzej Warzyński <andrzej.warzynski at arm.com>
  Date:   2025-04-24 (Thu, 24 Apr 2025)

  Changed paths:
    M mlir/lib/Dialect/Vector/Transforms/VectorEmulateNarrowType.cpp
    M mlir/test/Dialect/Vector/vector-emulate-narrow-type-unaligned.mlir
    M mlir/test/Dialect/Vector/vector-emulate-narrow-type.mlir

  Log Message:
  -----------
  [mlir][vector] Fix emulation of "narrow" type `vector.store` (#133231)

  Below are two examples of "narrow" `vector.stores`. The first example
  does not require partial stores and hence no RMW stores. This is
  currently emulated correctly.
  ```mlir
  func.func @example_1(%arg0: vector<4xi2>) {
      %0 = memref.alloc() : memref<13xi2>
      %c4 = arith.constant 4 : index
      vector.store %arg0, %0[%c4] : memref<13xi2>, vector<4xi2>
      return
  }
  ```

  The second example requires a partial (and hence RMW) store because the
  offset (`%c3`) is not aligned to an `i8` boundary.
  ```mlir
  func.func @example_2(%arg0: vector<4xi2>) {
      %0 = memref.alloc() : memref<13xi2>
      %c3 = arith.constant 3 : index
      vector.store %arg0, %0[%c3] : memref<13xi2>, vector<4xi2>
      return
  }
  ```

  This is currently incorrectly emulated as a single "full" store (note
  that the offset is incorrect) instead of partial stores:
  ```mlir
  func.func @example_2(%arg0: vector<4xi2>) {
    %alloc = memref.alloc() : memref<4xi8>
    %0 = vector.bitcast %arg0 : vector<4xi2> to vector<1xi8>
    %c0 = arith.constant 0 : index
    vector.store %0, %alloc[%c0] : memref<4xi8>, vector<1xi8>
    return
  }
  ```

  The incorrect emulation stems from this simplified (i.e. incomplete)
  calculation of the front padding:
  ```cpp
      std::optional<int64_t> foldedNumFrontPadElems =
          isDivisibleInSize ? 0
                            : getConstantIntValue(linearizedInfo.intraDataOffset);
  ```

  Since `isDivisibleInSize` is `true` (i8 / i2 = 4):
    * front padding is set to `0` and, as a result,
    * the input offset (`%c3`) is ignored, and
    * we incorrectly assume that partial stores won't be needed.
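
  The arithmetic behind the two examples can be sketched as follows. This
  is only an illustration: `frontPadElems` and `containersTouched` are
  hypothetical helpers, not functions from MLIR, and the sketch assumes
  `i2` elements packed four to an `i8`.
  ```cpp
  #include <cstdint>

  // Hypothetical constants for this illustration: i2 emulated via i8.
  constexpr std::int64_t kEmulatedBits = 2;
  constexpr std::int64_t kContainerBits = 8;
  constexpr std::int64_t kElemsPerContainer = kContainerBits / kEmulatedBits;

  // Number of i2 slots that precede the stored data inside the first i8
  // that the store touches.
  constexpr std::int64_t frontPadElems(std::int64_t offset) {
    return offset % kElemsPerContainer;
  }

  // Number of i8 elements covered when storing `numElems` i2 values
  // starting at i2 index `offset`.
  constexpr std::int64_t containersTouched(std::int64_t offset,
                                           std::int64_t numElems) {
    return (offset + numElems - 1) / kElemsPerContainer -
           offset / kElemsPerContainer + 1;
  }

  // @example_1: offset 4 is aligned, so one full i8 is written.
  static_assert(frontPadElems(4) == 0, "no front padding at offset 4");
  static_assert(containersTouched(4, 4) == 1, "one i8 at offset 4");

  // @example_2: offset 3 leaves 3 slots of front padding and the data
  // straddles two i8 elements, so partial stores are needed.
  static_assert(frontPadElems(3) == 3, "front padding of 3 at offset 3");
  static_assert(containersTouched(3, 4) == 2, "two i8s at offset 3");
  ```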

  Note that in both examples we are storing `vector<4xi2>` into
  `memref<13xi2>` (note _different_ trailing dims) and hence partial
  stores might in fact be required. The condition above is updated to:
  ```cpp
      std::optional<int64_t> foldedNumFrontPadElems =
          (isDivisibleInSize && trailingDimsMatch)
              ? 0
              : getConstantIntValue(linearizedInfo.intraDataOffset);
  ```
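
  As a rough, standalone model of the two conditions (not the actual
  pass code): `oldFrontPad` and `newFrontPad` are hypothetical names, the
  `std::optional` / `getConstantIntValue` handling is dropped, and the
  intra-data offset is assumed to be a known constant (taken here as
  `3` for `@example_2` and `0` for `@example_1`).
  ```cpp
  #include <cstdint>

  // Model of the condition before the fix: the offset is dropped whenever
  // the sizes divide evenly.
  constexpr std::int64_t oldFrontPad(bool isDivisibleInSize,
                                     std::int64_t intraDataOffset) {
    return isDivisibleInSize ? 0 : intraDataOffset;
  }

  // Model of the condition after the fix: the offset is only dropped when
  // the trailing dims of the vector and the memref also match.
  constexpr std::int64_t newFrontPad(bool isDivisibleInSize,
                                     bool trailingDimsMatch,
                                     std::int64_t intraDataOffset) {
    return (isDivisibleInSize && trailingDimsMatch) ? 0 : intraDataOffset;
  }

  // @example_2: vector<4xi2> into memref<13xi2>, assumed offset 3.
  static_assert(oldFrontPad(true, 3) == 0, "old: offset ignored");
  static_assert(newFrontPad(true, false, 3) == 3, "new: offset kept");

  // @example_1: assumed offset 0, so both conditions agree.
  static_assert(oldFrontPad(true, 0) == newFrontPad(true, false, 0),
                "aligned case unchanged");
  ```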

  This change ensures that the input offset is properly taken into
  account, which fixes the issue. It doesn't affect `@example_1`.

  Additional comments are added to clarify the current logic.
