[Mlir-commits] [mlir] [MLIR][affine] Fix for #115849 Illegal affine loop fusion with vector types (PR #117617)

Tue Dec 3 16:17:26 PST 2024

brod4910 wrote:

The big issue here is that even when we are able to validate that the load/store can be fused, the IR produced by the transformation is still invalid. For example:

```mlir
func.func @should_fuse_across_memref_store_load_bounds() {
  %a = memref.alloc() : memref<64x512xf32> 
  %b = memref.alloc() : memref<64x512xf32>
  %c = memref.alloc() : memref<64x512xf32> 
  %d = memref.alloc() : memref<64x4096xf32>
  %e = memref.alloc() : memref<64x4096xf32>

  affine.for %j = 0 to 8 {
    %lhs = affine.vector_load %a[0, %j * 64] : memref<64x512xf32>, vector<64x64xf32>
    %rhs = affine.vector_load %b[0, %j * 64] : memref<64x512xf32>, vector<64x64xf32>
    %res = arith.addf %lhs, %rhs : vector<64x64xf32>
    affine.vector_store %res, %c[0, %j * 64] : memref<64x512xf32>, vector<64x64xf32>
  }

  affine.for %j = 0 to 8 {
    %lhs = affine.vector_load %c[0, 0] : memref<64x512xf32>, vector<64x32xf32>
    %rhs = affine.vector_load %d[0, %j * 32] : memref<64x4096xf32>, vector<64x32xf32>
    %res = arith.subf %lhs, %rhs : vector<64x32xf32>
    affine.vector_store %res, %d[0, %j * 32] : memref<64x4096xf32>, vector<64x32xf32>
  }

  return
}
```

Produces this invalid IR:
```mlir
  func.func @should_fuse_across_memref_store_load_bounds() {
    %alloc = memref.alloc() : memref<1x1xf32>
    %c0 = arith.constant 0 : index
    %alloc_0 = memref.alloc() : memref<64x512xf32>
    %alloc_1 = memref.alloc() : memref<64x512xf32>
    %alloc_2 = memref.alloc() : memref<64x4096xf32>
    %alloc_3 = memref.alloc() : memref<64x4096xf32>
    affine.for %arg0 = 0 to 8 {
      %0 = affine.vector_load %alloc_0[0, %c0 * 64] : memref<64x512xf32>, vector<64x64xf32>
      %1 = affine.vector_load %alloc_1[0, %c0 * 64] : memref<64x512xf32>, vector<64x64xf32>
      %2 = arith.addf %0, %1 : vector<64x64xf32>
      affine.vector_store %2, %alloc[0, 0] : memref<1x1xf32>, vector<64x64xf32>
      %3 = affine.vector_load %alloc[0, 0] : memref<1x1xf32>, vector<64x32xf32>
      %4 = affine.vector_load %alloc_2[0, %arg0 * 32] : memref<64x4096xf32>, vector<64x32xf32>
      %5 = arith.subf %3, %4 : vector<64x32xf32>
      affine.vector_store %5, %alloc_2[0, %arg0 * 32] : memref<64x4096xf32>, vector<64x32xf32>
    }
    return
  }
```

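The private alloc `%alloc` that gets created is `memref<1x1xf32>` instead of the proper size, so the `vector<64x64xf32>` store into it (and the `vector<64x32xf32>` load from it) goes out of bounds.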
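
For comparison, here is a rough sketch of what the fused output might look like if the private buffer were sized to hold the stored vector. It is the IR above with only the private alloc's type changed. The `memref<64x64xf32>` shape is my assumption, based on the `vector<64x64xf32>` store at `[0, 0]`; it is not output from the pass or from this PR, and the function name is made up for illustration.

```mlir
// Hypothetical, hand-edited IR: not produced by the fusion pass.
func.func @fused_with_resized_private_buffer() {
  // Assumption: the private buffer must cover the full 64x64 vector
  // written by the producer, rather than a single f32 element.
  %alloc = memref.alloc() : memref<64x64xf32>
  %c0 = arith.constant 0 : index
  %alloc_0 = memref.alloc() : memref<64x512xf32>
  %alloc_1 = memref.alloc() : memref<64x512xf32>
  %alloc_2 = memref.alloc() : memref<64x4096xf32>
  affine.for %arg0 = 0 to 8 {
    %0 = affine.vector_load %alloc_0[0, %c0 * 64] : memref<64x512xf32>, vector<64x64xf32>
    %1 = affine.vector_load %alloc_1[0, %c0 * 64] : memref<64x512xf32>, vector<64x64xf32>
    %2 = arith.addf %0, %1 : vector<64x64xf32>
    // The store now fits: 64x64 elements into a 64x64 buffer.
    affine.vector_store %2, %alloc[0, 0] : memref<64x64xf32>, vector<64x64xf32>
    // The consumer reads the leading 64x32 block of the same buffer.
    %3 = affine.vector_load %alloc[0, 0] : memref<64x64xf32>, vector<64x32xf32>
    %4 = affine.vector_load %alloc_2[0, %arg0 * 32] : memref<64x4096xf32>, vector<64x32xf32>
    %5 = arith.subf %3, %4 : vector<64x32xf32>
    affine.vector_store %5, %alloc_2[0, %arg0 * 32] : memref<64x4096xf32>, vector<64x32xf32>
  }
  return
}
```

Whether the pass should size the buffer this way, or instead refuse to privatize the memref when vector accesses are involved, is a separate question.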

https://github.com/llvm/llvm-project/pull/117617