[Mlir-commits] [llvm] [mlir] [MLIR][AMDGPU] Adding dynamic size check to avoid subword buffer load (PR #135014)
Zhuoran Yin
llvmlistbot at llvm.org
Mon Apr 14 08:36:22 PDT 2025
================
@@ -141,6 +243,7 @@ struct TransferReadLowering final : OpRewritePattern<vector::TransferReadOp> {
void mlir::amdgpu::populateAmdgpuTransferReadToLoadPatterns(
RewritePatternSet &patterns) {
patterns.add<TransferReadLowering>(patterns.getContext());
+ vector::populateVectorTransferLoweringPatterns(patterns);
----------------
jerryyin wrote:
I can go either way. If I don't go with this pattern, then the lowering of the masked transfer_read will become:
```mlir
%0 = scf.if %false -> (vector<4xf32>) {
%1 = vector.transfer_read %arg0[%arg1, %arg1], %cst_0, %arg2 {amdgpu.buffer_transfer_read_needs_mask, in_bounds = [true]} : memref<8x8xf32, #amdgpu.address_space<fat_raw_buffer>>, vector<4xf32>
scf.yield %1 : vector<4xf32>
} else {
%1 = vector.load %arg0[%arg1, %arg1] : memref<8x8xf32, #amdgpu.address_space<fat_raw_buffer>>, vector<4xf32>
%2 = arith.select %arg2, %1, %cst : vector<4xi1>, vector<4xf32>
scf.yield %2 : vector<4xf32>
}
```
As an MLIR developer (from the reader's perspective), I'm used to seeing the op get rewritten away by an `OpRewritePattern`, not a new op of the same form being produced by this pass. On the other hand, making the developer invoke the next pass in their own pipeline, as you indicated, also makes sense. What do you think?
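For concreteness, the second option would look roughly like this on the consumer side (a minimal sketch, not part of the PR: the wrapper function and greedy-driver invocation are illustrative, and the include paths are approximate):
```cpp
#include "mlir/Dialect/AMDGPU/Transforms/Transforms.h"
#include "mlir/Dialect/Vector/Transforms/LoweringPatterns.h"
#include "mlir/IR/PatternMatch.h"
#include "mlir/Transforms/GreedyPatternRewriteDriver.h"

// Hypothetical downstream lowering step: the pipeline composes the two
// pattern sets itself instead of having
// populateAmdgpuTransferReadToLoadPatterns pull in the vector lowering.
static mlir::LogicalResult lowerBufferTransferReads(mlir::Operation *root) {
  mlir::RewritePatternSet patterns(root->getContext());
  // Rewrites masked transfer_read on fat_raw_buffer memrefs into the
  // scf.if guard shown above.
  mlir::amdgpu::populateAmdgpuTransferReadToLoadPatterns(patterns);
  // Lowers the vector.transfer_read left in the "if" branch; without this,
  // a transfer_read of the same form survives the pass.
  mlir::vector::populateVectorTransferLoweringPatterns(patterns);
  return mlir::applyPatternsGreedily(root, std::move(patterns));
}
```
Either way the resulting IR is the same; the question is only whether the composition lives inside the populate function (as in this diff) or in each downstream pipeline.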
https://github.com/llvm/llvm-project/pull/135014