[Mlir-commits] [mlir] [mlir][vector] Sink vector.extract/splat into load/store ops (PR #134389)

Mon Apr 14 10:52:01 PDT 2025

================
@@ -161,6 +161,20 @@ void populateVectorTransferCollapseInnerMostContiguousDimsPatterns(
 void populateSinkVectorOpsPatterns(RewritePatternSet &patterns,
                                    PatternBenefit benefit = 1);
 
+/// Patterns that remove redundant Vector Ops by merging them with load/store
+/// ops
+/// ```
+/// vector.load %arg0[%arg1] : memref<?xf32>, vector<4xf32>
+/// vector.extract %0[1] : f32 from vector<4xf32>
+/// ```
+/// Gets converted to:
+/// ```
+/// %c1 = arith.constant 1 : index
+/// %0 = arith.addi %arg1, %c1 overflow<nsw> : index
+/// %1 = memref.load %arg0[%0] : memref<?xf32>
----------------
dcaballe wrote:

Thanks @kuhar, those examples were helpful! I'm still on the fence, but let’s move forward with this as an independent pattern. The proliferation of dangling “populate” methods is concerning, but this case may be worth it.

> The original load is efficient because you are accessing a full dword. However, if you turn it into memref.load ... : i8, you may no longer know, 

For that example, I would expect the alignment information to be explicit somewhere, as `vector.load` doesn’t have any default alignment. In the absence of alignment information, I’m still not sure this transformation drops any information.

> For example, the buffer instruction on amdgpu allow you to get a default value for any OOB accesses. Looking at the example above, it could be that only the last byte is OOB, but this alone makes the whole vector<4xi8> have the default value. If you no longer load that last byte, the access would be in-bounds and you would observe a different value. 

Yes, but we can’t attribute hardware-specific semantics to `vector.load`. We allow OOB reads to accommodate targets that can “handle” OOB accesses. However, we can’t make assumptions about what the target will do or about the actual values of those OOB elements. The doc may need some refinement, but we defined it along these lines:

```
Representation-wise, the ‘vector.load’ operation permits out-of-bounds reads. Support and implementation of out-of-bounds vector loads is target-specific. No assumptions should be made on the value of elements loaded out of bounds. Not all targets may support out-of-bounds vector loads. 
```

A valid lowering of `vector.load` could be a scalarized version that checks element by element whether the access is OOB and only loads the in-bounds elements, so the OOB accesses might never happen. I'd even say that OOB accesses are not observable, since using the OOB elements should be poison, right? I think the behavior you are describing would better fit a masked vector load, where the masked-off (OOB) elements are replaced with a padding value.
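
To make the distinction above concrete, here is a minimal C++ sketch of the two behaviors. It is only a model and not MLIR or LLVM code: `Lane`, `scalarizedLoad`, and `maskedLoad` are hypothetical names introduced for illustration, `std::nullopt` stands in for "a value nobody may rely on" (it is not a faithful model of poison), and the memref is modeled as a `std::vector` of bytes.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <optional>
#include <vector>

// Hypothetical model type: std::nullopt marks a lane whose value must not be
// relied upon because the corresponding access was out of bounds.
using Lane = std::optional<std::uint8_t>;

// Sketch of a scalarized lowering of a vector load: each lane is checked
// individually and memory is only touched for in-bounds lanes.
std::vector<Lane> scalarizedLoad(const std::vector<std::uint8_t> &buffer,
                                 std::size_t base, std::size_t width) {
  std::vector<Lane> result(width); // every lane starts as "no usable value"
  for (std::size_t i = 0; i < width; ++i) {
    std::size_t idx = base + i;
    if (idx < buffer.size())
      result[i] = buffer[idx];
  }
  return result;
}

// Sketch of a masked load: masked-off lanes receive a well-defined
// pass-through (padding) value instead of an unspecified one.
std::vector<std::uint8_t> maskedLoad(const std::vector<std::uint8_t> &buffer,
                                     std::size_t base,
                                     const std::vector<bool> &mask,
                                     const std::vector<std::uint8_t> &passThru) {
  assert(mask.size() == passThru.size() && "mask/pass-through width mismatch");
  std::vector<std::uint8_t> result(passThru); // start from the padding values
  for (std::size_t i = 0; i < mask.size(); ++i) {
    if (mask[i])
      result[i] = buffer.at(base + i); // enabled lanes must be in bounds
  }
  return result;
}
```

With a 3-byte buffer and a 4-lane load starting at index 1, `scalarizedLoad` yields two real values and two lanes with no usable value, while `maskedLoad` with a mask covering only the first two lanes yields the same two values followed by the chosen padding. Only the second form gives the OOB lanes a value that a program may observe.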

https://github.com/llvm/llvm-project/pull/134389