[Mlir-commits] [mlir] [mlir][vector] Turn ExtractOpFromLoad into a canonicalization (PR #171198)

Mon Dec 8 14:53:11 PST 2025

https://github.com/krzysz00 requested changes to this pull request.

Ok, so I don't think, going off the definition of `vector.load`, this is canonical, and I'd argue it shouldn't be.

I'm going to link https://github.com/llvm/llvm-project/pull/134734 here, where a related issue with how partially out-of-bounds accesses are handled has been in a holding pattern for a while.

The current notes on `vector.load` re out-of-bounds values are
> Representation-wise, the ‘vector.load’ operation permits out-of-bounds reads. Support and implementation of out-of-bounds vector loads is target-specific. No assumptions should be made on the value of elements loaded out of bounds. Not all targets may support out-of-bounds vector loads.

This isn't the most concrete, but my read here is that "any `vector.load` where some vector element is out of bounds can have arbitrary behavior, not necessarily limited to the value of that element".

That is, targets where the behavior is "if any accessed element is out of bounds, the entire access is deemed out of bounds" is valid. Under that behavior, scoping down to an `extract` could change the result of an access.

And I do, in fact, have an example of these semantics.

On AMD, when loading more than a (32-bit) word of elements from a buffer resource, bounds checks are performed per-word with, critically, saturating arithmetic, and out-of-bounds loads return 0.

That is, suppose I have a resource containing `8 floats (32 bytes) [f0, f1 ..., f7], annotated with the correct bounds.

Then,
```
%e1 = call <4 x float> @llvm.andgcn.raw.ptr.buffer.load(rsrc, i32 6 * 4, ...)
```
will read to `%e1 = <f6, f7, 0, 0>`. Applying this pattern will lead to the element at issue or 0 being returned.

However, with
```llvm
%e2 = call <4 x float> @llvm.amdgcn.raw.buffer.load(rsrc, i32 -4, ...)
```
, I will **not** get the vector `<0, f0, f1, f2>`. I will instead get `<0, 0, 0, 0>`

Implementing the former behavior (currently needed for some Vulkan conformance) is, in the backend, "strict buffer OOB mode", and it implies substantial pessimizations of the generated code in order to get the correct behavior just in case someone did `vector.load %memref[%value_that_is_negative_1] : memref<?xf32>, vector<4xf32>`

Given that the traditional default is for this sort of scalarization to _not_ be performed, and that that's important for performance, I conclude that this sort of "propagating"/"poisonous" behavior for vector.load's out of bounds accesses is a valid implementation of `vector.load`, and therefore that this pattern isn't canonical.

(That is, you can do this rewrite ... if you have reason to believe you're in a context where out of bounds accesses either can't happen or are handled elementwise)

https://github.com/llvm/llvm-project/pull/171198