[llvm] [mlir] [MLIR][AMDGPU] Adding dynamic size check to avoid subword buffer load (PR #135014)

Wed Apr 9 08:11:14 PDT 2025

jerryyin wrote:

OK, that's quite a few points... I'd be happy to discuss offline, but below is my understanding:

> I'm not convinced this sort of dynamic control flow is the way to go here

I can't think of a better approach on the MLIR side that applies unconditionally to both a small toy example and a large matmul example, so a dynamic check seems necessary. If you come up with a better LLVM implementation that avoids this additional overhead, we can revert this PR.

> This only applies to sub-32-bit element types

Good point. I can skip the check when the element type is at least a word (32 bits) wide.

> The accesses has to be aligned to less than 4 bytes or the number of elements on the buffer isn't known to be a multiple of 4

Even if the alignment is >= 4 bytes and the buffer size is a multiple of 4, can't the boundary condition still be triggered if we read starting from the middle of a word?
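
To make that case concrete, here is a rough sketch of the arithmetic in Python. It is purely illustrative: the helper names are made up for this comment, and it only computes byte ranges; it does not model how the hardware actually performs its bounds check.

```python
WORD_BYTES = 4  # a "word" here means 32 bits


def starts_mid_word(elem_index, elem_bytes):
    """True if the first byte of the access is not on a 4-byte boundary."""
    return (elem_index * elem_bytes) % WORD_BYTES != 0


def crosses_buffer_end(elem_index, num_elems, elem_bytes, buffer_bytes):
    """True if the access starts inside the buffer but ends past its end."""
    start = elem_index * elem_bytes
    end = start + num_elems * elem_bytes
    return start < buffer_bytes < end


# Hypothetical case: a buffer of 8 f16 elements (16 bytes, a multiple of 4,
# base assumed 4-byte aligned) and a 4-element f16 vector load at index 5.
buffer_bytes = 8 * 2
mid_word = starts_mid_word(5, 2)                      # byte offset 10
crosses = crosses_buffer_end(5, 4, 2, buffer_bytes)   # bytes 10..18 vs. 16
```

In this example both `mid_word` and `crosses` are true: the buffer itself is aligned and sized to a multiple of 4, yet the load straddles the end because it starts at byte 10.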

> I think negative starting indices might be allowed

I thought the conclusion we reached the other day was to avoid negative offsets as much as possible? I see this as slightly off topic, and it needs broader discussion/approval.

https://github.com/llvm/llvm-project/pull/135014