[llvm] [AMDGPU] In promote-alloca, if index is dynamic, sandwich load with bitcasts to reduce number of extractelements as they have large expansion in the backend. (PR #171253)
via llvm-commits
llvm-commits at lists.llvm.org
Mon Dec 8 23:17:50 PST 2025
ruiling wrote:
> @ruiling do you know if the alignment check is strictly required here? As I understand, once alloca is promoted to vector, there is no need for alignment for extracting subvector.
Vector element extract/insert has to operate on whole elements, as in this example:
```
%alloca = alloca [32 x i16], align 16, addrspace(5)
%gep = getelementptr i8, ptr addrspace(5) %alloca, i32 %idx
%load = load <8 x i16>, ptr addrspace(5) %gep, align 1
```
Since we don't know whether `%gep` is 16-byte aligned, you cannot bitcast `%alloca` to `<4 x i128>` and translate the `<8 x i16>` load into a single `extractelement`. So you need the alignment check to lower it this way. For the unaligned case, I think @arsenm's point is that we should optimize the register allocator side, or find other ways to simplify the IR. I think you can only handle the aligned case properly in this change. It would be good to have some tests for the unaligned case.
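For comparison, here is a minimal sketch of what the aligned case could look like after promotion. It is not taken from the PR: `@aligned_load`, `%vec` (the `<32 x i16>` value standing in for the promoted alloca) and `%idx16` (the byte offset divided by 16) are hypothetical names, and the sketch assumes the offset is known to be a multiple of 16.

```
; Hypothetical illustration only, assuming the byte offset is a multiple of 16.
; %vec   : the <32 x i16> value that replaces %alloca after promotion
; %idx16 : byte offset / 16, i.e. which 128-bit chunk is being loaded
define <8 x i16> @aligned_load(<32 x i16> %vec, i32 %idx16) {
  ; Reinterpret the 512-bit vector as four 128-bit chunks.
  %cast = bitcast <32 x i16> %vec to <4 x i128>
  ; A single dynamic extract replaces eight i16 extracts.
  %elt = extractelement <4 x i128> %cast, i32 %idx16
  ; Reinterpret the chunk as the originally loaded type.
  %load = bitcast i128 %elt to <8 x i16>
  ret <8 x i16> %load
}
```

If the byte offset is not a multiple of 16, the 16 loaded bytes straddle two `i128` elements, so no single `extractelement` on `<4 x i128>` can produce them.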
https://github.com/llvm/llvm-project/pull/171253