[llvm] [AMDGPU] In promote-alloca, if index is dynamic, sandwich load with bitcasts to reduce number of extractelements as they have large expansion in the backend. (PR #171253)
via llvm-commits
llvm-commits at lists.llvm.org
Tue Dec 9 07:13:06 PST 2025
ruiling wrote:
> > Vector element extract/insert needs to operate on an element basis, as in this example:
> > ```
> > %alloca = alloca [32 x i16], align 16, addrspace(5)
> > %gep = getelementptr i8, ptr addrspace(5) %alloca, i32 %idx
> > %load = load <8 x i16>, ptr addrspace(5) %gep, align 1
> > ```
> > Since we don't know whether `%gep` will be aligned to 16 bytes, you cannot bitcast `%alloca` to `<4 x i128>` and translate the `<8 x i16>` load into a single `extractelement`, so you need an alignment check to lower it this way. For the unaligned case, I think @arsenm's point is that we should optimize the register allocator side or find other ways to simplify the IR. I think you can only handle the aligned case properly in this change. It would be good to have some tests for the unaligned case.
>
> I believe GEPToVectorIndex is only allowing GEPs that map to the vector index. I will add an unaligned testcase to make sure.
I think GEPToVectorIndex is used to get the index into the vector type that the alloca is translated to. By "unaligned" I mean not aligned to the vector type you temporarily bitcast to. In the case below, the alloca is translated to `<32 x i16>`, but in order to turn the `load <8 x i16>` into the simplified form you bitcast it to `<4 x i128>`. The unaligned case is the one where the address of `%gep` is not aligned to 128 bits.
```
%alloca = alloca [32 x i16], align 16, addrspace(5)
%gep = getelementptr i16, ptr addrspace(5) %alloca, i32 %idx
%load = load <8 x i16>, ptr addrspace(5) %gep, align 2
```
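To make the alignment condition concrete, here is a rough C++ model of the point above. It is only a sketch, not code from the PR: `Promoted`, `Narrow`, `loadElementwise`, `extractChunk`, `isChunkAligned` and `loadPromoted` are hypothetical names, and the model assumes `%idx` counts i16 elements and that `idx + 8 <= 32`.

```cpp
#include <array>
#include <cstdint>
#include <cstring>

// Models the promoted alloca: <32 x i16>, i.e. 512 bits.
using Promoted = std::array<uint16_t, 32>;
// Models one <8 x i16> value, i.e. 128 bits.
using Narrow = std::array<uint16_t, 8>;

// Reference semantics of the load: read 8 consecutive i16 elements
// starting at element index idx (assumes idx + 8 <= 32).
Narrow loadElementwise(const Promoted &v, unsigned idx) {
  Narrow out{};
  for (unsigned i = 0; i < 8; ++i)
    out[i] = v[idx + i];
  return out;
}

// Models viewing the same bits as <4 x i128> and doing one dynamic
// extract. Only the four chunk boundaries are addressable this way.
Narrow extractChunk(const Promoted &v, unsigned chunk) {
  Narrow out{};
  std::memcpy(out.data(), v.data() + chunk * 8, sizeof(out));
  return out;
}

// True when the i16 index starts exactly on a 128-bit chunk boundary.
bool isChunkAligned(unsigned idx) { return idx % 8 == 0; }

// Use the single wide extract only when it is known to be equivalent;
// otherwise fall back to the element-wise form.
Narrow loadPromoted(const Promoted &v, unsigned idx) {
  if (isChunkAligned(idx))
    return extractChunk(v, idx / 8);
  return loadElementwise(v, idx);
}
```

With `idx = 8` the single wide extract and the element-wise load agree. With `idx = 3` the requested elements straddle two 128-bit chunks, so no single `<4 x i128>` extract can produce them.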
I hope I have understood everything correctly.
https://github.com/llvm/llvm-project/pull/171253