[all-commits] [llvm/llvm-project] 1a7f5f: [AMDGPU] Promote nestedGEP allocas to vectors (#14...
Harrison Hao via All-commits
all-commits at lists.llvm.org
Mon Jun 2 01:20:37 PDT 2025
Branch: refs/heads/main
Home: https://github.com/llvm/llvm-project
Commit: 1a7f5f58332d91f88a4305399d7f79aba046e19a
https://github.com/llvm/llvm-project/commit/1a7f5f58332d91f88a4305399d7f79aba046e19a
Author: Harrison Hao <57025411+harrisonGPU at users.noreply.github.com>
Date: 2025-06-02 (Mon, 02 Jun 2025)
Changed paths:
M llvm/lib/Target/AMDGPU/AMDGPUPromoteAlloca.cpp
M llvm/test/CodeGen/AMDGPU/amdpal.ll
A llvm/test/CodeGen/AMDGPU/promote-alloca-vector-gep-of-gep.ll
Log Message:
-----------
[AMDGPU] Promote nestedGEP allocas to vectors (#141199)
Supports the `nestedGEP`pattern that
appears when an alloca is first indexed as an array element and then
shifted with a byte‑offset GEP:
```llvm
%SortedFragments = alloca [10 x <2 x i32>], addrspace(5), align 8
%row = getelementptr [10 x <2 x i32>], ptr addrspace(5) %SortedFragments, i32 0, i32 %j
%elt1 = getelementptr i8, ptr addrspace(5) %row, i32 4
%val = load i32, ptr addrspace(5) %elt1
```
The pass folds the two levels of addressing into a single vector lane
index and keeps the whole object in a VGPR:
```llvm
%vec = freeze <20 x i32> poison ; alloca promote <20 x i32>
%idx0 = mul i32 %j, 2 ; j * 2
%idx = add i32 %idx0, 1 ; j * 2 + 1
%val = extractelement <20 x i32> %vec, i32 %idx
```
This eliminates the scratch read.
To unsubscribe from these emails, change your notification settings at https://github.com/llvm/llvm-project/settings/notifications
More information about the All-commits
mailing list