[all-commits] [llvm/llvm-project] 1a7f5f: [AMDGPU] Promote nestedGEP allocas to vectors (#14...

Mon Jun 2 01:20:37 PDT 2025

  Branch: refs/heads/main
  Home:   https://github.com/llvm/llvm-project
  Commit: 1a7f5f58332d91f88a4305399d7f79aba046e19a
      https://github.com/llvm/llvm-project/commit/1a7f5f58332d91f88a4305399d7f79aba046e19a
  Author: Harrison Hao <57025411+harrisonGPU at users.noreply.github.com>
  Date:   2025-06-02 (Mon, 02 Jun 2025)

  Changed paths:
    M llvm/lib/Target/AMDGPU/AMDGPUPromoteAlloca.cpp
    M llvm/test/CodeGen/AMDGPU/amdpal.ll
    A llvm/test/CodeGen/AMDGPU/promote-alloca-vector-gep-of-gep.ll

  Log Message:
  -----------
  [AMDGPU] Promote nestedGEP allocas to vectors (#141199)

Supports the `nestedGEP` pattern that appears when an alloca is first
indexed as an array element and then shifted with a byte-offset GEP:

```llvm
  %SortedFragments = alloca [10 x <2 x i32>], addrspace(5), align 8
  %row  = getelementptr [10 x <2 x i32>], ptr addrspace(5) %SortedFragments, i32 0, i32 %j
  %elt1 = getelementptr i8, ptr addrspace(5) %row, i32 4
  %val  = load i32, ptr addrspace(5) %elt1
```

The pass folds the two levels of addressing into a single vector lane
index and keeps the whole object in VGPRs instead of scratch memory:

```llvm
  %vec  = freeze <20 x i32> poison              ; alloca promoted to <20 x i32>
  %idx0 = mul i32 %j, 2                         ; j * 2
  %idx  = add i32 %idx0, 1                      ; j * 2 + 1
  %val  = extractelement <20 x i32> %vec, i32 %idx
```

The load becomes an `extractelement`, which eliminates the scratch read.
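
The index arithmetic in the example above can be sketched in isolation. The helper below is hypothetical and is not taken from AMDGPUPromoteAlloca.cpp; its name, parameters, and the `-1` failure value are assumptions made for illustration. It assumes a constant byte offset and an array element size that is a whole multiple of the lane size.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper (not the pass's actual code): maps an array index `j`
// plus a constant byte offset to a lane of the flattened vector.
//   rowBytes  - size of one array element, e.g. 8 for <2 x i32>
//   laneBytes - size of one vector lane,   e.g. 4 for i32
// Returns -1 when the access does not land on a lane boundary.
int64_t laneIndex(uint32_t j, uint32_t byteOffset,
                  uint32_t rowBytes, uint32_t laneBytes) {
  if (laneBytes == 0 || rowBytes % laneBytes != 0 ||
      byteOffset % laneBytes != 0)
    return -1;
  // Number of lanes covered by one array element: 8 / 4 = 2.
  uint32_t lanesPerRow = rowBytes / laneBytes;
  // First GEP contributes j * lanesPerRow, second GEP contributes
  // byteOffset / laneBytes: j * 2 + 1 for the example above.
  return static_cast<int64_t>(j) * lanesPerRow + byteOffset / laneBytes;
}
```

For the IR in the log message, `laneIndex(j, 4, 8, 4)` evaluates to `j * 2 + 1`, matching the `mul`/`add` pair that feeds `extractelement`.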
