[all-commits] [llvm/llvm-project] 7bc9d9: [AMDGPU] Introduce "amdgpu-sw-lower-lds" pass to l...

Chaitanya via All-commits all-commits at lists.llvm.org
Sun Aug 25 20:29:47 PDT 2024


  Branch: refs/heads/main
  Home:   https://github.com/llvm/llvm-project
  Commit: 7bc9d95b7e5a58c6acd65d96df065235641e0c3c
      https://github.com/llvm/llvm-project/commit/7bc9d95b7e5a58c6acd65d96df065235641e0c3c
  Author: Chaitanya <Krishna.Sankisa at amd.com>
  Date:   2024-08-26 (Mon, 26 Aug 2024)

  Changed paths:
    M llvm/lib/Target/AMDGPU/AMDGPU.h
    M llvm/lib/Target/AMDGPU/AMDGPUPassRegistry.def
    A llvm/lib/Target/AMDGPU/AMDGPUSwLowerLDS.cpp
    M llvm/lib/Target/AMDGPU/AMDGPUTargetMachine.cpp
    M llvm/lib/Target/AMDGPU/CMakeLists.txt
    A llvm/test/CodeGen/AMDGPU/amdgpu-sw-lower-lds-dynamic-indirect-access-asan.ll
    A llvm/test/CodeGen/AMDGPU/amdgpu-sw-lower-lds-dynamic-indirect-access.ll
    A llvm/test/CodeGen/AMDGPU/amdgpu-sw-lower-lds-dynamic-lds-test-asan.ll
    A llvm/test/CodeGen/AMDGPU/amdgpu-sw-lower-lds-dynamic-lds-test.ll
    A llvm/test/CodeGen/AMDGPU/amdgpu-sw-lower-lds-multi-static-dynamic-indirect-access-asan.ll
    A llvm/test/CodeGen/AMDGPU/amdgpu-sw-lower-lds-multi-static-dynamic-indirect-access.ll
    A llvm/test/CodeGen/AMDGPU/amdgpu-sw-lower-lds-multiple-blocks-return-asan.ll
    A llvm/test/CodeGen/AMDGPU/amdgpu-sw-lower-lds-multiple-blocks-return.ll
    A llvm/test/CodeGen/AMDGPU/amdgpu-sw-lower-lds-static-dynamic-indirect-access-asan.ll
    A llvm/test/CodeGen/AMDGPU/amdgpu-sw-lower-lds-static-dynamic-indirect-access.ll
    A llvm/test/CodeGen/AMDGPU/amdgpu-sw-lower-lds-static-dynamic-lds-test-asan.ll
    A llvm/test/CodeGen/AMDGPU/amdgpu-sw-lower-lds-static-dynamic-lds-test.ll
    A llvm/test/CodeGen/AMDGPU/amdgpu-sw-lower-lds-static-indirect-access-asan.ll
    A llvm/test/CodeGen/AMDGPU/amdgpu-sw-lower-lds-static-indirect-access-function-param-asan.ll
    A llvm/test/CodeGen/AMDGPU/amdgpu-sw-lower-lds-static-indirect-access-function-param.ll
    A llvm/test/CodeGen/AMDGPU/amdgpu-sw-lower-lds-static-indirect-access-nested-asan.ll
    A llvm/test/CodeGen/AMDGPU/amdgpu-sw-lower-lds-static-indirect-access-nested.ll
    A llvm/test/CodeGen/AMDGPU/amdgpu-sw-lower-lds-static-indirect-access.ll
    A llvm/test/CodeGen/AMDGPU/amdgpu-sw-lower-lds-static-lds-test-asan.ll
    A llvm/test/CodeGen/AMDGPU/amdgpu-sw-lower-lds-static-lds-test-atomic-cmpxchg-asan.ll
    A llvm/test/CodeGen/AMDGPU/amdgpu-sw-lower-lds-static-lds-test-atomicrmw-asan.ll
    A llvm/test/CodeGen/AMDGPU/amdgpu-sw-lower-lds-static-lds-test.ll

  Log Message:
  -----------
  [AMDGPU] Introduce "amdgpu-sw-lower-lds" pass to lower LDS accesses. (#87265)

This PR introduces new pass "amdgpu-sw-lower-lds". 

This pass lowers the local data store, LDS, uses in kernel and
non-kernel functions in module to use dynamically allocated global
memory. Packed LDS Layout is emulated in the global memory.
The lowered memory instructions from LDS to global memory are then
instrumented for address sanitizer, to catch addressing errors.
This pass only work when address sanitizer has been enabled and has
instrumented the IR. It identifies that IR has been instrumented using
"nosanitize_address" module flag.

For a kernel, LDS access can be static or dynamic which are direct
(accessed within kernel) and indirect (accessed through non-kernels).

**Replacement of Kernel LDS accesses:** 
- All the LDS accesses corresponding to kernel will be packed together,
where all static LDS accesses will be allocated first and then dynamic
LDS follows. The total size with alignment is calculated. A new LDS
global will be created for the kernel called "SW LDS" and it will have
the attribute "amdgpu-lds-size" attached with value of the size
calculated. All the LDS accesses in the module will be replaced by GEP
with offset into the "Sw LDS".
- A new "llvm.amdgcn.<kernel>.dynlds" is created per kernel accessing
the dynamic LDS. This will be marked used by kernel and will have
MD_absolue_symbol metadata set to total static LDS size, Since dynamic
LDS allocation starts after all static LDS allocation.

- A device global memory equal to the total LDS size will be allocated.
At the prologue of the kernel, a single work-item from the work-group,
does a "malloc" and stores the pointer of the allocation in "SW LDS". To
store the offsets corresponding to all LDS accesses, another global
variable is created which will be called "SW LDS metadata" in this pass.

- **SW LDS:** 
It is LDS global of ptr type with name
"llvm.amdgcn.sw.lds.<kernel-name>".

- **SW LDS Metadata:** 
It is of struct type, with n members. n equals the number of LDS globals
accessed by the kernel(direct and indirect). Each member of struct is
another struct of type {i32, i32, i32}. First member corresponds to
offset, second member corresponds to size of LDS global being replaced
and third represents the total aligned size. It will have name
"llvm.amdgcn.sw.lds.<kernel-name>.md". This global will have an
intializer with static LDS related offsets and sizes initialized. But
for dynamic LDS related entries, offsets will be intialized to previous
static LDS allocation end offset. Sizes for them will be zero initially.
These dynamic LDS offset and size values will be updated with in the
kernel, since kernel can read the dynamic LDS size allocation done at
runtime with query to "hidden_dynamic_lds_size" hidden kernel argument.

- At the epilogue of kernel, allocated memory would be made free by the
same single work-item.

**Replacement of non-kernel LDS accesses:** 
- Multiple kernels can access the same non-kernel function. All the
kernels accessing LDS through non-kernels are sorted and assigned a
kernel-id. All the LDS globals accessed by non-kernels are sorted.

- This information is used to build two tables: 
- **Base table:** 
Base table will have single row, with elements of the row placed as per
kernel ID. Each element in the row corresponds to ptr of "SW LDS"
variable created for that kernel.

- **Offset table:** 
Offset table will have multiple rows and columns. Rows are assumed to be
from 0 to (n-1). n is total number of kernels accessing the LDS through
non-kernels. Each row will have m elements. m is the total number of
unique LDS globals accessed by all non-kernels. Each element in the row
correspond to the ptr of the replacement of LDS global done by that
particular kernel.

- A LDS variable in non-kernel will be replaced based on the information
from base and offset tables. Based on kernel-id query, ptr of "SW LDS"
for that corresponding kernel is obtained from base table. The Offset
into the base "SW LDS" is obtained from corresponding element in offset
table. With this information, replacement value is obtained.



To unsubscribe from these emails, change your notification settings at https://github.com/llvm/llvm-project/settings/notifications


More information about the All-commits mailing list