[llvm] AMDGPU: share LDS budget logic and add experimental LDS buffering pass (PR #166388)

Tue Dec 2 08:46:49 PST 2025

yxsamliu wrote:

> I don't understand this transform; the load/store forwarding optimization already happens in this example and this folds to an empty function: https://godbolt.org/z/xqhc77q8c

The test needs to be more complicated to reproduce the issue. A real example is the rocrand benchmark rocrand-device-lfsr113-log-normal-float-default (https://github.com/ROCm/rocm-libraries/blob/develop/projects/rocrand/benchmark/benchmark_rocrand_device_api.cpp#L56), where `states` is read, partially updated, then written back, and `data` is also updated. Because `states` and `data` may alias, the compiler cannot eliminate the read and write to `states`. Adding LDS buffering for `states` alleviates memory contention and showed perf gains.
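To make the access pattern concrete, here is a minimal host-side C++ sketch of the read / partial update / write-back shape described above. It is an illustration only, not the rocrand kernel and not the output of the pass: the `State` type, the function names, and the update arithmetic are all hypothetical, and a plain local copy stands in for the LDS buffer. Both pointers use `std::uint32_t` so that overlap between `states` and `data` is something the compiler really has to assume.

```cpp
#include <cstddef>
#include <cstdint>

// Hypothetical stand-in for a per-thread generator state (not rocRAND's real type).
struct State {
    std::uint32_t z[4];
};

// Direct form: every iteration reads and writes through `states`.
// Since `data` might overlap `states`, each store to `data` can clobber the
// state, so the compiler has to keep the per-iteration load and store.
void generate_direct(State* states, std::uint32_t* data,
                     std::size_t tid, std::size_t n) {
    for (std::size_t i = 0; i < n; ++i) {
        // Partial update: only z[0] changes.
        states[tid].z[0] = states[tid].z[0] * 1664525u + 1013904223u;
        // Store that may alias `states`.
        data[i] = states[tid].z[0] ^ states[tid].z[1];
    }
}

// Buffered form: copy the state once, update the private copy, write back once.
// `local` plays the role of the LDS buffer in this sketch. The two forms are
// only equivalent when `data` and `states[tid]` do not actually overlap.
void generate_buffered(State* states, std::uint32_t* data,
                       std::size_t tid, std::size_t n) {
    State local = states[tid];
    for (std::size_t i = 0; i < n; ++i) {
        local.z[0] = local.z[0] * 1664525u + 1013904223u;
        data[i] = local.z[0] ^ local.z[1];
    }
    states[tid] = local;
}
```

With non-overlapping buffers both functions produce the same `data` and the same final state; the buffered form just touches `states` twice instead of on every iteration.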

https://github.com/llvm/llvm-project/pull/166388