[llvm] [AMDGPU][MISched] Allow memory ops of different base pointers to be clustered (PR #140674)
via llvm-commits
llvm-commits at lists.llvm.org
Mon Jul 21 08:36:02 PDT 2025
choikwa wrote:
I looked into obtaining profiling data for this issue, but due to two constraints (the issue is only observed on MI200, and rocprofv3 has limitations), I am unable to provide per-instruction stall reasons. I could only extract aggregate HW counter stats, which I've attached below. It would be nice to have profiling on MI300, but I don't have a test case that exhibits this improvement there.
| HW Counter | Base | ClusteredMemory | Diff% (relative to ClusteredMemory) |
|---|---|---|---|
|arch_vgpr | 16 | 12 | -33%|
|accum_vgpr | 0 | 4 | 100%|
|VALUInsts | 1589.688 | 1427.688 | -11%|
|VALUBusy | 7.33375 | 5.791669 | -27%|
|WriteSize | 4.925987 | 6.588816 | 25%|
|SALUInsts | 287.1875 | 318.1875 | 10%|
|SALUBusy | 1.307518 | 1.28516 | -2%|
|L2CacheHit | 9.222125 | 7.20795 | -28%|
|MemUnitBusy | 79.31983 | 73.18036 | -8%|
|MemUnitStalled | 0.059394 | 0.046405 | -28%|
|AverageNs | 518102 | 587858 | 12%|
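The comment does not say how Diff% was computed. Recomputing the column suggests it is (ClusteredMemory − Base) / ClusteredMemory rather than relative to Base. For example, arch_vgpr going from 16 to 12 gives −33% this way, not −25%. The sketch below checks that assumption against every row. `diff_pct` is a hypothetical helper, not part of any profiling tool.

```python
# Hypothetical helper: reproduces the Diff% column under the ASSUMPTION that it is
# computed relative to the ClusteredMemory value, not the Base value.
def diff_pct(base: float, clustered: float) -> int:
    """Return (clustered - base) / clustered as a percentage rounded to an int."""
    return round(100 * (clustered - base) / clustered)


# (Base, ClusteredMemory) values copied from the table above.
ROWS = {
    "arch_vgpr": (16, 12),
    "accum_vgpr": (0, 4),
    "VALUInsts": (1589.688, 1427.688),
    "VALUBusy": (7.33375, 5.791669),
    "WriteSize": (4.925987, 6.588816),
    "SALUInsts": (287.1875, 318.1875),
    "SALUBusy": (1.307518, 1.28516),
    "L2CacheHit": (9.222125, 7.20795),
    "MemUnitBusy": (79.31983, 73.18036),
    "MemUnitStalled": (0.059394, 0.046405),
    "AverageNs": (518102, 587858),
}

if __name__ == "__main__":
    for name, (base, clustered) in ROWS.items():
        print(f"{name}: {diff_pct(base, clustered)}%")
```

Under this convention, every rounded value matches the table. It also explains why accum_vgpr can show 100%: a Base-relative percentage would divide by zero.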
https://github.com/llvm/llvm-project/pull/140674