[PATCH] D94648: [amdgpu] Implement lower function LDS pass
Jon Chesterfield via Phabricator via llvm-commits
llvm-commits at lists.llvm.org
Thu May 13 06:06:53 PDT 2021
JonChesterfield added a comment.
Just run a bug to ground here. Replacing the inline asm with the donothing intrinsic is prettier, but also doesn't work as is. This is a flaw in the above only tested at the IR and application level. This shows up quickly (if opaquely) at the executable unit test scale.
Given:
target datalayout = "e-p:64:64-p1:64:64-p2:32:32-p3:32:32-p4:64:64-p5:32:32-p6:32:32-i64:64-v16:16-v24:32-v32:32-v48:64-v96:128-v192:256-v256:256-v512:512-v1024:1024-v2048:2048-n32:64-S32-A5-G1-ni:7"
target triple = "amdgcn-amd-amdhsa"
@global_barrier_state = hidden addrspace(3) global i32 undef, align 4
define i32 @rw() #0 {
entry:
%0 = atomicrmw add i32 addrspace(3)* @global_barrier_state, i32 1 acq_rel, align 4
ret i32 %0
}
define amdgpu_kernel void @__device_start() {
entry:
%0 = call i32 @rw()
ret void
}
attributes #0 = { noinline }
This transform does exactly what it was intended to, the LDS variable allocated at zero, but the kernel metadata starts:
.amdhsa_kernel __device_start
.amdhsa_group_segment_fixed_size 0 ; should be 4, isn't
.end_amdhsa_kernel
If the inline asm is reintroduced, that goes to 4. Similarly, if the test case that reduced to this is modified to allocate 4 bytes more LDS than the metadata asks for, it works again.
I suspect there is something in hardware that rounds LDS allocation up to a boundary, so as long as the kernel looks like it uses some non-zero amount of LDS, the out of bounds read hits in the allocated region.
Repository:
rG LLVM Github Monorepo
CHANGES SINCE LAST ACTION
https://reviews.llvm.org/D94648/new/
https://reviews.llvm.org/D94648
More information about the llvm-commits
mailing list