<table border="1" cellspacing="0" cellpadding="8">
<tr>
<th>Issue</th>
<td>
<a href=https://github.com/llvm/llvm-project/issues/72517>72517</a>
</td>
</tr>
<tr>
<th>Summary</th>
<td>
[AMDGPU] Kernel hangs when compiled with code-object version 5 due to insufficient stack
</td>
</tr>
<tr>
<th>Labels</th>
<td>
new issue
</td>
</tr>
<tr>
<th>Assignees</th>
<td>
</td>
</tr>
<tr>
<th>Reporter</th>
<td>
jhuber6
</td>
</tr>
</table>
<pre>
The `libc` test suite currently cannot be updated to code-object version 5 because of a hang observed while calling global constructors. The following LLVM IR (also available at https://godbolt.org/z/9j9TWPfaK) causes issues only when the `amdgpu_code_object_version` metadata is set to `500` and optimizations are turned on. It is taken from the kernel in `libc` that simply iterates over the array bounded by `__init_array_start` and `__init_array_end` and invokes each function pointer. Note that this issue is present even when the loop body is never executed.
```llvm
; ModuleID = 'image.0.5.precodegen.bc'
source_filename = "ld-temp.o"
target datalayout = "e-p:64:64-p1:64:64-p2:32:32-p3:32:32-p4:64:64-p5:32:32-p6:32:32-p7:160:256:256:32-p8:128:128-i64:64-v16:16-v24:32-v32:32-v48:64-v96:128-v192:256-v256:256-v512:512-v1024:1024-v2048:2048-n32:64-S32-A5-G1-ni:7:8"
target triple = "amdgcn-amd-amdhsa"
@__init_array_end = external hidden addrspace(1) global [0 x i64], align 8
@__init_array_start = external hidden addrspace(1) global [0 x i64], align 8
; Function Attrs: mustprogress
define protected amdgpu_kernel void @_begin(i32 noundef %argc, ptr noundef %argv, ptr noundef %env) local_unnamed_addr #0 {
entry:
br i1 icmp eq (i64 sub (i64 ptrtoint (ptr addrspacecast (ptr addrspace(1) @__init_array_end to ptr) to i64), i64 ptrtoint (ptr addrspacecast (ptr addrspace(1) @__init_array_start to ptr) to i64)), i64 0), label %exit, label %for.body.preheader.i
for.body.preheader.i: ; preds = %entry
%sub.ptr.div.i = ashr exact i64 sub (i64 ptrtoint (ptr addrspacecast (ptr addrspace(1) @__init_array_end to ptr) to i64), i64 ptrtoint (ptr addrspacecast (ptr addrspace(1) @__init_array_start to ptr) to i64)), 3
%umax.i = tail call i64 @llvm.umax.i64(i64 %sub.ptr.div.i, i64 1)
br label %for.body.i
for.body.i: ; preds = %for.body.i, %for.body.preheader.i
%i.04.i = phi i64 [ %inc.i, %for.body.i ], [ 0, %for.body.preheader.i ]
%arrayidx.i = getelementptr inbounds [0 x i64], ptr addrspace(1) @__init_array_start, i64 0, i64 %i.04.i
%0 = load i64, ptr addrspace(1) %arrayidx.i, align 8, !tbaa !8
%1 = inttoptr i64 %0 to ptr
tail call void %1(i32 noundef %argc, ptr noundef %argv, ptr noundef %env) #4
%inc.i = add nuw i64 %i.04.i, 1
%exitcond.not.i = icmp eq i64 %inc.i, %umax.i
br i1 %exitcond.not.i, label %exit, label %for.body.i, !llvm.loop !12
exit: ; preds = %for.body.i, %entry
ret void
}
; Function Attrs: nocallback nofree nosync nounwind speculatable willreturn memory(none)
declare i64 @llvm.umax.i64(i64, i64) #1
attributes #0 = { mustprogress "no-trapping-math"="true" "stack-protector-buffer-size"="8" }
attributes #1 = { nocallback nofree nosync nounwind speculatable willreturn memory(none) }
attributes #2 = { mustprogress nofree norecurse nounwind willreturn memory(argmem: readwrite) "no-trapping-math"="true" "stack-protector-buffer-size"="8" }
attributes #3 = { mustprogress nofree norecurse nosync nounwind willreturn memory(none) "no-trapping-math"="true" "stack-protector-buffer-size"="8" }
attributes #4 = { nobuiltin }
!opencl.ocl.version = !{!0, !0, !0}
!llvm.ident = !{!1, !2, !1, !2, !1, !2}
!llvm.module.flags = !{!3, !4, !5, !6, !7}
!0 = !{i32 2, i32 0}
!1 = !{!"clang version 18.0.0"}
!2 = !{!"clang version 16.0.0"}
!3 = !{i32 1, !"amdgpu_code_object_version", i32 500}
!4 = !{i32 1, !"wchar_size", i32 4}
!5 = !{i32 8, !"PIC Level", i32 1}
!6 = !{i32 1, !"ThinLTO", i32 0}
!7 = !{i32 1, !"EnableSplitLTOUnit", i32 1}
!8 = !{!9, !9, i64 0}
!9 = !{!"long", !10, i64 0}
!10 = !{!"omnipotent char", !11, i64 0}
!11 = !{!"Simple C++ TBAA"}
!12 = distinct !{!12, !13}
!13 = !{!"llvm.loop.mustprogress"}
```
The cause I have found is that the `.private_segment_fixed_size` kernel metadata is incorrectly emitted as zero after optimizations. If I take the GCN assembly and manually edit the metadata to set `.private_segment_fixed_size` to a non-zero value, the kernel no longer hangs. To reproduce, use the following command line invocation:
```
$ clang bad.ll --target=amdgcn-amd-amdhsa -mcpu=gfx1030 -O1 -c; llvm-readelf --notes bad.o | grep 'private_segment'
```
If you change the `amdgpu_code_object_version` value to `400` in the source, the issue goes away because the stack size is no longer set to zero.
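For comparing the two code-object versions without reading the notes by eye, the check can be scripted. The following is only a minimal sketch: it assumes the `llvm-readelf --notes` dump contains lines of the form `.private_segment_fixed_size: <integer>`, and the helper name `private_segment_sizes` is hypothetical, not part of any LLVM tool.
```python
import re

# Assumed line shape in the notes dump: ".private_segment_fixed_size: 0"
_SIZE_RE = re.compile(r"\.private_segment_fixed_size:\s*(\d+)")


def private_segment_sizes(notes_text):
    """Return every .private_segment_fixed_size value found in the text, in order."""
    return [int(match.group(1)) for match in _SIZE_RE.finditer(notes_text)]
```
Passing the captured output of the command above to `private_segment_sizes` and seeing a `0` for the version `500` build but a non-zero value for the version `400` build would match the behavior described here.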
</pre>