<html><head><meta http-equiv="Content-Type" content="text/html; charset=utf-8"></head><body style="word-wrap: break-word; -webkit-nbsp-mode: space; line-break: after-white-space;" class=""><br class=""><div><br class=""><blockquote type="cite" class=""><div class="">On Apr 30, 2020, at 15:09, Frank Winter via llvm-dev &lt;<a href="mailto:llvm-dev@lists.llvm.org" class="">llvm-dev@lists.llvm.org</a>&gt; wrote:</div><br class="Apple-interchange-newline"><div class="">From LLVM IR, how can you get the 'workgroup size' value?<br class="">It seems to be set by the AMDGPU backend as metadata since in AMDGPUMetadata.h there are things defined like<br class=""><br class="">constexpr char ReqdWorkGroupSize[] = "ReqdWorkGroupSize";<br class=""><br class="">and<br class=""><br class="">struct Metadata final {<br class=""> /// 'reqd_work_group_size' attribute. Optional.<br class=""> std::vector&lt;uint32_t&gt; mReqdWorkGroupSize = std::vector&lt;uint32_t&gt;();<br class=""> ...<br class="">}<br class=""><br class="">Is this metadata set to the kernel function or to the module?<br class=""><br class="">What IR instructions would give access to the value of, say, the workgroup size in dimension x?<br class=""><br class=""><br class="">Frank<br class=""><br class=""></div></blockquote></div><div><br class=""></div><div>The code object metadata is only for statically known workgroup size information. The metadata you found here corresponds to !reqd_work_group_size, which in turn corresponds to the OpenCL attribute of the same name. We have a variety of other useful static attributes related to workgroup sizes, as documented here: <a href="https://llvm.org/docs/AMDGPUUsage.html#llvm-ir-attributes" class="">https://llvm.org/docs/AMDGPUUsage.html#llvm-ir-attributes</a>. 
The "uniform-work-group-size" attribute (corresponding to the OpenCL flag -cl-uniform-work-group-size) may also be of interest.</div><div><br class=""></div><div>Dynamically, there isn’t a single instruction to get the group size; how it is implemented depends on the runtime/driver. You need to get a pointer to the value from somewhere and load from it. For HSA/ROCm, these values are loaded from an ABI struct pointed to by a special kernel input SGPR. Recently the core implementation was moved into a clang builtin so we can annotate the load with !range metadata: <a href="https://github.com/llvm/llvm-project/blob/a1bd5cd539f9e2fd34e522b848e751342985e882/clang/lib/CodeGen/CGBuiltin.cpp#L13985" class="">https://github.com/llvm/llvm-project/blob/a1bd5cd539f9e2fd34e522b848e751342985e882/clang/lib/CodeGen/CGBuiltin.cpp#L13985</a>. You can see how these are used here: <a href="https://github.com/RadeonOpenCompute/ROCm-Device-Libs/blob/amd-stg-open/ockl/src/workitem.cl" class="">https://github.com/RadeonOpenCompute/ROCm-Device-Libs/blob/amd-stg-open/ockl/src/workitem.cl</a></div><div><br class=""></div><div>-Matt</div></body></html>