[libc-commits] [libc] [libc] Update the AMDGPU implementation to use code object 5 (PR #72580)
Joseph Huber via libc-commits
libc-commits at lists.llvm.org
Thu Nov 16 18:02:55 PST 2023
================
@@ -222,17 +250,15 @@ hsa_status_t launch_kernel(hsa_agent_t dev_agent, hsa_executable_t executable,
// Set up the packet for execution on the device. We currently only launch
// with one thread on the device, forcing the rest of the wavefront to be
// masked off.
- uint16_t setup = (1 + (params.num_blocks_y * params.num_threads_y != 1) +
- (params.num_blocks_z * params.num_threads_z != 1))
- << HSA_KERNEL_DISPATCH_PACKET_SETUP_DIMENSIONS;
+ uint16_t setup = (dims) << HSA_KERNEL_DISPATCH_PACKET_SETUP_DIMENSIONS;
packet->workgroup_size_x = params.num_threads_x;
packet->workgroup_size_y = params.num_threads_y;
packet->workgroup_size_z = params.num_threads_z;
packet->reserved0 = 0;
packet->grid_size_x = params.num_blocks_x * params.num_threads_x;
packet->grid_size_y = params.num_blocks_y * params.num_threads_y;
packet->grid_size_z = params.num_blocks_z * params.num_threads_z;
- packet->private_segment_size = private_size;
+ packet->private_segment_size = !dynamic_stack ? private_size : 16834;
----------------
jhuber6 wrote:
If `dynamic_stack` is set, then I think the private segment size is just zero and the runtime is expected to set it, as is done here. I'm pretty sure the expectation is that `dynamic_stack` only applies if the `private_segment` size couldn't be determined.
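
For reference, a minimal standalone sketch of the two computations touched by this hunk, written without the HSA headers. The `LaunchParameters` struct, `SETUP_DIMENSIONS_SHIFT`, `num_dimensions`, `make_setup`, and `select_private_size` are hypothetical names for illustration only; the real shift comes from `HSA_KERNEL_DISPATCH_PACKET_SETUP_DIMENSIONS` in `hsa.h`, and the fallback size is left as a parameter since the right value is what is being discussed.

```cpp
#include <cstdint>

// Hypothetical stand-in for the launch parameters used in the diff.
struct LaunchParameters {
  uint32_t num_threads_x, num_threads_y, num_threads_z;
  uint32_t num_blocks_x, num_blocks_y, num_blocks_z;
};

// Assumed bit offset of the dimensions field in the packet setup word. The
// real code uses HSA_KERNEL_DISPATCH_PACKET_SETUP_DIMENSIONS from hsa.h.
constexpr uint16_t SETUP_DIMENSIONS_SHIFT = 0;

// Mirrors the expression removed by the patch: start at one dimension and add
// one for each of the y and z grid extents that is not exactly 1.
constexpr uint16_t num_dimensions(const LaunchParameters &p) {
  return static_cast<uint16_t>(1 + (p.num_blocks_y * p.num_threads_y != 1) +
                               (p.num_blocks_z * p.num_threads_z != 1));
}

// Places the dimension count into the setup word, as the patched line does.
constexpr uint16_t make_setup(const LaunchParameters &p) {
  return static_cast<uint16_t>(num_dimensions(p) << SETUP_DIMENSIONS_SHIFT);
}

// Mirrors the patched line: use the size from the kernel metadata unless the
// kernel uses a dynamic stack, in which case the caller-provided fallback is
// used. The fallback value is an assumption left to the caller.
constexpr uint32_t select_private_size(uint32_t private_size,
                                       bool dynamic_stack,
                                       uint32_t fallback_size) {
  return !dynamic_stack ? private_size : fallback_size;
}
```

With this shape, a kernel whose metadata reports a zero private segment size and a dynamic stack would get whatever fallback the loader chooses, while every other kernel keeps the size from its metadata.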
https://github.com/llvm/llvm-project/pull/72580