[libc-commits] [libc] [libc] Update the AMDGPU implementation to use code object 5 (PR #72580)

Tue Nov 21 05:11:05 PST 2023

================
@@ -222,17 +250,16 @@ hsa_status_t launch_kernel(hsa_agent_t dev_agent, hsa_executable_t executable,
   // Set up the packet for execution on the device. We currently only launch
   // with one thread on the device, forcing the rest of the wavefront to be
   // masked off.
-  uint16_t setup = (1 + (params.num_blocks_y * params.num_threads_y != 1) +
-                    (params.num_blocks_z * params.num_threads_z != 1))
-                   << HSA_KERNEL_DISPATCH_PACKET_SETUP_DIMENSIONS;
+  uint16_t setup = (dims) << HSA_KERNEL_DISPATCH_PACKET_SETUP_DIMENSIONS;
   packet->workgroup_size_x = params.num_threads_x;
   packet->workgroup_size_y = params.num_threads_y;
   packet->workgroup_size_z = params.num_threads_z;
   packet->reserved0 = 0;
   packet->grid_size_x = params.num_blocks_x * params.num_threads_x;
   packet->grid_size_y = params.num_blocks_y * params.num_threads_y;
   packet->grid_size_z = params.num_blocks_z * params.num_threads_z;
-  packet->private_segment_size = private_size;
+  packet->private_segment_size =
+      dynamic_stack ? 16 * 1024 /* 16 KB */ : private_size;
----------------
JonChesterfield wrote:

maybe write `enum {stack_size_default = 16*1024}` or similar instead of the comment
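
A minimal sketch of what that could look like, assuming the constant is only needed for this one assignment. The name `stack_size_default` is taken from the suggestion above; the helper `private_segment_size` and the fixed `uint32_t` underlying type are hypothetical and are not part of the patch:

```cpp
#include <cstdint>

// Named constant replacing the inline `16 * 1024 /* 16 KB */` literal.
// The explicit underlying type is an assumption for this sketch.
enum : uint32_t { stack_size_default = 16 * 1024 };

// Hypothetical helper mirroring the expression in the diff: fall back to the
// default stack size when the kernel uses a dynamic stack, otherwise use the
// private segment size reported for the kernel.
constexpr uint32_t private_segment_size(bool dynamic_stack,
                                        uint32_t private_size) {
  return dynamic_stack ? stack_size_default : private_size;
}
```

With something like this, the assignment in the diff could read `packet->private_segment_size = dynamic_stack ? stack_size_default : private_size;`, and the name documents the value without a trailing comment.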

https://github.com/llvm/llvm-project/pull/72580