[libc-commits] [libc] [libc] Update the AMDGPU implementation to use code object 5 (PR #72580)

Matt Arsenault via libc-commits libc-commits at lists.llvm.org
Thu Nov 16 17:13:08 PST 2023


================
@@ -222,17 +250,15 @@ hsa_status_t launch_kernel(hsa_agent_t dev_agent, hsa_executable_t executable,
   // Set up the packet for execution on the device. We currently only launch
   // with one thread on the device, forcing the rest of the wavefront to be
   // masked off.
-  uint16_t setup = (1 + (params.num_blocks_y * params.num_threads_y != 1) +
-                    (params.num_blocks_z * params.num_threads_z != 1))
-                   << HSA_KERNEL_DISPATCH_PACKET_SETUP_DIMENSIONS;
+  uint16_t setup = (dims) << HSA_KERNEL_DISPATCH_PACKET_SETUP_DIMENSIONS;
   packet->workgroup_size_x = params.num_threads_x;
   packet->workgroup_size_y = params.num_threads_y;
   packet->workgroup_size_z = params.num_threads_z;
   packet->reserved0 = 0;
   packet->grid_size_x = params.num_blocks_x * params.num_threads_x;
   packet->grid_size_y = params.num_blocks_y * params.num_threads_y;
   packet->grid_size_z = params.num_blocks_z * params.num_threads_z;
-  packet->private_segment_size = private_size;
+  packet->private_segment_size = !dynamic_stack ? private_size : 16834;
----------------
arsenm wrote:

This is backwards: the "assume dynamic access is 16k" thing was a backend hack that doesn't need to be replicated in the runtime. It's also unnecessary in v5 anyway, so just use the raw size?

Also, we really need some unit tests that stress dynamic alloca somewhere.
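
As a rough sketch of what "just use the raw size" could look like: the types and helper names below (`LaunchParams`, `Packet`, `count_dims`, `fill_packet`, `kSetupDimensionsShift`) are hypothetical stand-ins so the snippet compiles on its own, not the loader's actual code. The real loader would use `hsa_kernel_dispatch_packet_t` and `HSA_KERNEL_DISPATCH_PACKET_SETUP_DIMENSIONS` from the HSA headers.

```cpp
#include <cstdint>

// Hypothetical stand-in for the loader's launch parameters.
struct LaunchParams {
  uint32_t num_threads_x, num_threads_y, num_threads_z;
  uint32_t num_blocks_x, num_blocks_y, num_blocks_z;
};

// Hypothetical stand-in holding only the two packet fields discussed here.
struct Packet {
  uint16_t setup;
  uint32_t private_segment_size;
};

// Assumed stand-in for HSA_KERNEL_DISPATCH_PACKET_SETUP_DIMENSIONS, the bit
// offset of the dimensions field inside the packet's setup word.
constexpr unsigned kSetupDimensionsShift = 0;

// Number of grid dimensions in use, mirroring the expression the patch
// replaced with `dims`: start at 1 and add one for each of y and z whose
// total extent is not 1.
inline uint16_t count_dims(const LaunchParams &p) {
  return static_cast<uint16_t>(1 + (p.num_blocks_y * p.num_threads_y != 1) +
                               (p.num_blocks_z * p.num_threads_z != 1));
}

inline void fill_packet(Packet &packet, const LaunchParams &p,
                        uint32_t private_size) {
  packet.setup =
      static_cast<uint16_t>(count_dims(p) << kSetupDimensionsShift);
  // Pass the size read from the kernel metadata through unchanged; no 16k
  // substitution when the kernel uses a dynamic stack.
  packet.private_segment_size = private_size;
}
```

With this shape, the `dynamic_stack` flag no longer influences the value written to `private_segment_size`.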

https://github.com/llvm/llvm-project/pull/72580

