[libc-commits] [libc] 06ac828 - [libc] Fix flipped AMDGPU kernel launch arguments (#83648)

via libc-commits libc-commits at lists.llvm.org
Fri Mar 1 18:56:11 PST 2024


Author: Joseph Huber
Date: 2024-03-01T20:56:07-06:00
New Revision: 06ac828dc1076413b3c2649e9c1d3de33467d308

URL: https://github.com/llvm/llvm-project/commit/06ac828dc1076413b3c2649e9c1d3de33467d308
DIFF: https://github.com/llvm/llvm-project/commit/06ac828dc1076413b3c2649e9c1d3de33467d308.diff

LOG: [libc] Fix flipped AMDGPU kernel launch arguments (#83648)

Summary:
These values were incorrectly flipped: the grid size was set from the
number of threads and the workgroup size from the number of blocks. When
I originally wrote the thread utilities, they targeted COV4, which used
an implicit format. When I updated them I accidentally flipped the
values, and never noticed because nothing depended on the size of the
threads until I checked it manually.

Added: 
    

Modified: 
    libc/utils/gpu/loader/amdgpu/Loader.cpp

Removed: 
    


################################################################################
diff  --git a/libc/utils/gpu/loader/amdgpu/Loader.cpp b/libc/utils/gpu/loader/amdgpu/Loader.cpp
index 0ff2dce813ed2b..e3911eda2bd82a 100644
--- a/libc/utils/gpu/loader/amdgpu/Loader.cpp
+++ b/libc/utils/gpu/loader/amdgpu/Loader.cpp
@@ -230,12 +230,12 @@ hsa_status_t launch_kernel(hsa_agent_t dev_agent, hsa_executable_t executable,
   implicit_args_t *implicit_args = reinterpret_cast<implicit_args_t *>(
       reinterpret_cast<uint8_t *>(args) + sizeof(args_t));
   implicit_args->grid_dims = dims;
-  implicit_args->grid_size_x = params.num_threads_x;
-  implicit_args->grid_size_y = params.num_threads_y;
-  implicit_args->grid_size_z = params.num_threads_z;
-  implicit_args->workgroup_size_x = params.num_blocks_x;
-  implicit_args->workgroup_size_y = params.num_blocks_y;
-  implicit_args->workgroup_size_z = params.num_blocks_z;
+  implicit_args->grid_size_x = params.num_blocks_x;
+  implicit_args->grid_size_y = params.num_blocks_y;
+  implicit_args->grid_size_z = params.num_blocks_z;
+  implicit_args->workgroup_size_x = params.num_threads_x;
+  implicit_args->workgroup_size_y = params.num_threads_y;
+  implicit_args->workgroup_size_z = params.num_threads_z;
 
   // Obtain a packet from the queue.
   uint64_t packet_id = hsa_queue_add_write_index_relaxed(queue, 1);
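
For illustration, the corrected mapping in the hunk above can be sketched as a standalone function. This is a minimal sketch and not the loader's actual code: `LaunchParams`, `ImplicitArgs`, and `make_implicit_args` are hypothetical stand-ins, and the field widths are assumptions. Only the direction of each assignment mirrors the patch.

```cpp
#include <cstdint>

// Hypothetical stand-in for the loader's launch parameters.
// Field widths are assumed for illustration.
struct LaunchParams {
  uint32_t num_threads_x, num_threads_y, num_threads_z;
  uint32_t num_blocks_x, num_blocks_y, num_blocks_z;
};

// Hypothetical stand-in for the implicit kernel arguments touched by the
// patch. The real structure has more fields and a fixed layout.
struct ImplicitArgs {
  uint32_t grid_size_x, grid_size_y, grid_size_z;
  uint16_t workgroup_size_x, workgroup_size_y, workgroup_size_z;
};

// Mirrors the fixed assignments: grid sizes come from the block counts,
// workgroup sizes come from the thread counts.
ImplicitArgs make_implicit_args(const LaunchParams &params) {
  ImplicitArgs args{};
  args.grid_size_x = params.num_blocks_x;
  args.grid_size_y = params.num_blocks_y;
  args.grid_size_z = params.num_blocks_z;
  // Narrowing to 16 bits is part of the assumed layout above.
  args.workgroup_size_x = static_cast<uint16_t>(params.num_threads_x);
  args.workgroup_size_y = static_cast<uint16_t>(params.num_threads_y);
  args.workgroup_size_z = static_cast<uint16_t>(params.num_threads_z);
  return args;
}
```

With distinct thread and block counts, a flipped mapping shows up immediately in the resulting fields, which is the kind of manual check the summary describes.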


        

