[libc-commits] [libc] [libc] Fix flipped AMDGPU kernel launch arguments (PR #83648)

Joseph Huber via libc-commits libc-commits at lists.llvm.org
Fri Mar 1 18:53:06 PST 2024


https://github.com/jhuber6 created https://github.com/llvm/llvm-project/pull/83648

Summary:
These values were swapped: the grid size fields were set from the thread
counts and the workgroup size fields from the block counts. When I
originally wrote the thread utilities, the loader was using COV4, which
used an implicit format. When I updated it, I accidentally flipped the
two and never noticed, because nothing depended on the size of the
threads until I checked it manually.
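
As a rough illustration of the intended mapping, here is a minimal
standalone sketch. It is not the loader's code: `LaunchParams`,
`ImplicitArgs`, and `fill_implicit_args` are hypothetical stand-ins, and
only the field names are taken from the patch (the field types and
widths are assumed).

```cpp
#include <cstdint>

// Hypothetical stand-in for the loader's launch parameters. Only the field
// names come from the patch; the types are assumptions.
struct LaunchParams {
  uint32_t num_threads_x, num_threads_y, num_threads_z;
  uint32_t num_blocks_x, num_blocks_y, num_blocks_z;
};

// Hypothetical stand-in for the implicit argument block. The real layout
// is defined by the loader and is not reproduced here.
struct ImplicitArgs {
  uint32_t grid_size_x, grid_size_y, grid_size_z;
  uint32_t workgroup_size_x, workgroup_size_y, workgroup_size_z;
};

// Mirrors the corrected assignments: block counts go to the grid size
// fields, thread counts go to the workgroup size fields.
ImplicitArgs fill_implicit_args(const LaunchParams &params) {
  ImplicitArgs args{};
  args.grid_size_x = params.num_blocks_x;
  args.grid_size_y = params.num_blocks_y;
  args.grid_size_z = params.num_blocks_z;
  args.workgroup_size_x = params.num_threads_x;
  args.workgroup_size_y = params.num_threads_y;
  args.workgroup_size_z = params.num_threads_z;
  return args;
}
```

With 64x2x1 threads and 8x4x1 blocks, the sketch should report a grid
size of 8x4x1 and a workgroup size of 64x2x1. Before the fix the two
triples would have been exchanged, which goes unnoticed whenever nothing
reads the thread size.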


From 0dcccede4fc5179ede5c64c546ffba4f99901377 Mon Sep 17 00:00:00 2001
From: Joseph Huber <huberjn at outlook.com>
Date: Fri, 1 Mar 2024 20:50:24 -0600
Subject: [PATCH] [libc] Fix flipped AMDGPU kernel launch arguments

Summary:
These values were incorrectly flipped, setting the size of the blocks to
the threads and vice-versa. When I originally wrote the thread utilities
it was using COV4 which used an implicit format. Then when I updated I
accidentally flipped them and never noticed because nothing depended on
the size of the threads until I checked it manually.
---
 libc/utils/gpu/loader/amdgpu/Loader.cpp | 12 ++++++------
 1 file changed, 6 insertions(+), 6 deletions(-)

diff --git a/libc/utils/gpu/loader/amdgpu/Loader.cpp b/libc/utils/gpu/loader/amdgpu/Loader.cpp
index 0ff2dce813ed2b..e3911eda2bd82a 100644
--- a/libc/utils/gpu/loader/amdgpu/Loader.cpp
+++ b/libc/utils/gpu/loader/amdgpu/Loader.cpp
@@ -230,12 +230,12 @@ hsa_status_t launch_kernel(hsa_agent_t dev_agent, hsa_executable_t executable,
   implicit_args_t *implicit_args = reinterpret_cast<implicit_args_t *>(
       reinterpret_cast<uint8_t *>(args) + sizeof(args_t));
   implicit_args->grid_dims = dims;
-  implicit_args->grid_size_x = params.num_threads_x;
-  implicit_args->grid_size_y = params.num_threads_y;
-  implicit_args->grid_size_z = params.num_threads_z;
-  implicit_args->workgroup_size_x = params.num_blocks_x;
-  implicit_args->workgroup_size_y = params.num_blocks_y;
-  implicit_args->workgroup_size_z = params.num_blocks_z;
+  implicit_args->grid_size_x = params.num_blocks_x;
+  implicit_args->grid_size_y = params.num_blocks_y;
+  implicit_args->grid_size_z = params.num_blocks_z;
+  implicit_args->workgroup_size_x = params.num_threads_x;
+  implicit_args->workgroup_size_y = params.num_threads_y;
+  implicit_args->workgroup_size_z = params.num_threads_z;
 
   // Obtain a packet from the queue.
   uint64_t packet_id = hsa_queue_add_write_index_relaxed(queue, 1);


