[libc-commits] [libc] [libc] Fix flipped AMDGPU kernel launch arguments (PR #83648)
Joseph Huber via libc-commits
libc-commits at lists.llvm.org
Fri Mar 1 18:53:06 PST 2024
https://github.com/jhuber6 created https://github.com/llvm/llvm-project/pull/83648
Summary:
These values were flipped: the grid size was set from the thread counts
and the workgroup size from the block counts. When I originally wrote the
thread utilities, they targeted COV4, which used an implicit format. When
I later updated them, I accidentally flipped the values and never noticed,
because nothing depended on the thread size until I checked it manually.
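For illustration, here is a minimal standalone sketch of the intended mapping. `LaunchParams`, `ImplicitArgs`, and `make_implicit_args` are hypothetical stand-ins, not the loader's real types. Only the field names follow the patch; the field widths and layout are assumptions.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical stand-in for the loader's launch parameters.
struct LaunchParams {
  uint32_t num_threads_x, num_threads_y, num_threads_z;
  uint32_t num_blocks_x, num_blocks_y, num_blocks_z;
};

// Hypothetical, trimmed-down stand-in for the implicit arguments. Only the
// fields touched by the patch are modeled; widths and layout are assumed.
struct ImplicitArgs {
  uint32_t grid_size_x, grid_size_y, grid_size_z;
  uint32_t workgroup_size_x, workgroup_size_y, workgroup_size_z;
};

// Fill the implicit arguments the way the fixed code does: the grid size
// comes from the block counts, the workgroup size from the thread counts.
inline ImplicitArgs make_implicit_args(const LaunchParams &params) {
  ImplicitArgs args{};
  args.grid_size_x = params.num_blocks_x;
  args.grid_size_y = params.num_blocks_y;
  args.grid_size_z = params.num_blocks_z;
  args.workgroup_size_x = params.num_threads_x;
  args.workgroup_size_y = params.num_threads_y;
  args.workgroup_size_z = params.num_threads_z;
  return args;
}

// Small self-check: 4 blocks of 64 threads in the x dimension.
inline void self_check() {
  LaunchParams params{};
  params.num_threads_x = 64;
  params.num_threads_y = 1;
  params.num_threads_z = 1;
  params.num_blocks_x = 4;
  params.num_blocks_y = 1;
  params.num_blocks_z = 1;

  ImplicitArgs args = make_implicit_args(params);
  assert(args.grid_size_x == 4);
  assert(args.workgroup_size_x == 64);
}
```

With the flipped assignments, the same input would have reported a grid size of 64 and a workgroup size of 4. Nothing catches that until something actually reads the workgroup size.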
From 0dcccede4fc5179ede5c64c546ffba4f99901377 Mon Sep 17 00:00:00 2001
From: Joseph Huber <huberjn at outlook.com>
Date: Fri, 1 Mar 2024 20:50:24 -0600
Subject: [PATCH] [libc] Fix flipped AMDGPU kernel launch arguments
Summary:
These values were incorrectly flipped, setting the size of the blocks to
the threads and vice-versa. When I originally wrote the thread utilities
it was using COV4 which used an implicit format. Then when I updated I
accidentally flipped them and never noticed because nothing depended on
the size of the threads until I checked it manually.
---
libc/utils/gpu/loader/amdgpu/Loader.cpp | 12 ++++++------
1 file changed, 6 insertions(+), 6 deletions(-)
diff --git a/libc/utils/gpu/loader/amdgpu/Loader.cpp b/libc/utils/gpu/loader/amdgpu/Loader.cpp
index 0ff2dce813ed2b..e3911eda2bd82a 100644
--- a/libc/utils/gpu/loader/amdgpu/Loader.cpp
+++ b/libc/utils/gpu/loader/amdgpu/Loader.cpp
@@ -230,12 +230,12 @@ hsa_status_t launch_kernel(hsa_agent_t dev_agent, hsa_executable_t executable,
implicit_args_t *implicit_args = reinterpret_cast<implicit_args_t *>(
reinterpret_cast<uint8_t *>(args) + sizeof(args_t));
implicit_args->grid_dims = dims;
- implicit_args->grid_size_x = params.num_threads_x;
- implicit_args->grid_size_y = params.num_threads_y;
- implicit_args->grid_size_z = params.num_threads_z;
- implicit_args->workgroup_size_x = params.num_blocks_x;
- implicit_args->workgroup_size_y = params.num_blocks_y;
- implicit_args->workgroup_size_z = params.num_blocks_z;
+ implicit_args->grid_size_x = params.num_blocks_x;
+ implicit_args->grid_size_y = params.num_blocks_y;
+ implicit_args->grid_size_z = params.num_blocks_z;
+ implicit_args->workgroup_size_x = params.num_threads_x;
+ implicit_args->workgroup_size_y = params.num_threads_y;
+ implicit_args->workgroup_size_z = params.num_threads_z;
// Obtain a packet from the queue.
uint64_t packet_id = hsa_queue_add_write_index_relaxed(queue, 1);