[libc-commits] [libc] 06ac828 - [libc] Fix flipped AMDGPU kernel launch arguments (#83648)
via libc-commits
libc-commits at lists.llvm.org
Fri Mar 1 18:56:11 PST 2024
Author: Joseph Huber
Date: 2024-03-01T20:56:07-06:00
New Revision: 06ac828dc1076413b3c2649e9c1d3de33467d308
URL: https://github.com/llvm/llvm-project/commit/06ac828dc1076413b3c2649e9c1d3de33467d308
DIFF: https://github.com/llvm/llvm-project/commit/06ac828dc1076413b3c2649e9c1d3de33467d308.diff
LOG: [libc] Fix flipped AMDGPU kernel launch arguments (#83648)
Summary:
These values were incorrectly flipped: the grid size was set from the
thread counts and the workgroup size from the block counts. When I
originally wrote the thread utilities, it was using COV4, which used an
implicit format. When I later updated it, I accidentally flipped them and
never noticed, because nothing depended on the size of the threads until
I checked it manually.
Added:
Modified:
libc/utils/gpu/loader/amdgpu/Loader.cpp
Removed:
################################################################################
diff --git a/libc/utils/gpu/loader/amdgpu/Loader.cpp b/libc/utils/gpu/loader/amdgpu/Loader.cpp
index 0ff2dce813ed2b..e3911eda2bd82a 100644
--- a/libc/utils/gpu/loader/amdgpu/Loader.cpp
+++ b/libc/utils/gpu/loader/amdgpu/Loader.cpp
@@ -230,12 +230,12 @@ hsa_status_t launch_kernel(hsa_agent_t dev_agent, hsa_executable_t executable,
   implicit_args_t *implicit_args = reinterpret_cast<implicit_args_t *>(
       reinterpret_cast<uint8_t *>(args) + sizeof(args_t));
   implicit_args->grid_dims = dims;
-  implicit_args->grid_size_x = params.num_threads_x;
-  implicit_args->grid_size_y = params.num_threads_y;
-  implicit_args->grid_size_z = params.num_threads_z;
-  implicit_args->workgroup_size_x = params.num_blocks_x;
-  implicit_args->workgroup_size_y = params.num_blocks_y;
-  implicit_args->workgroup_size_z = params.num_blocks_z;
+  implicit_args->grid_size_x = params.num_blocks_x;
+  implicit_args->grid_size_y = params.num_blocks_y;
+  implicit_args->grid_size_z = params.num_blocks_z;
+  implicit_args->workgroup_size_x = params.num_threads_x;
+  implicit_args->workgroup_size_y = params.num_threads_y;
+  implicit_args->workgroup_size_z = params.num_threads_z;
 
   // Obtain a packet from the queue.
   uint64_t packet_id = hsa_queue_add_write_index_relaxed(queue, 1);
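
For readers who want to see the corrected mapping in isolation, here is a minimal sketch. It is not the loader's code: `LaunchParams`, `ImplicitArgs`, and `make_implicit_args` are hypothetical, simplified stand-ins that reuse only the field names visible in the diff, and the real `implicit_args_t` likely has additional fields and different field widths.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical stand-in for the loader's launch parameters.
struct LaunchParams {
  uint32_t num_threads_x, num_threads_y, num_threads_z;
  uint32_t num_blocks_x, num_blocks_y, num_blocks_z;
};

// Hypothetical, simplified stand-in for implicit_args_t. Only the six
// fields touched by the patch are modeled here.
struct ImplicitArgs {
  uint32_t grid_size_x, grid_size_y, grid_size_z;
  uint32_t workgroup_size_x, workgroup_size_y, workgroup_size_z;
};

// Fill the implicit arguments the way the patched code does: grid size
// comes from the block counts, workgroup size from the thread counts.
inline ImplicitArgs make_implicit_args(const LaunchParams &params) {
  // Sanity check (an assumption of this sketch): every dimension is at least 1.
  assert(params.num_threads_x > 0 && params.num_threads_y > 0 &&
         params.num_threads_z > 0);
  assert(params.num_blocks_x > 0 && params.num_blocks_y > 0 &&
         params.num_blocks_z > 0);

  ImplicitArgs args{};
  args.grid_size_x = params.num_blocks_x;
  args.grid_size_y = params.num_blocks_y;
  args.grid_size_z = params.num_blocks_z;
  args.workgroup_size_x = params.num_threads_x;
  args.workgroup_size_y = params.num_threads_y;
  args.workgroup_size_z = params.num_threads_z;
  return args;
}
```

With 64 threads per block and 8 blocks in x, this yields `workgroup_size_x == 64` and `grid_size_x == 8`; the pre-patch assignments would have produced the two values swapped.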