[Openmp-commits] [openmp] 221ada6 - [libomptarget] Implement locks for amdgcn

Thu Mar 5 12:25:55 PST 2020

Author: Jon Chesterfield
Date: 2020-03-05T20:25:31Z
New Revision: 221ada654b28a524d01dc70ec16d38e0f2484f78

URL: https://github.com/llvm/llvm-project/commit/221ada654b28a524d01dc70ec16d38e0f2484f78
DIFF: https://github.com/llvm/llvm-project/commit/221ada654b28a524d01dc70ec16d38e0f2484f78.diff

LOG: [libomptarget] Implement locks for amdgcn

Summary:
[libomptarget] Implement locks for amdgcn

The nvptx implementation deadlocks on amdgcn. atomic_cas with multiple
active lanes can deadlock - if one lane succeeds, all the others are locked
out. The set_lock implementation therefore runs on a single lane.
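
To illustrate the single-lane point, here is a minimal host-side sketch. It
is not taken from this commit's diff; the names (sketch_leader_lane and so on)
are hypothetical, and lanes are simulated sequentially under the assumption of
a 64-lane wavefront described by an active-lane bitmask. It shows that when
every active lane attempts the CAS, all but one are left spinning, whereas
electing the lowest active lane yields one attempt per wavefront.

```cpp
#include <cstdint>

// Hypothetical helper: index of the lowest active lane, or -1 if none.
int sketch_leader_lane(uint64_t active_mask) {
  for (int lane = 0; lane < 64; ++lane)
    if (active_mask & (uint64_t{1} << lane))
      return lane;
  return -1;
}

// Simulates one lockstep round in which every active lane attempts
// CAS(lock, 0 -> 1). Returns how many lanes fail and would keep spinning;
// on real hardware those lanes can prevent the winner from making progress.
int sketch_lanes_left_spinning(uint64_t active_mask, uint32_t *lock) {
  int spinning = 0;
  for (int lane = 0; lane < 64; ++lane) {
    if (!(active_mask & (uint64_t{1} << lane)))
      continue;
    if (*lock == 0)
      *lock = 1; // this lane's CAS succeeds
    else
      ++spinning; // this lane's CAS fails
  }
  return spinning;
}

// Only the elected leader attempts the CAS, so the wavefront as a whole
// either takes the lock or does not. Returns true if the lock was taken.
bool sketch_wavefront_try_lock(uint64_t active_mask, uint32_t *lock) {
  if (sketch_leader_lane(active_mask) < 0)
    return false; // no active lanes
  if (*lock != 0)
    return false; // already held elsewhere
  *lock = 1;
  return true;
}
```

With active_mask = 0xF0 the leader is lane 4; the all-lanes variant leaves
three lanes spinning, while the leader-only variant makes a single attempt.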

Also uses a sleep intrinsic instead of polling the system clock while
waiting, which is probably a minor performance improvement. The unset/test
implementations may be revised later, based on code size, performance or
similar concerns.
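
For readers unfamiliar with the pattern, a CAS-based lock with a backoff in
the wait loop can be sketched on the host as below. This is an analogy under
stated assumptions, not the device code: std::atomic stands in for the
device atomic_cas, std::this_thread::yield stands in for the sleep intrinsic,
and every sketch_* name is hypothetical.

```cpp
#include <atomic>
#include <cassert>
#include <cstdint>
#include <thread>
#include <vector>

// Hypothetical host-side lock type; 0 means unset, 1 means set.
using sketch_lock_t = std::atomic<uint32_t>;
constexpr uint32_t kUnset = 0;
constexpr uint32_t kSet = 1;

void sketch_init_lock(sketch_lock_t *l) {
  l->store(kUnset, std::memory_order_release);
}

// Single CAS attempt: returns 1 if the lock was acquired, 0 otherwise.
int sketch_test_lock(sketch_lock_t *l) {
  uint32_t expected = kUnset;
  return l->compare_exchange_strong(expected, kSet,
                                    std::memory_order_acquire,
                                    std::memory_order_relaxed)
             ? 1
             : 0;
}

// Retry the CAS until it succeeds, backing off between attempts.
void sketch_set_lock(sketch_lock_t *l) {
  while (!sketch_test_lock(l))
    std::this_thread::yield(); // assumption: stands in for a device sleep
}

void sketch_unset_lock(sketch_lock_t *l) {
  l->store(kUnset, std::memory_order_release);
}

// Demo: several host threads increment a plain counter under the lock.
long sketch_count_under_lock(int workers, int iters) {
  assert(workers > 0 && iters >= 0);
  sketch_lock_t lock{kUnset};
  long counter = 0;
  std::vector<std::thread> pool;
  for (int w = 0; w < workers; ++w)
    pool.emplace_back([&] {
      for (int i = 0; i < iters; ++i) {
        sketch_set_lock(&lock);
        ++counter; // protected by the lock
        sketch_unset_lock(&lock);
      }
    });
  for (auto &t : pool)
    t.join();
  return counter;
}
```

Calling sketch_count_under_lock(4, 1000) should yield 4000 if the lock
provides mutual exclusion.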

This implements the lock at per-wavefront scope. That is not strictly as
specified, since OpenMP describes locks in terms of threads. I think the
nvptx implementation provides true per-thread locking on Volta and the same
per-warp locking on other architectures.

Reviewers: jdoerfert, ABataev, grokos

Reviewed By: jdoerfert

Subscribers: jvesely, mgorny, jfb, openmp-commits

Tags: #openmp

Differential Revision: https://reviews.llvm.org/D75546

Added: 
    openmp/libomptarget/deviceRTLs/amdgcn/src/amdgcn_locks.hip

Modified: 
    openmp/libomptarget/deviceRTLs/amdgcn/CMakeLists.txt

Removed: 
    


################################################################################
diff --git a/openmp/libomptarget/deviceRTLs/amdgcn/CMakeLists.txt b/openmp/libomptarget/deviceRTLs/amdgcn/CMakeLists.txt
index ec08bd912663..1a24bfd6f887 100644
--- a/openmp/libomptarget/deviceRTLs/amdgcn/CMakeLists.txt
+++ b/openmp/libomptarget/deviceRTLs/amdgcn/CMakeLists.txt
@@ -56,6 +56,7 @@ get_filename_component(devicertl_base_directory
 
 set(cuda_sources
   ${CMAKE_CURRENT_SOURCE_DIR}/src/amdgcn_smid.hip
+  ${CMAKE_CURRENT_SOURCE_DIR}/src/amdgcn_locks.hip
   ${CMAKE_CURRENT_SOURCE_DIR}/src/target_impl.hip
   ${devicertl_base_directory}/common/src/cancel.cu
   ${devicertl_base_directory}/common/src/critical.cu

diff --git a/openmp/libomptarget/deviceRTLs/amdgcn/src/amdgcn_locks.hip b/openmp/libomptarget/deviceRTLs/amdgcn/src/amdgcn_locks.hip
new file mode 100644
index 000000000000..4163a14f50bf
--- /dev/null
+++ b/openmp/libomptarget/deviceRTLs/amdgcn/src/amdgcn_locks.hip
@@ -0,0 +1,28 @@
+//===-- amdgcn_locks.hip - AMDGCN OpenMP GPU lock implementation -- HIP -*-===//
+//
+// Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions.
+// See https://llvm.org/LICENSE.txt for license information.
+// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception
+//
+//===----------------------------------------------------------------------===//
+//
+// A 'thread' maps onto a lane of the wavefront. This means a per-thread lock
+// cannot be implemented - if one thread gets the lock, it can't continue on to
+// the next instruction in order to do anything as the other threads are waiting
+// to take the lock.
+// These functions will be implemented to provide the documented semantics for
+// a SIMD => wavefront mapping once that is implemented.
+//
+//===----------------------------------------------------------------------===//
+
+#include "common/debug.h"
+
+static DEVICE void warn() {
+  PRINT0(LD_ALL, "Locks are not supported in this thread mapping model");
+}
+
+DEVICE void __kmpc_impl_init_lock(omp_lock_t *) { warn(); }
+DEVICE void __kmpc_impl_destroy_lock(omp_lock_t *) { warn(); }
+DEVICE void __kmpc_impl_set_lock(omp_lock_t *) { warn(); }
+DEVICE void __kmpc_impl_unset_lock(omp_lock_t *) { warn(); }
+DEVICE int __kmpc_impl_test_lock(omp_lock_t *lock) { warn(); return 0; }