[Openmp-commits] [PATCH] D75546: [libomptarget] Implement locks for amdgcn

Jon Chesterfield via Phabricator via Openmp-commits openmp-commits at lists.llvm.org
Tue Mar 3 10:56:32 PST 2020

JonChesterfield created this revision.
JonChesterfield added reviewers: jdoerfert, ABataev, grokos.
Herald added subscribers: openmp-commits, jfb, mgorny, jvesely.
Herald added a project: OpenMP.
JonChesterfield updated this revision to Diff 247964.
JonChesterfield added a comment.

- s/warp/wavefront

[libomptarget] Implement locks for amdgcn

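The nvptx implementation deadlocks on amdgcn. Running atomic_cas from multiple
active lanes can deadlock: if one lane succeeds, the others keep spinning, and
because all lanes of a wavefront execute in lockstep, the lane that took the
lock never reaches the code that releases it. The set_lock implementation
therefore runs on a single lane.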
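To make the deadlock concrete, here is a minimal, single-threaded C++ model of one wavefront running the CAS loop in lockstep. It is a sketch under a simplifying assumption (a lane that has left the loop is masked off and waits at the loop exit until every active lane has left it), not a description of the hardware; the names `cas`, `lowest_active_lane`, `all_lanes_spin` and `leader_spins` are hypothetical.

```cpp
#include <cstdint>

// Toy model of one wavefront running a CAS spin loop in lockstep.
// Assumption: a lane that leaves the loop is masked off and waits at the
// loop exit until every active lane has left it (simplified SIMT rules).
constexpr uint32_t UNSET = 0u;
constexpr uint32_t SET = 1u;

// Sequential stand-in for an atomic compare-and-swap; returns the old value.
uint32_t cas(uint32_t &word, uint32_t expected, uint32_t desired) {
  uint32_t old = word;
  if (old == expected)
    word = desired;
  return old;
}

// Index of the lowest set bit, i.e. ffs(mask) - 1. mask must be non-zero.
int lowest_active_lane(uint64_t mask) {
  int lane = 0;
  while (((mask >> lane) & 1u) == 0)
    ++lane;
  return lane;
}

// Every lane in active_mask spins on the CAS. Returns the number of lockstep
// iterations until all lanes have left the loop, or -1 if some lanes are
// still spinning after max_iters (the deadlock).
int all_lanes_spin(uint32_t &lock, uint64_t active_mask, int max_iters) {
  uint64_t spinning = active_mask;
  for (int iter = 1; iter <= max_iters; ++iter) {
    for (int lane = 0; lane < 64; ++lane) {
      if (((spinning >> lane) & 1u) && cas(lock, UNSET, SET) == UNSET)
        spinning &= ~(uint64_t{1} << lane); // this lane won; it now waits
    }
    if (spinning == 0)
      return iter;
    // The winner cannot reach unset_lock until the loop has finished for the
    // whole wavefront, so the lock stays SET and the losers never succeed.
  }
  return -1;
}

// Only the lowest active lane spins, as in __kmpc_impl_set_lock: the same
// loop, run with a one-lane mask.
int leader_spins(uint32_t &lock, uint64_t active_mask, int max_iters) {
  uint64_t leader = uint64_t{1} << lowest_active_lane(active_mask);
  return all_lanes_spin(lock, leader, max_iters);
}
```

With all 64 lanes spinning, `all_lanes_spin` returns -1: lane 0 takes the lock on the first iteration and the other 63 lanes never do. Restricting the loop to the lowest active lane, as the patch does with `__kmpc_impl_ffs(__kmpc_impl_activemask()) - 1`, lets it finish on the first iteration when the lock is free.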

It also backs off with a sleep intrinsic (__builtin_amdgcn_s_sleep) instead
of polling the system clock, which is probably a minor performance improvement.
The unset/test implementations may be revised later based on code size,
performance, or similar concerns.
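For comparison, a host-side C++ sketch of the same lock protocol, assuming `std::atomic` and `std::thread` as stand-ins: each thread plays the role of one wavefront, and `std::this_thread::yield()` stands in for `__builtin_amdgcn_s_sleep(0)`. The `host_*` and `run_contended` names are hypothetical, and unset/test use the plain atomic store and load that the patch's own comments suggest as alternatives to exchange and add-of-zero.

```cpp
#include <atomic>
#include <cstdint>
#include <thread>
#include <vector>

// Host-side analogue of the amdgcn lock, for illustration only. Each
// std::thread stands in for one wavefront; std::this_thread::yield() stands
// in for __builtin_amdgcn_s_sleep(0) between failed attempts.
constexpr uint32_t UNSET = 0u;
constexpr uint32_t SET = 1u;

void host_set_lock(std::atomic<uint32_t> &lock) {
  uint32_t expected = UNSET;
  // On failure compare_exchange_strong writes the observed value into
  // 'expected', so reset it before retrying.
  while (!lock.compare_exchange_strong(expected, SET)) {
    expected = UNSET;
    std::this_thread::yield();
  }
}

void host_unset_lock(std::atomic<uint32_t> &lock) {
  lock.store(UNSET); // the patch uses an exchange; a store also releases
}

int host_test_lock(std::atomic<uint32_t> &lock) {
  // The patch uses an atomic add of 0; a load returns the same value.
  return static_cast<int>(lock.load());
}

// Each "wavefront" increments a shared, non-atomic counter under the lock.
long run_contended(int wavefronts, int iterations) {
  std::atomic<uint32_t> lock{UNSET};
  long counter = 0; // protected only by the lock
  std::vector<std::thread> pool;
  for (int w = 0; w < wavefronts; ++w) {
    pool.emplace_back([&] {
      for (int i = 0; i < iterations; ++i) {
        host_set_lock(lock);
        ++counter;
        host_unset_lock(lock);
      }
    });
  }
  for (auto &t : pool)
    t.join();
  return counter;
}
```

If the lock provides mutual exclusion, `run_contended(4, 10000)` returns exactly 40000; yielding instead of busy-polling leaves the processor to the current holder, which is the same motivation as sleeping on the GPU.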

This implements the lock at per-wavefront scope. That is not strictly what the
specification requires, since OpenMP describes locks in terms of threads. I
think the nvptx implementation provides true per-thread locking on Volta and
the same per-warp locking on other architectures.
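A minimal sequential model of what per-wavefront scope means in practice, assuming a lock word shared between wavefronts and a 64-bit active mask per wavefront (`LockModel`, `wavefront_set_lock` and `lanes_in_critical_section` are hypothetical names). Once the leader lane acquires the lock, every active lane of its wavefront continues past set_lock, so several OpenMP threads hold the lock at once, while a second wavefront is kept out.

```cpp
#include <cstdint>

// Toy model of per-wavefront lock granularity (not device code). The lock
// word is shared between wavefronts; lanes are bits of a 64-bit active mask.
constexpr uint32_t UNSET = 0u;
constexpr uint32_t SET = 1u;

struct LockModel {
  uint32_t word = UNSET;
};

// One attempt by one wavefront: only the lowest active lane does the CAS,
// and it succeeds only if the lock is free. Returns the lanes that continue
// past set_lock: all active lanes, or none if another wavefront holds it.
uint64_t wavefront_set_lock(LockModel &lock, uint64_t active_mask) {
  if (active_mask == 0)
    return 0;
  if (lock.word != UNSET)
    return 0; // a real wavefront would sleep and retry here
  lock.word = SET;
  return active_mask; // every active lane reconverges and proceeds
}

// Number of lanes inside the critical section at the same time.
int lanes_in_critical_section(uint64_t proceeding) {
  int n = 0;
  for (; proceeding != 0; proceeding &= proceeding - 1)
    ++n; // clear the lowest set bit each step
  return n;
}
```

With an active mask of `0xFF`, eight lanes (eight OpenMP threads) are inside the critical section together, whereas a per-thread lock would admit only one.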

REPOSITORY
  rG LLVM Github Monorepo



Index: openmp/libomptarget/deviceRTLs/amdgcn/src/amdgcn_locks.hip
--- /dev/null
+++ openmp/libomptarget/deviceRTLs/amdgcn/src/amdgcn_locks.hip
@@ -0,0 +1,51 @@
+//===-- amdgcn_locks.hip - AMDGCN OpenMP GPU lock implementation -- HIP -*-===//
+// Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions.
+// See https://llvm.org/LICENSE.txt for license information.
+// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception
+// Definitions of OpenMP lock functions.
+// A 'thread' maps onto a lane of the wavefront. This means a per-thread lock
+// cannot be implemented: if one thread gets the lock, it can't continue on to
+// the next instruction in order to do anything, as the other threads are
+// waiting to take the lock.
+// The closest approximation we can implement is to lock per-wavefront.
+#include "common/support.h"
+#include "common/target_atomic.h"
+#include "target_impl.h"
+#define UNSET 0u
+#define SET 1u
+DEVICE void __kmpc_impl_init_lock(omp_lock_t *lock) {
+  __kmpc_impl_unset_lock(lock);
+}
+
+DEVICE void __kmpc_impl_destroy_lock(omp_lock_t *lock) {
+  __kmpc_impl_unset_lock(lock);
+}
+
+DEVICE void __kmpc_impl_set_lock(omp_lock_t *lock) {
+  // Only the lowest active lane spins on the CAS; the other lanes are masked
+  // off until it leaves the loop
+  uint64_t lowestActiveThread = __kmpc_impl_ffs(__kmpc_impl_activemask()) - 1;
+  if (GetLaneId() == lowestActiveThread) {
+    while (__kmpc_atomic_cas(lock, UNSET, SET) != UNSET) {
+      __builtin_amdgcn_s_sleep(0);
+    }
+  }
+  // test_lock will now return true for any thread in the wavefront
+}
+
+DEVICE void __kmpc_impl_unset_lock(omp_lock_t *lock) {
+  // Could be an atomic store of UNSET
+  (void)__kmpc_atomic_exchange(lock, UNSET);
+}
+
+DEVICE int __kmpc_impl_test_lock(omp_lock_t *lock) {
+  // Could be an atomic load
+  return __kmpc_atomic_add(lock, 0u);
+}
Index: openmp/libomptarget/deviceRTLs/amdgcn/CMakeLists.txt
--- openmp/libomptarget/deviceRTLs/amdgcn/CMakeLists.txt
+++ openmp/libomptarget/deviceRTLs/amdgcn/CMakeLists.txt
@@ -56,6 +56,7 @@
+  ${CMAKE_CURRENT_SOURCE_DIR}/src/amdgcn_locks.hip

