[all-commits] [llvm/llvm-project] 244e98: [Libomptarget] Improve device runtime implementati...

Joseph Huber via All-commits all-commits at lists.llvm.org
Tue Jun 22 08:53:28 PDT 2021


  Branch: refs/heads/main
  Home:   https://github.com/llvm/llvm-project
  Commit: 244e98ff480859e969a99d07064745b075f3a892
      https://github.com/llvm/llvm-project/commit/244e98ff480859e969a99d07064745b075f3a892
  Author: Joseph Huber <jhuber6 at vols.utk.edu>
  Date:   2021-06-22 (Tue, 22 Jun 2021)

  Changed paths:
    M openmp/libomptarget/deviceRTLs/common/omptarget.h
    M openmp/libomptarget/deviceRTLs/common/src/data_sharing.cu
    M openmp/libomptarget/deviceRTLs/common/src/omp_data.cu
    M openmp/libomptarget/deviceRTLs/common/src/omptarget.cu
    M openmp/libomptarget/deviceRTLs/interface.h

  Log Message:
  -----------
  [Libomptarget] Improve device runtime implementation for globalized variables.

Currently the runtime implementation of `__kmpc_alloc_shared` is extremely slow because it allocates memory for each thread individually. This patch adds a small buffer for the threads to share data, which will greatly improve performance for builds where not all globalization could be optimized out. If the shared buffer is full, memory will be allocated per-warp rather than per-thread.
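
The buffer-then-fallback idea described above can be illustrated with a minimal host-side C++ sketch. It is not the deviceRTL code from this patch: `SharedStack`, `kBufferBytes`, `alloc`, and `release` are hypothetical names, the buffer size is an arbitrary assumption, and the sketch models neither warps nor device memory. It only shows a small fixed buffer being served first, with a slower general allocation used once the buffer is full.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstdlib>

// Hypothetical illustration only; names and sizes are assumptions,
// not the identifiers or constants used by the patch.
constexpr std::size_t kBufferBytes = 64; // assumed size of the small shared buffer
constexpr std::size_t kAlign = 8;        // assumed allocation granularity

struct SharedStack {
  alignas(kAlign) unsigned char buffer[kBufferBytes];
  std::size_t top = 0; // bytes of the buffer currently in use

  static std::size_t roundUp(std::size_t n) {
    return (n + kAlign - 1) / kAlign * kAlign;
  }

  // True if p points into the fixed buffer.
  bool owns(const void *p) const {
    auto addr = reinterpret_cast<std::uintptr_t>(p);
    auto base = reinterpret_cast<std::uintptr_t>(buffer);
    return addr >= base && addr < base + kBufferBytes;
  }

  // Fast path: bump-allocate from the buffer.
  // Slow path: fall back to a general allocation when the buffer is full.
  void *alloc(std::size_t bytes) {
    std::size_t n = roundUp(bytes);
    if (top + n <= kBufferBytes) {
      void *p = buffer + top;
      top += n;
      return p;
    }
    return std::malloc(n);
  }

  // Assumes allocations are released in reverse order (stack discipline)
  // and that the caller passes the same size it requested.
  void release(void *p, std::size_t bytes) {
    if (owns(p)) {
      std::size_t n = roundUp(bytes);
      assert(top >= n);
      top -= n;
    } else {
      std::free(p);
    }
  }
};
```

In this sketch, a first request for 40 bytes is served from the buffer, while a second 40-byte request no longer fits in the assumed 64 bytes and takes the fallback path.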

Depends on D97680

Reviewed By: jdoerfert

Differential Revision: https://reviews.llvm.org/D104666



