[all-commits] [llvm/llvm-project] 68d133: [OpenMP] Simplify GPU memory globalization

Tue Jun 22 07:56:35 PDT 2021

  Branch: refs/heads/main
  Home:   https://github.com/llvm/llvm-project
  Commit: 68d133a3e8c961965eff9a66b36742a7fcb30165
      https://github.com/llvm/llvm-project/commit/68d133a3e8c961965eff9a66b36742a7fcb30165
  Author: Joseph Huber <jhuber6 at vols.utk.edu>
  Date:   2021-06-22 (Tue, 22 Jun 2021)

  Changed paths:
    M clang/lib/CodeGen/CGOpenMPRuntimeGPU.cpp
    M clang/lib/CodeGen/CGOpenMPRuntimeGPU.h
    M clang/test/OpenMP/assumes_include_nvptx.cpp
    M clang/test/OpenMP/declare_target_codegen_globalization.cpp
    M clang/test/OpenMP/nvptx_data_sharing.cpp
    M clang/test/OpenMP/nvptx_distribute_parallel_generic_mode_codegen.cpp
    M clang/test/OpenMP/nvptx_lambda_capturing.cpp
    M clang/test/OpenMP/nvptx_multi_target_parallel_codegen.cpp
    M clang/test/OpenMP/nvptx_nested_parallel_codegen.cpp
    M clang/test/OpenMP/nvptx_parallel_codegen.cpp
    M clang/test/OpenMP/nvptx_parallel_for_codegen.cpp
    M clang/test/OpenMP/nvptx_target_codegen.cpp
    M clang/test/OpenMP/nvptx_target_parallel_codegen.cpp
    M clang/test/OpenMP/nvptx_target_parallel_num_threads_codegen.cpp
    M clang/test/OpenMP/nvptx_target_parallel_proc_bind_codegen.cpp
    M clang/test/OpenMP/nvptx_target_parallel_reduction_codegen.cpp
    M clang/test/OpenMP/nvptx_target_parallel_reduction_codegen_tbaa_PR46146.cpp
    M clang/test/OpenMP/nvptx_target_teams_codegen.cpp
    M clang/test/OpenMP/nvptx_target_teams_distribute_codegen.cpp
    M clang/test/OpenMP/nvptx_target_teams_distribute_parallel_for_codegen.cpp
    M clang/test/OpenMP/nvptx_target_teams_distribute_parallel_for_simd_codegen.cpp
    M clang/test/OpenMP/nvptx_teams_codegen.cpp
    M clang/test/OpenMP/nvptx_teams_reduction_codegen.cpp
    M clang/test/OpenMP/target_parallel_debug_codegen.cpp
    M llvm/include/llvm/Frontend/OpenMP/OMPKinds.def
    M llvm/lib/Transforms/IPO/OpenMPOpt.cpp
    M llvm/test/Transforms/OpenMP/globalization_remarks.ll
    M llvm/test/Transforms/PhaseOrdering/openmp-opt-module.ll

  Log Message:
  -----------
  [OpenMP] Simplify GPU memory globalization

Summary:
Memory globalization is required to maintain OpenMP standard semantics for data sharing between
worker and master threads. The GPU cannot share data between its threads so must allocate global or
shared memory to store the data in. Currently this is implemented fully in the frontend using the
`__kmpc_data_sharing_push_stack` and __kmpc_data_sharing_pop_stack` functions to emulate standard
CPU stack sharing. The front-end scans the target region for variables that escape the region and
must be shared between the threads. Each variable then has a field created for it in a global record
type.

This patch replaces this functinality with a single allocation command, effectively mimicing an
alloca instruction for the variables that must be shared between the threads. This will be much
slower than the current solution, but makes it much easier to optimize as we can analyze each
variable independently and determine if it is not captured. In the future, we can replace these
calls with an `alloca` and small allocations can be pushed to shared memory.

Reviewed By: tianshilei1992

Differential Revision: https://reviews.llvm.org/D97680