[all-commits] [llvm/llvm-project] 10068c: [OpenMP] Introduce kernel environment

Shilei Tian via All-commits all-commits at lists.llvm.org
Wed Jul 26 10:35:29 PDT 2023


  Branch: refs/heads/main
  Home:   https://github.com/llvm/llvm-project
  Commit: 10068cd65440bf8a8c1eaad1ce6e945537441925
      https://github.com/llvm/llvm-project/commit/10068cd65440bf8a8c1eaad1ce6e945537441925
  Author: Shilei Tian <i at tianshilei.me>
  Date:   2023-07-26 (Wed, 26 Jul 2023)

  Changed paths:
    M clang/lib/CodeGen/CGOpenMPRuntimeGPU.cpp
    M clang/test/OpenMP/amdgcn_target_codegen.cpp
    M clang/test/OpenMP/amdgcn_target_device_vla.cpp
    M clang/test/OpenMP/amdgpu_target_with_aligned_attribute.c
    M clang/test/OpenMP/declare_target_codegen_globalization.cpp
    M clang/test/OpenMP/nvptx_SPMD_codegen.cpp
    M clang/test/OpenMP/nvptx_data_sharing.cpp
    M clang/test/OpenMP/nvptx_distribute_parallel_generic_mode_codegen.cpp
    M clang/test/OpenMP/nvptx_lambda_capturing.cpp
    M clang/test/OpenMP/nvptx_multi_target_parallel_codegen.cpp
    M clang/test/OpenMP/nvptx_nested_parallel_codegen.cpp
    M clang/test/OpenMP/nvptx_parallel_codegen.cpp
    M clang/test/OpenMP/nvptx_parallel_for_codegen.cpp
    M clang/test/OpenMP/nvptx_target_codegen.cpp
    M clang/test/OpenMP/nvptx_target_parallel_codegen.cpp
    M clang/test/OpenMP/nvptx_target_parallel_num_threads_codegen.cpp
    M clang/test/OpenMP/nvptx_target_parallel_proc_bind_codegen.cpp
    M clang/test/OpenMP/nvptx_target_parallel_reduction_codegen.cpp
    M clang/test/OpenMP/nvptx_target_parallel_reduction_codegen_tbaa_PR46146.cpp
    M clang/test/OpenMP/nvptx_target_printf_codegen.c
    M clang/test/OpenMP/nvptx_target_simd_codegen.cpp
    M clang/test/OpenMP/nvptx_target_teams_codegen.cpp
    M clang/test/OpenMP/nvptx_target_teams_distribute_codegen.cpp
    M clang/test/OpenMP/nvptx_target_teams_distribute_parallel_for_codegen.cpp
    M clang/test/OpenMP/nvptx_target_teams_distribute_parallel_for_generic_mode_codegen.cpp
    M clang/test/OpenMP/nvptx_target_teams_distribute_parallel_for_simd_codegen.cpp
    M clang/test/OpenMP/nvptx_target_teams_distribute_simd_codegen.cpp
    M clang/test/OpenMP/nvptx_target_teams_generic_loop_codegen.cpp
    M clang/test/OpenMP/nvptx_target_teams_generic_loop_generic_mode_codegen.cpp
    M clang/test/OpenMP/nvptx_teams_codegen.cpp
    M clang/test/OpenMP/nvptx_teams_reduction_codegen.cpp
    M clang/test/OpenMP/reduction_implicit_map.cpp
    M clang/test/OpenMP/remarks_parallel_in_multiple_target_state_machines.c
    M clang/test/OpenMP/remarks_parallel_in_target_state_machine.c
    M clang/test/OpenMP/target_parallel_debug_codegen.cpp
    M clang/test/OpenMP/target_parallel_for_debug_codegen.cpp
    M clang/test/OpenMP/target_parallel_generic_loop_codegen-3.cpp
    M clang/test/OpenMP/target_parallel_generic_loop_codegen.cpp
    M clang/test/OpenMP/target_teams_generic_loop_codegen.cpp
    M llvm/include/llvm/Frontend/OpenMP/OMPIRBuilder.h
    M llvm/include/llvm/Frontend/OpenMP/OMPKinds.def
    M llvm/lib/Frontend/OpenMP/OMPIRBuilder.cpp
    M llvm/lib/Transforms/IPO/OpenMPOpt.cpp
    M llvm/test/Transforms/Attributor/reduced/aa_execution_domain_wrong_fn.ll
    M llvm/test/Transforms/Attributor/value-simplify-local-remote.ll
    M llvm/test/Transforms/OpenMP/add_attributes.ll
    M llvm/test/Transforms/OpenMP/always_inline_device.ll
    M llvm/test/Transforms/OpenMP/custom_state_machines.ll
    M llvm/test/Transforms/OpenMP/custom_state_machines_pre_lto.ll
    M llvm/test/Transforms/OpenMP/custom_state_machines_remarks.ll
    M llvm/test/Transforms/OpenMP/deduplication_target.ll
    M llvm/test/Transforms/OpenMP/get_hardware_num_threads_in_block_fold.ll
    M llvm/test/Transforms/OpenMP/get_hardware_num_threads_in_block_fold_optnone.ll
    M llvm/test/Transforms/OpenMP/global_constructor.ll
    M llvm/test/Transforms/OpenMP/globalization_remarks.ll
    M llvm/test/Transforms/OpenMP/gpu_state_machine_function_ptr_replacement.ll
    M llvm/test/Transforms/OpenMP/is_spmd_exec_mode_fold.ll
    M llvm/test/Transforms/OpenMP/nested_parallelism.ll
    M llvm/test/Transforms/OpenMP/parallel_level_fold.ll
    M llvm/test/Transforms/OpenMP/remove_globalization.ll
    M llvm/test/Transforms/OpenMP/replace_globalization.ll
    M llvm/test/Transforms/OpenMP/single_threaded_execution.ll
    M llvm/test/Transforms/OpenMP/spmdization.ll
    M llvm/test/Transforms/OpenMP/spmdization_assumes.ll
    M llvm/test/Transforms/OpenMP/spmdization_constant_prop.ll
    M llvm/test/Transforms/OpenMP/spmdization_guarding.ll
    M llvm/test/Transforms/OpenMP/spmdization_guarding_two_reaching_kernels.ll
    M llvm/test/Transforms/OpenMP/spmdization_no_guarding_two_reaching_kernels.ll
    M llvm/test/Transforms/OpenMP/spmdization_remarks.ll
    M llvm/test/Transforms/OpenMP/value-simplify-openmp-opt.ll
    M llvm/test/Transforms/PhaseOrdering/openmp-opt-module.ll
    M llvm/unittests/Frontend/OpenMPIRBuilderTest.cpp
    M mlir/test/Target/LLVMIR/omptarget-region-device-llvm.mlir
    M openmp/libomptarget/DeviceRTL/CMakeLists.txt
    M openmp/libomptarget/DeviceRTL/include/Debug.h
    M openmp/libomptarget/DeviceRTL/include/Interface.h
    M openmp/libomptarget/DeviceRTL/include/State.h
    M openmp/libomptarget/DeviceRTL/src/Configuration.cpp
    M openmp/libomptarget/DeviceRTL/src/Debug.cpp
    M openmp/libomptarget/DeviceRTL/src/Kernel.cpp
    M openmp/libomptarget/DeviceRTL/src/State.cpp
    R openmp/libomptarget/include/DeviceEnvironment.h
    A openmp/libomptarget/include/Environment.h
    M openmp/libomptarget/plugins-nextgen/amdgpu/src/rtl.cpp
    M openmp/libomptarget/plugins-nextgen/common/PluginInterface/PluginInterface.cpp
    M openmp/libomptarget/plugins-nextgen/common/PluginInterface/PluginInterface.h
    M openmp/libomptarget/plugins-nextgen/cuda/src/rtl.cpp
    M openmp/libomptarget/plugins-nextgen/generic-elf-64bit/src/rtl.cpp

  Log Message:
  -----------
  [OpenMP] Introduce kernel environment

This patch introduces per kernel environment. Previously, flags such as execution mode are set through global variables with name like `__kernel_name_exec_mode`. They are accessible on the host by reading the corresponding global variable, but not from the device. Besides, some assumptions, such as no nested parallelism, are not per kernel basis, preventing us applying per kernel optimization in the device runtime.

This is a combination and refinement of patch series D116908, D116909, and D116910.

Depend on D155886.

Reviewed By: jdoerfert

Differential Revision: https://reviews.llvm.org/D142569




More information about the All-commits mailing list