[clang] [llvm] [llvm][frontend][offloading] Move clang-linker-wrapper/OffloadWrapper.* to llvm/Frontend/Offloading (PR #78057)
Fabian Mora via llvm-commits
llvm-commits at lists.llvm.org
Sun Jan 14 16:18:07 PST 2024
================
@@ -568,32 +590,45 @@ void createRegisterFatbinFunction(Module &M, GlobalVariable *FatbinDesc,
} // namespace
-Error wrapOpenMPBinaries(Module &M, ArrayRef<ArrayRef<char>> Images) {
- GlobalVariable *Desc = createBinDesc(M, Images);
+Error OffloadWrapper::wrapOpenMPBinaries(
+ Module &M, ArrayRef<ArrayRef<char>> Images,
+ std::optional<EntryArrayTy> EntryArray) const {
+ GlobalVariable *Desc = createBinDesc(
+ M, Images,
+ EntryArray
+ ? *EntryArray
+ : offloading::getOffloadEntryArray(M, "omp_offloading_entries"),
----------------
fabianmcg wrote:
I see what you mean, first some broader context, this patch is also part of a patch series that will add GPU compilation for OMP operations in MLIR without the need for `flang` or `clang`, which is not currently possible. This series also enables to JIT OMP operations in MLIR. The goal of the series is to make OMP target functional in MLIR as a standalone.
I allow the passage of a custom entry array because ORC JIT doesn't fully support `__start`, `__stop` symbols for grouping section data. My solution was allowing the custom entry array, so in MLIR I build the full entry array and never rely on sections, this applies to OMP, CUDA and HIP.
Thus, the following MLIR:
```mlir
module attributes {gpu.container_module} {
gpu.binary @binary <#gpu.offload_embedding<cuda>> [#gpu.object<#nvvm.target, bin = "BLOB">]
llvm.func @func() {
%1 = llvm.mlir.constant(1 : index) : i64
gpu.launch_func @binary::@hello blocks in (%1, %1, %1) threads in (%1, %1, %1) : i64
gpu.launch_func @binary::@world blocks in (%1, %1, %1) threads in (%1, %1, %1) : i64
llvm.return
}
}
```
Produces:
```llvm
@__begin_offload_binary = internal constant [2 x %struct.__tgt_offload_entry] [%struct.__tgt_offload_entry { ptr @binary_Khello, ptr @.omp_offloading.entry_name, i64 0, i32 0, i32 0 }, %struct.__tgt_offload_entry { ptr @binary_Kworld, ptr @.omp_offloading.entry_name.2, i64 0, i32 0, i32 0 }]
@__end_offload_binary = internal constant ptr getelementptr inbounds (%struct.__tgt_offload_entry, ptr @__begin_offload_binary, i64 2)
@.fatbin_image.binary = internal constant [4 x i8] c"BLOB", section ".nv_fatbin"
@.fatbin_wrapper.binary = internal constant %fatbin_wrapper { i32 1180844977, i32 1, ptr @.fatbin_image.binary, ptr null }, section ".nvFatBinSegment", align 8
@.cuda.binary_handle.binary = internal global ptr null
@llvm.global_ctors = appending global [1 x { i32, ptr, ptr }] [{ i32, ptr, ptr } { i32 1, ptr @.cuda.fatbin_reg.binary, ptr null }]
@binary_Khello = weak constant i8 0
@.omp_offloading.entry_name = internal unnamed_addr constant [6 x i8] c"hello\00"
@binary_Kworld = weak constant i8 0
@.omp_offloading.entry_name.2 = internal unnamed_addr constant [6 x i8] c"world\00"
...
```
And this works.
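For reference, a minimal sketch of how a frontend could hand such an explicitly built entry array to the wrapper through the new optional parameter. The header path, the `offloading::` qualification of `OffloadWrapper` and `EntryArrayTy`, the default construction of the wrapper, and `EntryArrayTy` being a {begin, end} pair of globals are assumptions on my side, inferred from the signature in this patch and the IR above:
```cpp
// Sketch only: wrap device images using an explicitly built entry array
// instead of the linker-provided __start_/__stop_ section symbols,
// which ORC JIT doesn't fully support.
#include "llvm/ADT/ArrayRef.h"
#include "llvm/Frontend/Offloading/OffloadWrapper.h" // assumed header path
#include "llvm/IR/Module.h"
#include "llvm/Support/Error.h"

using namespace llvm;

static Error wrapWithExplicitEntries(Module &M, ArrayRef<ArrayRef<char>> Images,
                                     GlobalVariable *EntriesBegin,
                                     GlobalVariable *EntriesEnd) {
  // Assumed: EntryArrayTy is a {begin, end} pair of globals, analogous to
  // @__begin_offload_binary / @__end_offload_binary in the IR above.
  offloading::EntryArrayTy Entries{EntriesBegin, EntriesEnd};
  offloading::OffloadWrapper Wrapper; // assumed default construction
  // Passing the pair overrides the section-based default
  // (offloading::getOffloadEntryArray(M, "omp_offloading_entries")).
  return Wrapper.wrapOpenMPBinaries(M, Images, Entries);
}
```
Leaving the optional argument unset keeps the current behavior, since the wrapper then falls back to `offloading::getOffloadEntryArray(M, "omp_offloading_entries")` as shown in the diff.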
https://github.com/llvm/llvm-project/pull/78057