[clang] [llvm] [llvm][frontend][offloading] Move clang-linker-wrapper/OffloadWrapper.* to llvm/Frontend/Offloading (PR #78057)
Fabian Mora via llvm-commits
llvm-commits at lists.llvm.org
Sun Jan 14 16:18:07 PST 2024
================
@@ -568,32 +590,45 @@ void createRegisterFatbinFunction(Module &M, GlobalVariable *FatbinDesc,
} // namespace
-Error wrapOpenMPBinaries(Module &M, ArrayRef<ArrayRef<char>> Images) {
- GlobalVariable *Desc = createBinDesc(M, Images);
+Error OffloadWrapper::wrapOpenMPBinaries(
+ Module &M, ArrayRef<ArrayRef<char>> Images,
+ std::optional<EntryArrayTy> EntryArray) const {
+ GlobalVariable *Desc = createBinDesc(
+ M, Images,
+ EntryArray
+ ? *EntryArray
+ : offloading::getOffloadEntryArray(M, "omp_offloading_entries"),
----------------
fabianmcg wrote:
I see what you mean, first some broader context, this patch is also part of a patch series that will add GPU compilation for OMP operations in MLIR without the need for `flang` or `clang`, which is not currently possible. This series also enables to JIT OMP operations in MLIR. The goal of the series is to make OMP target functional in MLIR as a standalone.
I allow the passage of a custom entry array because ORC JIT doesn't fully support `__start`, `__stop` symbols for grouping section data. My solution was allowing the custom entry array, so in MLIR I build the full entry array and never rely on sections, this applies to OMP, CUDA and HIP.
Thus, the following MLIR:
```mlir
module attributes {gpu.container_module} {
gpu.binary @binary <#gpu.offload_embedding<cuda>> [#gpu.object<#nvvm.target, bin = "BLOB">]
llvm.func @func() {
%1 = llvm.mlir.constant(1 : index) : i64
gpu.launch_func @binary::@hello blocks in (%1, %1, %1) threads in (%1, %1, %1) : i64
gpu.launch_func @binary::@world blocks in (%1, %1, %1) threads in (%1, %1, %1) : i64
llvm.return
}
}
```
Produces:
```llvm
@__begin_offload_binary = internal constant [2 x %struct.__tgt_offload_entry] [%struct.__tgt_offload_entry { ptr @binary_Khello, ptr @.omp_offloading.entry_name, i64 0, i32 0, i32 0 }, %struct.__tgt_offload_entry { ptr @binary_Kworld, ptr @.omp_offloading.entry_name.2, i64 0, i32 0, i32 0 }]
@__end_offload_binary = internal constant ptr getelementptr inbounds (%struct.__tgt_offload_entry, ptr @__begin_offload_binary, i64 2)
@.fatbin_image.binary = internal constant [4 x i8] c"BLOB", section ".nv_fatbin"
@.fatbin_wrapper.binary = internal constant %fatbin_wrapper { i32 1180844977, i32 1, ptr @.fatbin_image.binary, ptr null }, section ".nvFatBinSegment", align 8
@.cuda.binary_handle.binary = internal global ptr null
@llvm.global_ctors = appending global [1 x { i32, ptr, ptr }] [{ i32, ptr, ptr } { i32 1, ptr @.cuda.fatbin_reg.binary, ptr null }]
@binary_Khello = weak constant i8 0
@.omp_offloading.entry_name = internal unnamed_addr constant [6 x i8] c"hello\00"
@binary_Kworld = weak constant i8 0
@.omp_offloading.entry_name.2 = internal unnamed_addr constant [6 x i8] c"world\00"
...
```
And this works.
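For reference, a minimal sketch of how a frontend could hand such an explicitly built entry array to the wrapper through the new optional parameter. The header path, the `offloading::` qualification of `OffloadWrapper` and `EntryArrayTy`, the default construction of the wrapper, and `EntryArrayTy` being a {begin, end} pair of globals are assumptions on my side, inferred from the signature in this patch and the IR above:
```cpp
// Sketch only: wrap device images using an explicitly built entry array
// instead of the linker-provided __start_/__stop_ section symbols,
// which ORC JIT doesn't fully support.
#include "llvm/ADT/ArrayRef.h"
#include "llvm/Frontend/Offloading/OffloadWrapper.h" // assumed header path
#include "llvm/IR/Module.h"
#include "llvm/Support/Error.h"

using namespace llvm;

static Error wrapWithExplicitEntries(Module &M, ArrayRef<ArrayRef<char>> Images,
                                     GlobalVariable *EntriesBegin,
                                     GlobalVariable *EntriesEnd) {
  // Assumed: EntryArrayTy is a {begin, end} pair of globals, analogous to
  // @__begin_offload_binary / @__end_offload_binary in the IR above.
  offloading::EntryArrayTy Entries{EntriesBegin, EntriesEnd};
  offloading::OffloadWrapper Wrapper; // assumed default construction
  // Passing the pair overrides the section-based default
  // (offloading::getOffloadEntryArray(M, "omp_offloading_entries")).
  return Wrapper.wrapOpenMPBinaries(M, Images, Entries);
}
```
Leaving the optional argument unset keeps the current behavior, since the wrapper then falls back to `offloading::getOffloadEntryArray(M, "omp_offloading_entries")` as shown in the diff.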
https://github.com/llvm/llvm-project/pull/78057