[Openmp-commits] [PATCH] D110193: [RFC] Initial documentation for declare target indirect support.

Vyacheslav Zakharin via Phabricator via Openmp-commits openmp-commits at lists.llvm.org
Tue Sep 21 12:25:42 PDT 2021

vzakhari created this revision.
vzakhari added reviewers: RaviNarayanaswamy, grokos, erichkeane.
vzakhari requested review of this revision.
Herald added a reviewer: jdoerfert.
Herald added subscribers: openmp-commits, sstefan1.
Herald added a project: OpenMP.

Please review the proposed implementation of support for `declare target indirect`.

  rG LLVM Github Monorepo



Index: openmp/libomptarget/docs/declare_target_indirect.md
--- /dev/null
+++ openmp/libomptarget/docs/declare_target_indirect.md
@@ -0,0 +1,51 @@
+# Overview
+The indirect clause enables **indirect device invocation** for a procedure:
+> An indirect call to the device version of a procedure on a device other than the host<br>
+> device, through a function pointer (C/C++), a pointer to a member function (C++) or<br>
+> a procedure pointer (Fortran) that refers to the host version of the procedure.
+# Compiler/runtime support
+### Offload entries table
+The offload entries table that is created for the host and for each of the device images currently has entries for **declare target** global variables, **omp target** outlined functions, and constructor/destructor thunks for **declare target** global variables.
+The compiler will also produce an entry for each procedure listed in the **indirect** clause of a **declare target** construct:
+struct __tgt_offload_entry {
+  void *addr;       // Pointer to the function
+  char *name;       // Name of the function
+  size_t size;      // 0 for function
+  int32_t flags;    // OpenMPOffloadingDeclareTargetFlags::OMP_DECLARE_TARGET_FPTR
+  int32_t reserved; // Reserved
+};
+### Run-time dispatch in device code
+When the front end (FE) generates an indirect function call in **device code**, it defines the following global variable for the translation module:
+__attribute__((weak)) struct __openmp_offload_function_ptr_map_ty {
+  int64_t host_ptr; // key
+  int64_t tgt_ptr;  // value
+} *__openmp_offload_function_ptr_map = 0;
+The FE generates runtime lookup code that matches the function address against the key `host_ptr` and produces the new function address `tgt_ptr`, which is then used for the indirect function call.
+#### Optimization for non-unified_shared_memory
+Since all pointers are supposed to be translated/mapped when a program does not use **requires unified_shared_memory**, it is possible to avoid generating the runtime dispatch code for indirect function calls in that case. The mapping between the host and device addresses of an indirect function will be established by `libomptarget` while it processes the offload entries table.
+## Runtime handling of function pointers
+`OpenMPOffloadingDeclareTargetFlags::OMP_DECLARE_TARGET_FPTR` is a new flag that distinguishes offload entries for function pointers from other function entries.  No mapping is created for other function entries (those with `size` equal to 0), but for function pointer entries `omptarget::InitLibrary()` will establish a mapping in `Device.HostDataToTargetMap`.
+Once `Device.HostDataToTargetMap` is populated, `libomptarget` walks the host offload entries table and creates an entry in the host version of `__openmp_offload_function_ptr_map` for each `OMP_DECLARE_TARGET_FPTR` entry; the device pointer is taken from `Device.HostDataToTargetMap`.  `libomptarget` sorts the host version of `__openmp_offload_function_ptr_map` by key (`host_ptr`) values and then transfers the table to device memory (which implies a device memory allocation via `omp_target_alloc`).
+The device address of the transferred data is then assigned to `__openmp_offload_function_ptr_map` on the device.  The assignment may be made in different ways, so it is the plugin's responsibility to do the assignment in a target-dependent way.  Plugins provide an optional interface that implements the assignment, e.g. `__tgt_rtl_set_function_ptr_map(int32_t device_id, void *device_addr)`, where `device_addr` has the same characteristics as an address returned by `omp_target_alloc` invoked for the same device identified by `device_id`.
+#### Optimization for non-unified_shared_memory
+For programs that do not use **requires unified_shared_memory**, only the `Device.HostDataToTargetMap` mapping is necessary.
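
The sort-then-lookup scheme described in the patch (the host sorts the map by `host_ptr`, device code translates a host function address to `tgt_ptr`) could look roughly like the sketch below. This is only an illustration under stated assumptions, not the proposed implementation: the names `fn_ptr_map_entry`, `sort_fn_ptr_map` and `translate_fn_ptr` are hypothetical, the explicit `count` parameter is an assumption (the patch does not say how the number of entries is communicated to the device), and returning the input address unchanged when no entry matches is likewise an assumed fallback. Binary search is used because the patch says the table is sorted by key, but the patch does not specify the lookup algorithm.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical stand-in with the same layout as
   __openmp_offload_function_ptr_map_ty from the patch. */
typedef struct {
  int64_t host_ptr; /* key: host address of the procedure */
  int64_t tgt_ptr;  /* value: device address of the procedure */
} fn_ptr_map_entry;

/* Orders entries by ascending host_ptr without overflow. */
static int compare_by_host_ptr(const void *a, const void *b) {
  int64_t ka = ((const fn_ptr_map_entry *)a)->host_ptr;
  int64_t kb = ((const fn_ptr_map_entry *)b)->host_ptr;
  return (ka > kb) - (ka < kb);
}

/* Host side (assumed helper): sort the table by key before it is
   transferred to device memory. */
void sort_fn_ptr_map(fn_ptr_map_entry *map, size_t count) {
  qsort(map, count, sizeof(map[0]), compare_by_host_ptr);
}

/* Device side (assumed helper): binary search over the sorted table.
   Returns the matching tgt_ptr, or host_ptr unchanged if there is no
   entry (an assumed fallback, not stated in the patch). */
int64_t translate_fn_ptr(const fn_ptr_map_entry *map, size_t count,
                         int64_t host_ptr) {
  size_t lo = 0, hi = count;
  while (lo < hi) {
    size_t mid = lo + (hi - lo) / 2;
    if (map[mid].host_ptr == host_ptr)
      return map[mid].tgt_ptr;
    if (map[mid].host_ptr < host_ptr)
      lo = mid + 1;
    else
      hi = mid;
  }
  return host_ptr;
}
```

In this sketch an indirect call site would pass the loaded function pointer value through `translate_fn_ptr` and call the result. Sorting once on the host keeps each device-side lookup logarithmic in the number of `indirect` procedures.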

-------------- next part --------------
A non-text attachment was scrubbed...
Name: D110193.374008.patch
Type: text/x-patch
Size: 3879 bytes
Desc: not available
URL: <http://lists.llvm.org/pipermail/openmp-commits/attachments/20210921/023aa255/attachment.bin>