[PATCH] [OPENMP] Driver support for OpenMP offloading

Thu May 21 14:20:19 PDT 2015

Hi John

Thanks for looking into this patch!

Sure, let me expand on the host-target communication. Just a little bit of context before I do that:

During code generation, the target frontend has to decide whether a given declaration or target region has to be emitted or not. Without any information communicated from the host frontend, this decision become complicated for cases like:

- #pragma omp target regions in static functions or class members;
- static declarations delimitted by #pragma omp declare target regions that end up not being used;
- #pragma omp target in template functions

In order for the target frontend to correctly identify all the declarations that need to be emitted it would have to, somehow, emulate the actions done by the host frontend which would turn the code generation messy in places that do not even relate with OpenMP.

On top of that, in order to have an efficient mapping between host and target entries (global declarations/target regions)
 table (this is discussed in the document, in section 5.1, where __tgt_offload_entry is introduced) the compiler would have to emit the corresponding entries in the host and target side in the same order. This is useful for devices whose toolchain maintain the order of the symbols given that the order of the entries in the host and target tables will be the same after linking. So knowing an index would be enough to do the mapping. In order for that to happen, the target frontend would have to know that order, which would be also hard to extract if no information is communicated form the host.

So, the information that needs to be propagated to make what I described above possible is basically i) declaration mangled names and ii) order they were emitted. This information could be communicated in the form of metadata that is emitted by the host frontend when the module is released and loaded by the target frontend when CGOpenMPRuntime is created. This information has however to be coded in slightly different ways for different kinds of declarations. Let me explain this with an example:

//######################################
#pragma omp declare target
struct MyClass{

  ...

  MyClass &add(MyClass &op){...}

  MyClass &add(int (&op)[N]){...}

  bool eq(MyClass &op){...}

  MyClass() {...}

  ~MyClass() {...}

};

MyClass C;
MyClass D;
#pragma omp end declare target

void foo(){

  int AA[N];
  MyClass H, T;
  MyClass HC;

  ...

  #pragma omp target
  {
    MyClass TC;
    T.add(AA).add(HC);
  }

  if (H.eq(T)) {...}

  #pragma omp target
  {
    T.add(AA);
  } 

}
//######################################

I was planning the metadata for this example to look more or less like this:

; Named metadata that encloses all the offloading information
!openmp.offloading.info = !{!1, !2, !3, !4, !5, !6, !7, !8, !9, !10}

; Global variables that require a map between host and target:
; Entry 0 -> ID for this type of metadata (0)
; Entry 1 -> Mangled name of the variable
; Entry 2 -> Order it was emitted
!1 = !{i32 0, !"C", i32 0}
!2 = !{i32 0, !"D", i32 2}

; Functions with target regions
; Entry 0 -> ID for this type of metadata (1)
; Entry 1 -> Mangled name of the function that was emitted for the host and encloses target regions
; Entry 2-n -> Order the target regions in the functions (in the same sequence the statements are found) are emitted 
!3 = !{i32 1, !"_Z3foov", i32 4, i32 5}

; Global initializers
; Entry 0 -> ID for this type of metadata (2)
; Entry 1-n -> Order the initializers are emitted in descending order of priority (we will require a target region per set of initializers with the same priority)
!4 = !{i32 2, i32 6}

; Global Dtors
; Entry 0 -> ID for this type of metadata (3)
; Entry 1 -> Mangled name of the variable to be destructed 
; Entry 2 -> Order the destructor was emitted (we will have a target region per variable being destroyed - this can probably be optimized)
!5 = !{i32 3, !"C", i32 1}
!6 = !{i32 3, !"D", i32 3}

; Other functions that should be emitted in the target but do not require to be mapped to the host
; Entry 0 -> ID for this type of metadata (4)
; Entry 1 -> Mangled name of the function that has to be emitted.
!7 = !{i32 4, !"_ZN7MyClass3addERA64_i"}
!8 = !{i32 4, !"_ZN7MyClass3addERS_"}
!9 = !{i32 4, !"_ZN7MyClassC2Ev"}
!10 = !{i32 4, !"_ZN7MyClassD2Ev"}

I realize this is the kind of information I should propose as a patch to the codegen part of offloading, but I think it makes sense to discuss it now as the driver has to enable it.

I also foresee the communication between target and host to be useful for other cases, like the propagation of alias information from host to target. I don’t have have however a proposal for that at this moment.

Hope I haven’t been either too brief or too exhaustive! Let me know if I can clarify anything else for you.

Thanks!
Samuel

http://reviews.llvm.org/D9888

EMAIL PREFERENCES
  http://reviews.llvm.org/settings/panel/emailpreferences/