[Openmp-commits] [PATCH] D93135: [libomptarget][devicertl] Port amdgcn devicertl to openmp

Jon Chesterfield via Phabricator via Openmp-commits openmp-commits at lists.llvm.org
Mon Jan 4 02:26:16 PST 2021


JonChesterfield added a comment.

Just re-read this and still agree with the direction, but I have spotted a couple of things to double-check. I don't mind the SHARED foo -> SHARED(foo) syntax change or the pragma noise. The 'todo' comments about what look like compiler bugs are a concern, but they won't be the only compiler bugs that need fixing for amdgpu, and this runtime may be the most complicated openmp code I've tried to compile.
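
For reference, roughly the shape of macro the SHARED foo -> SHARED(foo) change implies; the attribute, allocator and helper names below are illustrative, not necessarily the exact ones in the patch:

  // CUDA flavour: a prefix attribute, written as `SHARED int x;`
  //   #define SHARED __attribute__((shared))
  // OpenMP flavour: the allocate pragma has to name the variable, so the
  // macro takes the name as an argument and is written `int SHARED(x);`
  #define OMP_PRAGMA(STR) _Pragma(#STR)
  #define SHARED(NAME)                                                        \
    NAME;                                                                      \
    OMP_PRAGMA(omp allocate(NAME) allocator(omp_pteam_mem_alloc))

That is where the extra pragma noise comes from: every shared variable declaration now drags an allocate pragma along with it.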

I will check that this is indeed a non-functional change for nvptx (I'll compile the deviceRTL to bitcode with and without it, diff the IR, and iterate until they're an exact match, as that's more robust than the tests I can run locally). Changing cuda to openmp doesn't generate the same IR at present, so I'd like to be cautious about the changeover. Likewise, changing rocm -x hip to rocm -x openmp doesn't generate the same code for amdgpu, but in that case I know the different IR does still work.

I'd like to convert amdgpu to openmp as an intermediate step towards doing the same for nvptx. The workaround notes above suggest the rocm compiler is doing some strange things for openmp codegen. I'd rather ignore that in favour of getting the trunk compiler to generate the right code for openmp, at which point rocm will pick up those fixes and I won't have to write the patches twice. Then, with trunk generating the right code for amdgpu, it should be straightforward to switch nvptx over.

I can also prepare a prettier patch that doesn't include those workarounds (e.g. the omp_is_initial_device bodge), which would make the code in trunk look better, with the caveat that it'll cost me a fair amount of time keeping the internal builds working.



================
Comment at: openmp/libomptarget/deviceRTLs/common/omptarget.h:302
+extern DEVICE uint8_t parallelLevel[MAX_THREADS_PER_TEAM / WARPSIZE];
+#pragma omp allocate(parallelLevel) allocator(omp_pteam_mem_alloc)
+#else
----------------
Note to self: Does the allocate directive need to go on the declaration? According to the spec and/or the implementation? Ideally the header would only say 'there is a variable somewhere' and the implementation file would specify that it comes from a particular allocator, but that means IR gen will use a generic address space for the declaration, which may not work out.
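
To spell out the two placements being weighed (the file split below is illustrative):

  // (a) allocator visible at the declaration, i.e. what this hunk does:
  // omptarget.h
  extern DEVICE uint8_t parallelLevel[MAX_THREADS_PER_TEAM / WARPSIZE];
  #pragma omp allocate(parallelLevel) allocator(omp_pteam_mem_alloc)

  // (b) plain extern declaration in the header, allocator only on the
  //     definition in the implementation file:
  // omptarget.h
  //   extern DEVICE uint8_t parallelLevel[MAX_THREADS_PER_TEAM / WARPSIZE];
  // omp_data.cu
  //   DEVICE uint8_t parallelLevel[MAX_THREADS_PER_TEAM / WARPSIZE];
  //   #pragma omp allocate(parallelLevel) allocator(omp_pteam_mem_alloc)

With (b) the header no longer commits to an allocator, but the declaration would be lowered in the generic address space while the definition lands in team-shared memory, which is the mismatch I'm worried about.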


================
Comment at: openmp/libomptarget/deviceRTLs/common/src/libcall.cu:323
+// Working around here until the compiler quirk is understood
+DEVICE int omp_is_initial_device_OVERLOAD(void) asm("omp_is_initial_device");
+DEVICE int omp_is_initial_device_OVERLOAD(void) {
----------------
Note to self: don't use this bodge for nvptx.
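
For the record, the shape of the bodge (the body shown here is illustrative, not necessarily the exact code in the patch):

  // Define the function under a different C++ name, but force the emitted
  // symbol to be the plain C name via an asm label, sidestepping whatever
  // the compiler does with a direct definition of omp_is_initial_device.
  DEVICE int omp_is_initial_device_OVERLOAD(void) asm("omp_is_initial_device");
  DEVICE int omp_is_initial_device_OVERLOAD(void) {
    return 0; // device code is never running on the initial (host) device
  }

The asm label keeps the exported symbol name stable; the quirk to chase down is why defining omp_is_initial_device directly misbehaves under -x openmp.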


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D93135/new/

https://reviews.llvm.org/D93135


