[Openmp-commits] [PATCH] D93135: [libomptarget][devicertl] Port amdgcn devicertl to openmp

Mon Jan 4 10:09:55 PST 2021

JonChesterfield added inline comments.

================
Comment at: openmp/libomptarget/deviceRTLs/amdgcn/src/amdgcn_intrinsics.h:47
+
+#endif
----------------
jdoerfert wrote:
> This seems unrelated, can it go in before or after?
I think the problem goes:

- openmp compiles target regions for the host
- __builtin_amdgcn_s_sleep can't be called from host code (as it's not a thing)
- therefore compiler doesn't add the symbol to host compilation

This works for device compilation, as the names and types match the functions that actually exist. It 'works' for host compilation, in so far as it emits calls to undefined functions with these names, and the compiler throws away the result of the host compilation.

Cuda has a similar quirk where calling a device intrinsic from a function that is compiled for the host fails. I'm not sure if nvptx/openmp will have the same behaviour.

I'm not sure what the pretty solution to this is. Somewhat inclined to compile target_impl as c++, instead of as openmp, as that bypasses it. I'd quite like to compile the whole library as c++, instead of engaging all the openmp machinery, but clang doesn't handle constructors in address spaces well enough.

================
Comment at: openmp/libomptarget/deviceRTLs/amdgcn/src/target_impl.h:66
+#define EXTERN_SHARED(NAME) __attribute__((shared)) NAME
+#endif
+
----------------
jdoerfert wrote:
> why the enum? Can we move the _OPENMP stuff in a generic header?
This one was fun. I'd have liked to write '7', but clang sema does some very aggressive checking on the allocator clause that only accepts an enum with the right name and all the right fields filled out.

Yes though, we should have a header in common that exports the SHARED and EXTERN_SHARED macros, which will be helpful for nvptx. Provisionally named it omp_pteam_mem_alloc.

================
Comment at: openmp/libomptarget/deviceRTLs/amdgcn/src/target_impl.hip:74
+
+// static
+DEVICE uint32_t SHARED(L1_Barrier);
----------------
jdoerfert wrote:
> Nit: I'd remove the comment.
Happy to. All of the comment, or just the // static?

================
Comment at: openmp/libomptarget/deviceRTLs/common/src/libcall.cu:323
+// Working around here until the compiler quirk is understood
+DEVICE int omp_is_initial_device_OVERLOAD(void) asm("omp_is_initial_device");
+DEVICE int omp_is_initial_device_OVERLOAD(void) {
----------------
jdoerfert wrote:
> JonChesterfield wrote:
> > Note to self -  don't use this bodge for nvptx
> D38968 is the reason. We should revert that patch as we have context selectors now.
Ah, nice. It's a diy constant folding thing. Agreed that's no longer required.

Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D93135/new/

https://reviews.llvm.org/D93135