[libc-commits] [PATCH] D149527: [libc] Support global constructors and destructors on NVPTX
Joseph Huber via Phabricator via libc-commits
libc-commits at lists.llvm.org
Mon May 1 11:23:11 PDT 2023
jhuber6 added inline comments.
================
Comment at: libc/startup/gpu/nvptx/start.cpp:23
+extern "C" {
+// Nvidia's 'nvlink' linker does not provide these sections. We instead need
+// to manually create them and update the globals in the loader implementation.
----------------
tra wrote:
> The comment is somewhat puzzling.
>
> The sections themselves would be created by whatever generates the object files, before the linker gets involved.
> IIRC from our exchange on discourse, the actual problem was that nvlink discards the sections it's not familiar with and that's why we can't just put the initializers into a known init/fini sections and have to rely on putting initializers among regular data and use explicit symbols to find them.
Sorry, I meant to say "symbols" here. Normally, when the linker finds a `.init_array` or `.fini_array` section it provides these symbol names so you can traverse the section. This is the behavior in `ld.lld`, which is what AMD uses. Nvidia neither provides a way to place these in the `.init_array` section, nor does `nvlink` create the symbols if you force the sections to exist. The latter could potentially be solved by reinventing `nvlink` in `lld`; the former is a more difficult problem. Maybe there's a way to hack around this in the PTX Compiler API.
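For context, this is roughly the traversal startup code can do when the linker *does* define these symbols (`__init_array_start` and friends are the standard names emitted by `ld.lld`/GNU ld; the loop bodies here are a sketch, not the exact code in this patch — on NVPTX the equivalent globals have to be created manually and patched by the loader instead):

```
#include <stdint.h>

// Defined by linkers like ld.lld when an .init_array / .fini_array
// section is present. nvlink provides neither, which is why the NVPTX
// startup has to create equivalent globals itself.
extern "C" uintptr_t __init_array_start[];
extern "C" uintptr_t __init_array_end[];
extern "C" uintptr_t __fini_array_start[];
extern "C" uintptr_t __fini_array_end[];

using InitCallback = void(void);

static void call_init_array_callbacks() {
  // Constructors run in increasing address order.
  for (uintptr_t *fn = __init_array_start; fn != __init_array_end; ++fn)
    reinterpret_cast<InitCallback *>(*fn)();
}

static void call_fini_array_callbacks() {
  // Destructors run in the reverse order of the constructors.
  for (uintptr_t count = __fini_array_end - __fini_array_start; count > 0;
       --count)
    reinterpret_cast<InitCallback *>(__fini_array_start[count - 1])();
}
```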
================
Comment at: libc/test/IntegrationTest/test.cpp:75
+#if defined(__NVPTX__)
+// FIXME: For unknown reason the Nvidia toolchain defines this to be a single
+// byte and will error on a mismatched size otherwise.
----------------
tra wrote:
> What exactly is the 'toolchain' in this context?
I mean a compilation targeting `nvptx64`, e.g. `clang++ test.cpp --target=nvptx64-nvidia-cuda -march=sm_70 -c`.
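Concretely, the workaround the FIXME describes looks something like this (a sketch of the kind of conditional definition the `#if defined(__NVPTX__)` guard suggests, not necessarily the exact patch text):

```
#include <stdint.h>

#if defined(__NVPTX__)
// The Nvidia toolchain treats __dso_handle as a one-byte object and
// errors out on a size mismatch, so define it to match.
extern "C" uint8_t __dso_handle = 0;
#else
// Integration tests are linked with -nostdlib. The BFD linker expects
// __dso_handle when -nostdlib is used.
extern "C" void *__dso_handle = nullptr;
#endif
```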
================
Comment at: libc/test/IntegrationTest/test.cpp:80
// Integration tests are linked with -nostdlib. BFD linker expects
// __dso_handle when -nostdlib is used.
void *__dso_handle = nullptr;
----------------
tra wrote:
> What exactly needs this symbol?
>
> I'm surprised we need to care about DSOs on NVPTX as we do not have any there.
>
> Googling around (https://stackoverflow.com/questions/34308720/where-is-dso-handle-defined) suggests that we may avoid the issue by compiling with `-fno-use-cxa-atexit`.
>
> @MaskRay -- any suggestions on what's the right way to deal with this?
That might work for these hermetic tests, since we should provide the base `atexit`. @sivachandra, do you think we could use this? It seems to be supported by both Clang and GCC.
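For reference, here is a sketch of the difference the flag makes (the `__cxa_atexit` signature is the standard Itanium C++ ABI one; the function bodies below only illustrate what the compiler emits for a static destructor, they are not code from this patch). By default the destructor is registered through `__cxa_atexit` with `&__dso_handle`, which is why the symbol must exist; with `-fno-use-cxa-atexit` the compiler registers a plain `atexit` callback and `__dso_handle` is never referenced:

```
#include <cstdlib>

// Itanium C++ ABI hook used by default to register static destructors.
extern "C" int __cxa_atexit(void (*func)(void *), void *arg, void *dso);
extern "C" void *__dso_handle;

struct Global {
  ~Global() {}
};
static Global g;

// Default: roughly what the compiler emits for 'g', associating the
// destructor with this DSO via &__dso_handle.
static void register_with_cxa_atexit() {
  __cxa_atexit([](void *obj) { static_cast<Global *>(obj)->~Global(); }, &g,
               &__dso_handle);
}

// With -fno-use-cxa-atexit: a plain atexit callback; __dso_handle is
// never referenced, so the tests would not need to define it.
static void register_with_atexit() {
  atexit(+[] { g.~Global(); });
}
```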
================
Comment at: libc/utils/gpu/loader/nvptx/CMakeLists.txt:12-13
CUDA::cuda_driver
+ LLVMObject
+ LLVMSupport
)
----------------
tra wrote:
> Can you check how long the clean build of the tool with `-j 6` would take now? If it's in the ballpark of a minute or so, we can probably live with that. Otherwise we should build the tool along with clang/LLVM, similar to how we deal with `libc-hdrgen`.
Does this assume we need to rebuild the libraries? I figured these would be copied somewhere in the build environment. On my machine rebuilding the loader takes about five seconds.
Repository:
rG LLVM Github Monorepo
CHANGES SINCE LAST ACTION
https://reviews.llvm.org/D149527/new/
https://reviews.llvm.org/D149527