[PATCH] D149451: [NVPTX] Add NVPTXCtorDtorLoweringPass to handle global ctors / dtors

Fri Apr 28 11:19:05 PDT 2023

jhuber6 added a comment.

In D149451#4306160 <https://reviews.llvm.org/D149451#4306160>, @tra wrote:

> The standard CUDA runtime has no mechanisms for dealing with dynamic initializers, so enabling this pass in NVPTX back-end by default is not the right thing to do.
>
> We may need to keep this pass behind some sort of flag, or add it for stand-alone compilation mode only.

Yeah, that's what I was thinking. We could make `clang` emit that flag by default if there's no host ToolChain. We could also attempt to make this more of a runtime-specific thing, e.g. we could emit a kernel and then have some code in CUDA CodeGen that tries to register it. Probably not worth it though. This is primarily just my desire to have my weird standalone implementation functional.

>> An alternative is to also do what AMDGPU does and emit a kernel that can be called. I eschewed that mainly because we would get naming conflicts,
>
> CUDA front-end uses `--cuid` for somewhat similar purpose of disambiguating TU-specific objects. We could pass it as an option to the pass to give each TU a unique ID and avoid naming conflicts.

It would be simpler in this case, since we'd be looking things up by a prefix if we wished to fish them out. I'm sure we could use some pseudo-hash to make this work fine.
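
A rough sketch of that prefix-plus-hash idea, in plain C++ rather than LLVM APIs. The prefix `__nvptx_init_kernel_`, the choice of FNV-1a, and the helper names `makeInitKernelName` / `findInitKernels` are all assumptions made up for illustration; none of them come from the patch.

```cpp
#include <cassert>
#include <cstdint>
#include <iomanip>
#include <sstream>
#include <string>
#include <string_view>
#include <vector>

// Hypothetical prefix shared by every per-TU init kernel (not from the patch).
constexpr std::string_view InitKernelPrefix = "__nvptx_init_kernel_";

// 64-bit FNV-1a, standing in for "some pseudo-hash" of a TU identifier.
std::uint64_t fnv1a(std::string_view S) {
  std::uint64_t H = 14695981039346656037ULL; // FNV offset basis
  for (unsigned char C : S) {
    H ^= C;
    H *= 1099511628211ULL; // FNV prime
  }
  return H;
}

// Build a kernel name as the shared prefix plus 16 hex digits of the hash.
// ModuleId is whatever uniquely identifies the TU (assumed, e.g. a CUID).
std::string makeInitKernelName(std::string_view ModuleId) {
  assert(!ModuleId.empty() && "module identifier must not be empty");
  std::ostringstream OS;
  OS << InitKernelPrefix << std::hex << std::setw(16) << std::setfill('0')
     << fnv1a(ModuleId);
  return OS.str();
}

// What a runtime could do: fish out every symbol that starts with the prefix.
std::vector<std::string> findInitKernels(const std::vector<std::string> &Syms) {
  std::vector<std::string> Out;
  for (const std::string &S : Syms)
    if (std::string_view(S).substr(0, InitKernelPrefix.size()) ==
        InitKernelPrefix)
      Out.push_back(S);
  return Out;
}
```

With something like this, the runtime never needs to know the per-TU suffix: it scans the image's symbol table for the prefix and launches whatever it finds. A hash collision between two TUs is still possible in principle, so this only makes naming conflicts unlikely rather than impossible.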

I'm still very unhappy that you can't emit sections in `ptxas`. Maybe we need to make a `ptxas` wrapper that compiles it in debug mode and regular mode, then objcopies the section from the debug mode one into the regular one. I'm sure nothing could go wrong there :).

Repository:
  rG LLVM Github Monorepo

https://reviews.llvm.org/D149451