[clang] [Cuda] Handle -fcuda-short-ptr even with -nocudalib (PR #111682)

Wed Oct 9 07:22:42 PDT 2024

frasercrmck wrote:

> > > Seems reasonable, which architectures require this? I know that NVIDIA deprecated the 32-bit `nvptx` target in CUDA 12 or something.
> > 
> > 
> > I'm not an expert on CUDA but, AFAICT, even in 64-bit CUDA, certain pointers, such as those pointing to shared memory, are 32-bit, because the size of shared memory is somewhere in the kB range. This generates better code, uses fewer registers, etc. Personally, I'm not sure why the option isn't enabled by default; it seems like `nvcc` is doing this by default.
> > I was just playing with the option downstream and noticed this issue.
> 
> I figured it was something like that, since it saves a register per address. I don't know the history of why this isn't the default; it's pretty much just a data layout modifier stating that certain address spaces are 32-bit. Maybe @Artem-B or @jlebar can comment.

I just threw together a nonsensical example on Godbolt: https://godbolt.org/z/bhdEhrxd7. Notice the `mov.u32 %r7, As`, etc., where the address of the shared-memory symbol is materialized in a 32-bit register.
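
For anyone who wants to try this locally, here is a minimal sketch along the same lines. It is not the code behind the Godbolt link: the kernel name `scale_tile`, the `TILE` size, and the host-side check are hypothetical and made up for illustration; only the `As` name is borrowed from the PTX line above. Assuming a CUDA SDK is installed, something like `clang++ -x cuda --cuda-gpu-arch=sm_70 --cuda-device-only -S -fcuda-short-ptr scale_tile.cu -o -` should print the device PTX, and comparing that with and without `-fcuda-short-ptr` should show whether the address of `As` is held in 32-bit or 64-bit registers.

```cuda
// scale_tile.cu: hypothetical example, not the kernel from the Godbolt link.
#include <cstdio>
#include <cuda_runtime.h>

#define TILE 32

// Each block stages TILE scaled inputs in shared memory, then every thread
// reads its neighbour's slot, so the shared-memory addressing is not trivial.
__global__ void scale_tile(const float *in, float *out, float k) {
  __shared__ float As[TILE];
  unsigned i = threadIdx.x;
  unsigned base = blockIdx.x * TILE;
  As[i] = in[base + i] * k;
  __syncthreads();
  out[base + i] = As[(i + 1) % TILE];
}

int main() {
  const int n = 4 * TILE;
  float h_in[n], h_out[n];
  for (int i = 0; i < n; ++i)
    h_in[i] = static_cast<float>(i);

  float *d_in = nullptr, *d_out = nullptr;
  cudaMalloc(&d_in, n * sizeof(float));
  cudaMalloc(&d_out, n * sizeof(float));
  cudaMemcpy(d_in, h_in, n * sizeof(float), cudaMemcpyHostToDevice);

  scale_tile<<<n / TILE, TILE>>>(d_in, d_out, 2.0f);
  cudaMemcpy(h_out, d_out, n * sizeof(float), cudaMemcpyDeviceToHost);

  // Host-side check: out[b*TILE + i] should equal 2 * in[b*TILE + (i+1) % TILE].
  int errors = 0;
  for (int b = 0; b < n / TILE; ++b)
    for (int i = 0; i < TILE; ++i)
      if (h_out[b * TILE + i] != 2.0f * h_in[b * TILE + (i + 1) % TILE])
        ++errors;
  printf("%s\n", errors == 0 ? "ok" : "mismatch");

  cudaFree(d_in);
  cudaFree(d_out);
  return errors != 0;
}
```

The host code is only there so the file builds and runs as a normal CUDA program; the part of interest for this PR is the device-side PTX for the `As` accesses.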

https://github.com/llvm/llvm-project/pull/111682