[llvm] [NVPTX] Improve NVVMReflect Efficiency (PR #134416)

Alex MacLean via llvm-commits llvm-commits at lists.llvm.org
Fri Apr 4 14:19:03 PDT 2025


AlexMaclean wrote:

> @AlexMaclean do you think we could reuse intrinsic autoupgrade machinery for this, instead of making reflect processing more complicated?
> 
> It could be somewhat useful for other purposes. E.g. we could introduce a const-foldable (when we know the GPU we're targeting) `nvvm.arch()` which would return the `__CUDA_ARCH__` value and upgrade nvvm_reflect to it. A bonus is that it would also be useful for IR users to parametrize their code without relying on NVVMReflect.
> 
> @jhuber6 would something like that help with some of your offloading cases? I recall you did run into trouble with NVVMReflect a while back.

It's an interesting idea, and I agree that intrinsics seem marginally cleaner than relying on string arguments, but I don't think there will be much benefit in the end. For correctness, we'll still need something like NVVMReflect to run to fold these calls away and do the constant prop + DCE. 

I think with some cleanup this new approach will not be more complicated. In some ways I think this allows for simplification, because we shouldn't need to be as careful about deleting and simplifying now that we're not iterating over instructions in a function.
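
To make that last point concrete, here is a toy sketch of the idea. It is not the patch and does not use the LLVM API: `ToyCall`, `ToyModule`, and `foldReflectCalls` are hypothetical names, and the sketch assumes the new approach starts from the users of the reflect declaration instead of scanning every instruction. Defaulting unknown names to 0 is a choice made for the toy only.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <string>
#include <vector>

// Toy stand-in for a call instruction; NOT an LLVM type.
struct ToyCall {
  std::string callee;  // name of the called function
  std::string arg;     // string argument, e.g. "__CUDA_ARCH"
  bool folded = false; // true once replaced by a constant
  int value = 0;       // the constant it was folded to
};

// Toy stand-in for a module that tracks a use list per callee.
struct ToyModule {
  std::vector<ToyCall> calls;
  std::map<std::string, std::vector<std::size_t>> users;

  void addCall(const std::string &callee, const std::string &arg) {
    ToyCall c;
    c.callee = callee;
    c.arg = arg;
    users[callee].push_back(calls.size());
    calls.push_back(c);
  }
};

// Fold every reflect call by walking only the reflect function's use list.
// No other call is visited, so nothing is invalidated mid-iteration.
// Returns the number of calls folded.
int foldReflectCalls(ToyModule &m, const std::map<std::string, int> &known) {
  auto it = m.users.find("__nvvm_reflect");
  if (it == m.users.end())
    return 0;
  int folded = 0;
  for (std::size_t idx : it->second) {
    ToyCall &c = m.calls[idx];
    auto v = known.find(c.arg);
    c.value = (v == known.end()) ? 0 : v->second; // toy default: unknown -> 0
    c.folded = true;
    ++folded;
  }
  return folded;
}
```

In the toy, the work done is proportional to the number of reflect calls, not to the number of instructions in the module, and the loop never walks a container it is also editing. The real pass would still need constant propagation and DCE afterwards, which the sketch leaves out.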

https://github.com/llvm/llvm-project/pull/134416

