[llvm] [NVPTX] Improve NVVMReflect Efficiency (PR #134416)
Alex MacLean via llvm-commits
llvm-commits at lists.llvm.org
Fri Apr 4 14:19:03 PDT 2025
AlexMaclean wrote:
> @AlexMaclean do you think we could reuse intrinsic autoupgrade machinery for this, instead of making reflect processing more complicated?
>
> It could be somewhat useful for other purposes. E.g. we could introduce a const-foldable (when we know the GPU we're targeting) `nvvm.arch()` which would return `__CUDA_ARCH__` value and upgrade nvvm_reflect to it. Bonus point is that it would also be useful for IR users to parametrize their code without relying on NVVMReflect.
>
> @jhuber6 would something like that help with some of your offloading cases? I recall you did run into trouble with NVVMReflect a while back.
It's an interesting idea, and I agree that intrinsics seem marginally cleaner than relying on string arguments, but I don't think there will be much benefit in the end. For correctness, we'll still need something like NVVMReflect to run to fold these calls away and do the constant prop + DCE.
I think that, with some cleanup, this new approach will be no more complicated than the current one. In some ways it even allows for simplification, because we shouldn't need to be as careful about deleting and simplifying now that we're not iterating over the instructions in a function.
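
To make the point about iteration concrete, here is a minimal toy sketch in plain C++. It is not the code in the PR and does not use any LLVM API: `ToyCall`, `ToyModule`, and `foldReflectCalls` are hypothetical names, the per-callee index only stands in for a use list, and folding unknown keys to 0 is an assumption of the sketch. It shows the general shape of folding reflect calls by visiting the callers of the reflect function directly, so unrelated calls are never touched.

```cpp
#include <cstddef>
#include <map>
#include <string>
#include <vector>

// Toy stand-in for an IR call site; not an LLVM class.
struct ToyCall {
  std::string callee;   // name of the called function
  std::string arg;      // string argument, e.g. "__CUDA_ARCH"
  bool folded = false;  // true once the call has been replaced by a constant
  int value = 0;        // the constant that replaced the call
};

// Toy module: all call sites, plus a per-callee index that plays the role
// of a use list.
struct ToyModule {
  std::vector<ToyCall> calls;
  std::map<std::string, std::vector<std::size_t>> users;

  void addCall(const std::string &callee, const std::string &arg) {
    ToyCall c;
    c.callee = callee;
    c.arg = arg;
    users[callee].push_back(calls.size());
    calls.push_back(c);
  }
};

// Fold every call to "__nvvm_reflect" by walking only that function's
// users. Keys missing from `values` fold to 0 (an assumption of this toy).
// Returns the number of calls folded.
int foldReflectCalls(ToyModule &m, const std::map<std::string, int> &values) {
  auto it = m.users.find("__nvvm_reflect");
  if (it == m.users.end())
    return 0;

  int folded = 0;
  for (std::size_t idx : it->second) {
    ToyCall &c = m.calls[idx];
    auto v = values.find(c.arg);
    c.value = (v == values.end()) ? 0 : v->second;
    c.folded = true;
    ++folded;
  }
  it->second.clear();  // no users remain once every call is folded
  return folded;
}
```

In this sketch the cost is proportional to the number of reflect calls rather than the number of instructions, and nothing is erased while a function body is being traversed. A real pass would still need to propagate the folded constants and clean up the dead branches afterwards.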
https://github.com/llvm/llvm-project/pull/134416