[llvm] [NVPTX] support packed f32 instructions for sm_100+ (PR #126337)
Princeton Ferro via llvm-commits
llvm-commits at lists.llvm.org
Tue Jul 22 11:45:05 PDT 2025
Prince781 wrote:
@npanchen if the original source was CUDA C++, then you can use the [same trick CUTLASS uses](https://github.com/NVIDIA/cutlass/blob/main/include/cute/arch/mma_sm90_gmma.hpp#L86-L95), where you pass the operand through an `asm volatile` statement to get the desired anti-dependency with your `wgmma.mma_async` inline asm call.
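For illustration, here is a minimal sketch of that kind of fence, assuming `float` accumulators. The names `fence_operand`, `fence_accumulators`, and `FENCE_HOST_DEVICE` are hypothetical and are not CUTLASS's API; see the linked lines for the real helper. The empty `asm volatile` declares the register as both read and written (`"+f"`), so the compiler has to treat each accumulator as redefined at that point and should not move its uses or definitions across the fence.

```cpp
// Hypothetical macro: expands to CUDA qualifiers under nvcc/clang-cuda,
// and to plain `inline` so the sketch also compiles as ordinary C++.
#if defined(__CUDACC__)
#define FENCE_HOST_DEVICE __host__ __device__ __forceinline__
#else
#define FENCE_HOST_DEVICE inline
#endif

// Hypothetical helper: pass one accumulator register through an empty
// asm statement. The asm body emits no instructions; only the "+f"
// (read-write float register) constraint and the "memory" clobber matter.
FENCE_HOST_DEVICE void fence_operand(float& reg) {
#if defined(__CUDA_ARCH__)
  asm volatile("" : "+f"(reg) :: "memory");
#else
  (void)reg;  // Host compilation: nothing to fence, value is untouched.
#endif
}

// Hypothetical helper: fence every element of an accumulator fragment.
template <int N>
FENCE_HOST_DEVICE void fence_accumulators(float (&acc)[N]) {
  for (int i = 0; i < N; ++i) {
    fence_operand(acc[i]);
  }
}

// Intended placement in device code (the wgmma asm itself is elided):
//
//   fence_accumulators(acc);   // before issuing wgmma.mma_async
//   /* ... your wgmma.mma_async inline asm using acc ... */
//   fence_accumulators(acc);   // after the wait, before reading acc
```

The fence does not change any value; it only constrains how the compiler schedules code around the asynchronous instruction, which is why it has to appear on both sides of the `wgmma.mma_async` call.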
https://github.com/llvm/llvm-project/pull/126337