[llvm] [NVPTX] support packed f32 instructions for sm_100+ (PR #126337)

Nikolay Panchenko via llvm-commits llvm-commits at lists.llvm.org
Fri Jul 18 11:12:58 PDT 2025


npanchen wrote:

Smallest reproducer:
```
define ptx_kernel void @test(float %0) {
  %2 = insertelement <2 x float> zeroinitializer, float %0, i64 0
  %3 = fptrunc <2 x float> %2 to <2 x bfloat>
  store <2 x bfloat> %3, ptr addrspace(1) null, align 2
  ret void
}
```
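For reference, the IR above should store exactly 4 bytes. Element 0 is `%0` converted by `fptrunc` to bfloat, which uses round-to-nearest-even in LLVM's default floating-point environment. Element 1 is bfloat 0.0. Below is a minimal C sketch of that expected result. It assumes a little-endian layout, which NVPTX data layouts use. The helpers `f32_to_bf16_rne` and `expected_store` are hypothetical illustrations, not LLVM APIs.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: f32 -> bf16 bit pattern with round-to-nearest-even,
   the rounding LLVM assumes for fptrunc in the default FP environment. */
static uint16_t f32_to_bf16_rne(float f) {
    uint32_t u;
    memcpy(&u, &f, sizeof u);                /* reinterpret bits without UB */
    if ((u & 0x7FFFFFFFu) > 0x7F800000u)     /* NaN: truncate, force quiet bit */
        return (uint16_t)((u >> 16) | 0x0040u);
    u += 0x7FFFu + ((u >> 16) & 1u);         /* round half to even */
    return (uint16_t)(u >> 16);
}

/* Hypothetical model of what the reproducer should store:
   <2 x bfloat> { fptrunc(x), fptrunc(0.0) }, little-endian, exactly 4 bytes. */
static void expected_store(float x, uint8_t out[4]) {
    assert(out != NULL);
    const uint16_t lanes[2] = { f32_to_bf16_rne(x), f32_to_bf16_rne(0.0f) };
    for (int i = 0; i < 2; ++i) {
        out[2 * i]     = (uint8_t)(lanes[i] & 0xFFu);  /* low byte first */
        out[2 * i + 1] = (uint8_t)(lanes[i] >> 8);
    }
}
```

For example, with `x = 1.0f` the expected bytes are `80 3F 00 00`. Nothing past offset 3 should be written.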

Without the change, instruction selection (ISel) generates:
```
bb.0 (%ir-block.1):
  %0:b32 = LD_i32 0, 0, 101, 3, 32, &test_param_0, 0 :: (dereferenceable invariant load (s32), addrspace 101)
  %1:b16 = CVT_bf16_f32 killed %0:b32, 5
  %3:b16 = IMPLICIT_DEF
  %2:b32 = V2I16toI32 killed %1:b16, killed %3:b16
  %4:b64 = IMOV64i 0
  ST_i32 killed %2:b32, 0, 0, 1, 16, killed %4:b64, 0 :: (store (s16) into `ptr addrspace(1) null`, addrspace 1)
  %5:b64 = IMOV64i 2
  ST_i32 0, 0, 0, 1, 16, killed %5:b64, 0 :: (store (s16) into `ptr addrspace(1) null` + 2, addrspace 1)
  Return
```

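With the change, no `CVT_bf16_f32` conversion is emitted. The 4-byte `<2 x bfloat>` store is split into four 16-bit stores at offsets 0, 2, 6 and 4, so 8 bytes are written: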
```
bb.0 (%ir-block.1):
  %0:b32 = LD_i32 0, 0, 101, 3, 32, &test_param_0, 0 :: (dereferenceable invariant load (s32), addrspace 101)
  %1:b32 = FMOV32i float 0.000000e+00
  %2:b64 = V2I32toI64 killed %0:b32, killed %1:b32
  %3:b64 = IMOV64i 0
  ST_i64 %2:b64, 0, 0, 1, 16, killed %3:b64, 0 :: (store (s16) into `ptr addrspace(1) null`, addrspace 1)
  %4:b64 = SRLi64ri %2:b64, 16
  %5:b64 = IMOV64i 2
  ST_i64 killed %4:b64, 0, 0, 1, 16, killed %5:b64, 0 :: (store (s16) into `ptr addrspace(1) null` + 2, addrspace 1)
  %6:b64 = IMOV64i 6
  ST_i64 0, 0, 0, 1, 16, killed %6:b64, 0 :: (store (s16) into `ptr addrspace(1) null` + 6, addrspace 1)
  %7:b64 = IMOV64i 4
  ST_i64 0, 0, 0, 1, 16, killed %7:b64, 0 :: (store (s16) into `ptr addrspace(1) null` + 4, addrspace 1)
  Return
```
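The difference in memory footprint can be checked mechanically from the `(store (s16) into ... + offset)` memory operands. Each store writes 2 bytes at the given offset, while a `<2 x bfloat>` value occupies 4 bytes. The C sketch below uses offsets copied by hand from the two listings. The `footprint` helper and the array names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* One store from a listing: byte offset from the base pointer and
   access size in bytes (s16 == 2 bytes). */
struct store16 { unsigned offset; unsigned size; };

/* Hypothetical helper: one past the highest byte any store writes. */
static unsigned footprint(const struct store16 *s, size_t n) {
    assert(s != NULL || n == 0);
    unsigned end = 0;
    for (size_t i = 0; i < n; ++i)
        if (s[i].offset + s[i].size > end)
            end = s[i].offset + s[i].size;
    return end;
}

/* Memory operands copied from the two MIR listings, in emission order. */
static const struct store16 before_change[] = { {0, 2}, {2, 2} };
static const struct store16 after_change[]  = { {0, 2}, {2, 2}, {6, 2}, {4, 2} };

/* Size of <2 x bfloat> in bytes. */
enum { V2BF16_BYTES = 4 };
```

Without the change, the stores cover bytes 0 to 3, which matches the type. With the change, they cover bytes 0 to 7, which is 4 bytes past the end of the `<2 x bfloat>` value.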

cc @Artem-B 

https://github.com/llvm/llvm-project/pull/126337
