[llvm] [NVPTX] Optimize v2x16 BUILD_VECTORs to PRMT (PR #116675)

Fraser Cormack via llvm-commits llvm-commits at lists.llvm.org
Tue Nov 19 01:49:45 PST 2024


================
@@ -807,14 +807,11 @@ define <2 x i16> @test_select_cc_i16_i32(<2 x i16> %a, <2 x i16> %b,
 define <2 x i16> @test_trunc_2xi32(<2 x i32> %a) #0 {
 ; COMMON-LABEL: test_trunc_2xi32(
 ; COMMON:       {
-; COMMON-NEXT:    .reg .b16 %rs<3>;
-; COMMON-NEXT:    .reg .b32 %r<4>;
+; COMMON-NEXT:    .reg .b32 %r<5>;
 ; COMMON-EMPTY:
 ; COMMON-NEXT:  // %bb.0:
 ; COMMON-NEXT:    ld.param.v2.u32 {%r1, %r2}, [test_trunc_2xi32_param_0];
-; COMMON-NEXT:    cvt.u16.u32 %rs1, %r2;
-; COMMON-NEXT:    cvt.u16.u32 %rs2, %r1;
-; COMMON-NEXT:    mov.b32 %r3, {%rs2, %rs1};
+; COMMON-NEXT:    prmt.b32 %r3, %r1, %r2, 0x5410U;
----------------
frasercrmck wrote:

On `ptxas v10.2.89` for SM70 there is no diff: it emits a `PRMT` instruction either way.

On `ptxas v12.6.1`, for SM70,

``` diff
@@ -1,19 +1,11 @@
 test_trunc_2xi32:
  IMAD.MOV.U32 R1, RZ, RZ, c[0x0][0x28]
  @!PT SHFL.IDX PT, RZ, RZ, RZ, RZ
- LDC.U16 R0, c[0x0][0x160]
- MOV R3, c[0x0][0x164]
- LDC.U16 R5, c[0x0][0x164]
- PRMT R0, R0, 0x5410, R5
+ IMAD.MOV.U32 R3, RZ, RZ, c[0x0][0x164]
+ MOV R0, c[0x0][0x160]
+ PRMT R0, R0, 0x5410, R3
  LDC RZ, c[0x0][R3+0x160]
  EXIT
 .L_x_0:
  BRA `(.L_x_0)
- NOP
- NOP
- NOP
- NOP
- NOP
- NOP
- NOP
 .L_x_1:
```

I'm not sure whether the reduced number of trailing NOPs is meaningful or spurious.

On `ptxas v12.6.1`, for SM89,

``` diff
@@ -1,23 +1,15 @@
 test_trunc_2xi32:
  IMAD.MOV.U32 R1, RZ, RZ, c[0x0][0x28]
- ULDC.U16 UR5, c[0x0][0x164]
- ULDC.U16 UR4, c[0x0][0x160]
- IMAD.U32 R3, RZ, RZ, UR5
- MOV R0, UR4
- ULDC UR4, c[0x0][0x164]
+ IMAD.MOV.U32 R3, RZ, RZ, c[0x0][0x164]
+ MOV R0, c[0x0][0x160]
  PRMT R0, R0, 0x5410, R3
- LDC RZ, cx[UR4][0x160]
+ LDC RZ, c[0x0][R3+0x160]
  EXIT
 .L_x_0:
  BRA `(.L_x_0)
  NOP
  NOP
  NOP
- NOP
- NOP
- NOP
- NOP
- NOP
  NOP
  NOP
  NOP
```

https://github.com/llvm/llvm-project/pull/116675

