[llvm] [NVPTX] Optimize v2x16 BUILD_VECTORs to PRMT (PR #116675)
Fraser Cormack via llvm-commits
llvm-commits at lists.llvm.org
Tue Nov 19 01:49:45 PST 2024
================
@@ -807,14 +807,11 @@ define <2 x i16> @test_select_cc_i16_i32(<2 x i16> %a, <2 x i16> %b,
define <2 x i16> @test_trunc_2xi32(<2 x i32> %a) #0 {
; COMMON-LABEL: test_trunc_2xi32(
; COMMON: {
-; COMMON-NEXT: .reg .b16 %rs<3>;
-; COMMON-NEXT: .reg .b32 %r<4>;
+; COMMON-NEXT: .reg .b32 %r<5>;
; COMMON-EMPTY:
; COMMON-NEXT: // %bb.0:
; COMMON-NEXT: ld.param.v2.u32 {%r1, %r2}, [test_trunc_2xi32_param_0];
-; COMMON-NEXT: cvt.u16.u32 %rs1, %r2;
-; COMMON-NEXT: cvt.u16.u32 %rs2, %r1;
-; COMMON-NEXT: mov.b32 %r3, {%rs2, %rs1};
+; COMMON-NEXT: prmt.b32 %r3, %r1, %r2, 0x5410U;
----------------
frasercrmck wrote:
With `ptxas v10.2.89` for SM70 there is no diff: it generates a `PRMT` instruction either way.
With `ptxas v12.6.1` for SM70:
``` diff
@@ -1,19 +1,11 @@
test_trunc_2xi32:
IMAD.MOV.U32 R1, RZ, RZ, c[0x0][0x28]
@!PT SHFL.IDX PT, RZ, RZ, RZ, RZ
- LDC.U16 R0, c[0x0][0x160]
- MOV R3, c[0x0][0x164]
- LDC.U16 R5, c[0x0][0x164]
- PRMT R0, R0, 0x5410, R5
+ IMAD.MOV.U32 R3, RZ, RZ, c[0x0][0x164]
+ MOV R0, c[0x0][0x160]
+ PRMT R0, R0, 0x5410, R3
LDC RZ, c[0x0][R3+0x160]
EXIT
.L_x_0:
BRA `(.L_x_0)
- NOP
- NOP
- NOP
- NOP
- NOP
- NOP
- NOP
.L_x_1:
```
I'm not sure whether the reduced NOP count is a real effect or just spurious.
With `ptxas v12.6.1` for SM89:
``` diff
@@ -1,23 +1,15 @@
test_trunc_2xi32:
IMAD.MOV.U32 R1, RZ, RZ, c[0x0][0x28]
- ULDC.U16 UR5, c[0x0][0x164]
- ULDC.U16 UR4, c[0x0][0x160]
- IMAD.U32 R3, RZ, RZ, UR5
- MOV R0, UR4
- ULDC UR4, c[0x0][0x164]
+ IMAD.MOV.U32 R3, RZ, RZ, c[0x0][0x164]
+ MOV R0, c[0x0][0x160]
PRMT R0, R0, 0x5410, R3
- LDC RZ, cx[UR4][0x160]
+ LDC RZ, c[0x0][R3+0x160]
EXIT
.L_x_0:
BRA `(.L_x_0)
NOP
NOP
NOP
- NOP
- NOP
- NOP
- NOP
- NOP
NOP
NOP
NOP
```
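For readers less familiar with the byte selector, here is a minimal Python sketch of why `prmt.b32 %r3, %r1, %r2, 0x5410` computes the same value as the two `cvt.u16.u32` conversions followed by the packing `mov.b32`. It models only the default (generic) mode of `prmt` as described in the PTX ISA documentation; the helper names `prmt_b32` and `trunc_pack_v2i16` are made up for illustration and are not part of LLVM or of this patch.

```python
def prmt_b32(a: int, b: int, selector: int) -> int:
    """Emulate PTX prmt.b32 in its default (generic) mode (illustrative helper)."""
    # Byte pool of 8 bytes: indices 0-3 come from a, indices 4-7 from b.
    pool = ((b & 0xFFFFFFFF) << 32) | (a & 0xFFFFFFFF)
    result = 0
    for i in range(4):
        # Each 4-bit selector nibble controls one byte of the result.
        nibble = (selector >> (4 * i)) & 0xF
        byte = (pool >> (8 * (nibble & 0x7))) & 0xFF
        if nibble & 0x8:
            # High bit of the nibble set: replicate the selected byte's sign bit.
            byte = 0xFF if byte & 0x80 else 0x00
        result |= byte << (8 * i)
    return result


def trunc_pack_v2i16(lo: int, hi: int) -> int:
    """Truncate two 32-bit values to 16 bits each and pack them as {lo, hi}."""
    return (lo & 0xFFFF) | ((hi & 0xFFFF) << 16)


# Selector 0x5410 picks bytes 0 and 1 of the first operand for the low half
# and bytes 4 and 5 (the low two bytes of the second operand) for the high half.
r1, r2 = 0x11223344, 0xAABBCCDD
assert prmt_b32(r1, r2, 0x5410) == trunc_pack_v2i16(r1, r2) == 0xCCDD3344
```

The SASS `PRMT R0, R0, 0x5410, R3` in the listings above appears to encode the same selection, with the selector placed between the two source registers.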
https://github.com/llvm/llvm-project/pull/116675