[llvm] [NVPTX] Optimize v2x16 BUILD_VECTORs to PRMT (PR #116675)
Artem Belevich via llvm-commits
llvm-commits at lists.llvm.org
Tue Nov 19 14:46:13 PST 2024
================
@@ -807,14 +807,11 @@ define <2 x i16> @test_select_cc_i16_i32(<2 x i16> %a, <2 x i16> %b,
define <2 x i16> @test_trunc_2xi32(<2 x i32> %a) #0 {
; COMMON-LABEL: test_trunc_2xi32(
; COMMON: {
-; COMMON-NEXT: .reg .b16 %rs<3>;
-; COMMON-NEXT: .reg .b32 %r<4>;
+; COMMON-NEXT: .reg .b32 %r<5>;
; COMMON-EMPTY:
; COMMON-NEXT: // %bb.0:
; COMMON-NEXT: ld.param.v2.u32 {%r1, %r2}, [test_trunc_2xi32_param_0];
-; COMMON-NEXT: cvt.u16.u32 %rs1, %r2;
-; COMMON-NEXT: cvt.u16.u32 %rs2, %r1;
-; COMMON-NEXT: mov.b32 %r3, {%rs2, %rs1};
+; COMMON-NEXT: prmt.b32 %r3, %r1, %r2, 0x5410U;
----------------
Artem-B wrote:
> I'm not sure if the fewer NOPs are spurious or not.
AFAICT, it's just padding to fill the post-branch bubble and possibly to align the `.L_x_1` address. NVIDIA does not disclose its microarchitecture details, so this is just a guess. In any case, those NOPs are never executed, so we do not care.
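
For reference, the `0x5410` selector in the quoted hunk can be sanity-checked with a small host-side model of `prmt.b32`. This is only a sketch in C++, assuming the default (generic) mode as I understand it from the PTX ISA: the low three bits of each selector nibble pick one byte out of the 8-byte pool `{b, a}`, and the top bit requests sign replication. `prmt_b32` and `pack_trunc` are hypothetical helper names used for illustration, not LLVM or CUDA APIs.

```cpp
#include <cstdint>

// Hypothetical host-side model of PTX `prmt.b32 d, a, b, sel` in default mode.
// Bytes 0..3 of the pool come from `a`, bytes 4..7 from `b`.
uint32_t prmt_b32(uint32_t a, uint32_t b, uint32_t sel) {
    const uint64_t pool = (static_cast<uint64_t>(b) << 32) | a;
    uint32_t result = 0;
    for (int i = 0; i < 4; ++i) {
        // Nibble i of the selector controls byte i of the result.
        const uint32_t nibble = (sel >> (4 * i)) & 0xF;
        uint32_t byte = static_cast<uint32_t>(pool >> (8 * (nibble & 0x7))) & 0xFF;
        // Top bit of the nibble: replicate the selected byte's sign bit.
        if (nibble & 0x8) {
            byte = (byte & 0x80) ? 0xFF : 0x00;
        }
        result |= byte << (8 * i);
    }
    return result;
}

// What the old sequence computed: truncate both i32 lanes to i16 and pack
// them into one 32-bit register, lane 0 in the low half.
uint32_t pack_trunc(uint32_t lane0, uint32_t lane1) {
    return (lane0 & 0xFFFF) | ((lane1 & 0xFFFF) << 16);
}
```

With selector `0x5410`, result bytes 0 and 1 come from the low half of the first operand and bytes 2 and 3 from the low half of the second, so `prmt_b32(r1, r2, 0x5410)` should agree with `pack_trunc(r1, r2)` for any inputs. That is the same value the two `cvt.u16.u32` instructions plus `mov.b32` used to produce.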
https://github.com/llvm/llvm-project/pull/116675