[llvm] [NVPTX] Optimize v2x16 BUILD_VECTORs to PRMT (PR #116675)

Tue Nov 19 14:40:05 PST 2024

================
@@ -1148,8 +1147,7 @@ define <2 x bfloat> @fma_bf16x2_expanded_no_nans_multiple_uses_of_fma(<2 x bfloa
 ; CHECK-SM70-NEXT:    setp.nan.f32 %p8, %f18, %f18;
 ; CHECK-SM70-NEXT:    or.b32 %r58, %r54, 4194304;
 ; CHECK-SM70-NEXT:    selp.b32 %r59, %r58, %r57, %p8;
-; CHECK-SM70-NEXT:    { .reg .b16 tmp; mov.b32 {tmp, %rs23}, %r59; }
-; CHECK-SM70-NEXT:    mov.b32 %r60, {%rs23, %rs20};
----------------
Artem-B wrote:

> `(trunc (srl s, 16))` or `(extractelt $vec, 1)`

Those should be partially converted to `prmt`, too. The part that moves bits in multiples of 8 down to the LSB of the i32 maps to a permute, and the `trunc` would then just be a regular truncating move. This does not have to be done in this patch, but if the change is trivial, it may fit here, too. Up to you.
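
To illustrate the mapping, here is a rough Python sketch that emulates `prmt.b32` in its default (generic) mode and checks that a byte-aligned right shift matches a permute with a zero second operand. The helper names (`prmt`, `high_half_via_shift`, `high_half_via_prmt`) and the selector `0x4432` are my own choices for illustration, not anything taken from the patch, and this is an emulation rather than actual lowering code.

```python
def prmt(a: int, b: int, selector: int) -> int:
    """Emulate PTX prmt.b32 in default mode (illustrative helper)."""
    # Bytes 0-3 of the pool come from a, bytes 4-7 from b.
    pool = ((b & 0xFFFFFFFF) << 32) | (a & 0xFFFFFFFF)
    result = 0
    for i in range(4):
        ctl = (selector >> (4 * i)) & 0xF      # control nibble for result byte i
        byte = (pool >> (8 * (ctl & 0x7))) & 0xFF
        if ctl & 0x8:                          # msb set: replicate the sign bit
            byte = 0xFF if byte & 0x80 else 0x00
        result |= byte << (8 * i)
    return result


def high_half_via_shift(s: int) -> int:
    # (trunc (srl s, 16)): shift, then keep the low 16 bits.
    return ((s & 0xFFFFFFFF) >> 16) & 0xFFFF


def high_half_via_prmt(s: int) -> int:
    # Selector 0x4432: result bytes 0,1 take bytes 2,3 of s; bytes 2,3 take
    # byte 4, which is the low byte of the zero second operand.
    # The final mask plays the role of the truncating move.
    return prmt(s, 0, 0x4432) & 0xFFFF
```

The same idea would cover any shift amount that is a multiple of 8; for example, a shift by 8 corresponds to selector `0x4321` under this emulation.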

https://github.com/llvm/llvm-project/pull/116675