[llvm] [NVPTX] Optimize v2x16 BUILD_VECTORs to PRMT (PR #116675)

Tue Nov 19 01:49:46 PST 2024

================
@@ -1148,8 +1147,7 @@ define <2 x bfloat> @fma_bf16x2_expanded_no_nans_multiple_uses_of_fma(<2 x bfloa
 ; CHECK-SM70-NEXT:    setp.nan.f32 %p8, %f18, %f18;
 ; CHECK-SM70-NEXT:    or.b32 %r58, %r54, 4194304;
 ; CHECK-SM70-NEXT:    selp.b32 %r59, %r58, %r57, %p8;
-; CHECK-SM70-NEXT:    { .reg .b16 tmp; mov.b32 {tmp, %rs23}, %r59; }
-; CHECK-SM70-NEXT:    mov.b32 %r60, {%rs23, %rs20};
----------------
frasercrmck wrote:

I don't think so, because it's still used for `(trunc (srl s, 16))` or `(extractelt $vec, 1)`. Perhaps if we matched both of those to PRMTs in general we could remove the code, but I suspect we'll always need the option to fall back to these patterns.
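
For illustration, here is a rough host-side C++ sketch of why a high-half extract followed by a v2x16 build can collapse into a single PRMT. The helper names (`prmt`, `extract_high`, `build_v2x16`) are hypothetical and are not taken from the patch. The `prmt` function only emulates the default (generic) mode of PTX `prmt.b32`, and the selector values `0x5410` and `0x5432` are worked out by hand here rather than copied from the backend.

```cpp
#include <cstdint>

// Emulates PTX prmt.b32 in its default mode (hypothetical helper, not LLVM code).
// The eight source bytes are {b.b3, b.b2, b.b1, b.b0, a.b3, a.b2, a.b1, a.b0},
// numbered 7..0. Each selector nibble picks one source byte for one result byte.
uint32_t prmt(uint32_t a, uint32_t b, uint32_t selector) {
    const uint64_t pool = (static_cast<uint64_t>(b) << 32) | a;
    uint32_t result = 0;
    for (int i = 0; i < 4; ++i) {
        const uint32_t nibble = (selector >> (4 * i)) & 0xF;
        uint32_t byte = static_cast<uint32_t>(pool >> (8 * (nibble & 0x7))) & 0xFF;
        // If the nibble's top bit is set, the sign bit of the selected byte
        // is replicated across the whole result byte.
        if (nibble & 0x8)
            byte = (byte & 0x80) ? 0xFF : 0x00;
        result |= byte << (8 * i);
    }
    return result;
}

// (trunc (srl s, 16)) or (extractelt $vec, 1): take the high 16 bits.
uint16_t extract_high(uint32_t s) {
    return static_cast<uint16_t>(s >> 16);
}

// BUILD_VECTOR of two 16-bit elements: element 0 in the low half.
uint32_t build_v2x16(uint16_t elt0, uint16_t elt1) {
    return static_cast<uint32_t>(elt0) | (static_cast<uint32_t>(elt1) << 16);
}
```

Under these assumptions, `build_v2x16(lo, hi)` matches `prmt(lo, hi, 0x5410)`, and `build_v2x16(extract_high(s), x)` matches `prmt(s, x, 0x5432)`. A standalone `extract_high` with no vector build consuming it has no such pairing, which is the case the fallback patterns still cover.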

https://github.com/llvm/llvm-project/pull/116675