[llvm] [NVPTX] Optimize v2x16 BUILD_VECTORs to PRMT (PR #116675)
Artem Belevich via llvm-commits
llvm-commits at lists.llvm.org
Tue Nov 19 14:40:05 PST 2024
================
@@ -1148,8 +1147,7 @@ define <2 x bfloat> @fma_bf16x2_expanded_no_nans_multiple_uses_of_fma(<2 x bfloa
; CHECK-SM70-NEXT: setp.nan.f32 %p8, %f18, %f18;
; CHECK-SM70-NEXT: or.b32 %r58, %r54, 4194304;
; CHECK-SM70-NEXT: selp.b32 %r59, %r58, %r57, %p8;
-; CHECK-SM70-NEXT: { .reg .b16 tmp; mov.b32 {tmp, %rs23}, %r59; }
-; CHECK-SM70-NEXT: mov.b32 %r60, {%rs23, %rs20};
----------------
Artem-B wrote:
> `(trunc (srl s, 16))` or `(extractelt $vec, 1)`
Those should be partially converted to `prmt`, too. The part that moves bits, in multiples of 8, down to the LSB of the i32 maps to a permute, and the `trunc` then becomes just a regular truncating move. This does not have to be done in this patch, but if the change is trivial, it may fit here, too. Up to you.
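
To illustrate the byte-movement argument, here is a minimal host-side sketch that emulates `prmt.b32` in its default mode and shows the `srl`-by-16 part expressed as a permute, followed by an ordinary truncation. The helper names (`prmt`, `high_half`, `pack_low_halves`) and the selector constants are assumptions chosen for illustration; they are not taken from the patch and do not describe what the NVPTX backend actually emits.

```cpp
#include <cstdint>

// Emulates PTX prmt.b32 (default mode): the 8 source bytes are {b, a},
// with bytes 0-3 coming from `a` and bytes 4-7 from `b`. Each 4-bit
// selector field picks one source byte for the matching result byte;
// if the field's top bit is set, the picked byte's sign bit is replicated.
uint32_t prmt(uint32_t a, uint32_t b, uint32_t sel) {
  const uint64_t pool = (static_cast<uint64_t>(b) << 32) | a;
  uint32_t d = 0;
  for (int i = 0; i < 4; ++i) {
    const uint32_t ctl = (sel >> (4 * i)) & 0xF;
    uint32_t byte = static_cast<uint32_t>(pool >> (8 * (ctl & 7))) & 0xFF;
    if (ctl & 8)
      byte = (byte & 0x80) ? 0xFF : 0x00;
    d |= byte << (8 * i);
  }
  return d;
}

// (trunc (srl s, 16)): the shift is a permute that moves bytes 2 and 3
// to the LSB and fills the upper bytes from a zero operand (hypothetical
// selector 0x4432); the trunc is then a plain narrowing conversion.
uint16_t high_half(uint32_t s) {
  const uint32_t shifted = prmt(s, 0, 0x4432); // equals s >> 16
  return static_cast<uint16_t>(shifted);
}

// Packs the low 16 bits of `lo` and `hi` into one 32-bit value, the
// v2x16 BUILD_VECTOR shape from the PR title (hypothetical selector 0x5410).
uint32_t pack_low_halves(uint32_t lo, uint32_t hi) {
  return prmt(lo, hi, 0x5410);
}
```

With these definitions, `high_half(0xAABBCCDD)` yields `0xAABB`, and `pack_low_halves(0x1111CCDD, 0x2222EEFF)` yields `0xEEFFCCDD`.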
https://github.com/llvm/llvm-project/pull/116675