[llvm] [NVPTX] Optimize v2x16 BUILD_VECTORs to PRMT (PR #116675)
Artem Belevich via llvm-commits
llvm-commits at lists.llvm.org
Tue Nov 19 15:02:09 PST 2024
================
@@ -197,8 +196,7 @@ define <2 x bfloat> @test_faddx2(<2 x bfloat> %a, <2 x bfloat> %b) #0 {
; SM70-NEXT: setp.nan.f32 %p2, %f6, %f6;
; SM70-NEXT: or.b32 %r21, %r17, 4194304;
; SM70-NEXT: selp.b32 %r22, %r21, %r20, %p2;
-; SM70-NEXT: { .reg .b16 tmp; mov.b32 {tmp, %rs11}, %r22; }
-; SM70-NEXT: mov.b32 %r23, {%rs11, %rs7};
+; SM70-NEXT: prmt.b32 %r23, %r22, %r12, 0x7632U;
----------------
Artem-B wrote:
> @!PT SHFL.IDX PT, RZ, RZ, RZ, RZ
This appears to be a fancy `NOP`: `PT` is the always-true predicate, so the `@!PT` guard is always false and the instruction never executes.
> LDC RZ, c[0x0][R7+0x160]
The `LDC RZ` instruction appears to be part of some sort of calling-convention magic that is not directly related to the calculations we do in the function -- it loads something from the constants/parameters area and apparently ignores the result. We can ignore that.
The short version -- the code didn't change.
https://github.com/llvm/llvm-project/pull/116675