[llvm] [NVPTX] Optimize v2x16 BUILD_VECTORs to PRMT (PR #116675)

Fraser Cormack via llvm-commits llvm-commits at lists.llvm.org
Tue Nov 19 01:49:46 PST 2024


================
@@ -197,8 +196,7 @@ define <2 x bfloat> @test_faddx2(<2 x bfloat> %a, <2 x bfloat> %b) #0 {
 ; SM70-NEXT:    setp.nan.f32 %p2, %f6, %f6;
 ; SM70-NEXT:    or.b32 %r21, %r17, 4194304;
 ; SM70-NEXT:    selp.b32 %r22, %r21, %r20, %p2;
-; SM70-NEXT:    { .reg .b16 tmp; mov.b32 {tmp, %rs11}, %r22; }
-; SM70-NEXT:    mov.b32 %r23, {%rs11, %rs7};
+; SM70-NEXT:    prmt.b32 %r23, %r22, %r12, 0x7632U;
----------------
frasercrmck wrote:

With `ptxas v12.6.1` there is no diff in the generated SASS for either SM89 or SM70.
With `ptxas v10.2.89` for SM70, the SASS does change:

``` diff
@@ -1,6 +1,6 @@
 test_faddx2:
  IMAD.MOV.U32 R1, RZ, RZ, c[0x0][0x28]
- NOP
+ @!PT SHFL.IDX PT, RZ, RZ, RZ, RZ
  LDC.U16 R0, c[0x0][0x166]
  IMAD.MOV.U32 R4, RZ, RZ, c[0x0][0x164]
  IMAD.MOV.U32 R7, RZ, RZ, c[0x0][0x160]
@@ -22,8 +22,14 @@
  IADD3 R3, R3, 0x7fff, R0
  @P0 LOP3.LUT R3, R0, 0x400000, RZ, 0xfc, !PT
  PRMT R2, R2, 0x7632, R3
- LDC RZ, c[0x0][R7]
+ LDC RZ, c[0x0][R7+0x160]
  EXIT
-.L_1:
- BRA `(.L_1)
-.L_20:
+.L_x_0:
+ BRA `(.L_x_0)
+ NOP
+ NOP
+ NOP
+ NOP
+ NOP
+ NOP
+.L_x_1:
```

I can't claim to know what's going on here, or why `prmt` is causing it.

https://github.com/llvm/llvm-project/pull/116675

