[llvm] [LLVM][NVPTX]Add BF16 vector instruction and fix lowering rules (PR #69415)

Han Shen via llvm-commits llvm-commits at lists.llvm.org
Fri Oct 20 00:31:17 PDT 2023


================
@@ -1623,8 +1623,10 @@ SDValue SelectionDAGLegalize::ExpandFCOPYSIGN(SDNode *Node) const {
                                 SignMask);
 
   // If FABS is legal transform FCOPYSIGN(x, y) => sign(x) ? -FABS(x) : FABS(X)
+  // We don't do it in bf16 since the other path has less number of instructions
----------------
shenh10 wrote:

- Without FABS/FNEG: 13 instructions.

```bash
.visible .func  (.param .align 4 .b8 func_retval0[4]) test_copysign(
        .param .align 4 .b8 test_copysign_param_0[4],
        .param .align 4 .b8 test_copysign_param_1[4]
)                                       // @test_copysign
{
        .reg .b16       %rs<17>;
        .reg .b32       %r<4>;

// %bb.0:
        ld.param.b32    %r1, [test_copysign_param_0];
        ld.param.b32    %r2, [test_copysign_param_1];
        mov.b32         {%rs1, %rs2}, %r2;
        and.b16         %rs4, %rs2, -32768;
        mov.b32         {%rs5, %rs6}, %r1;
        and.b16         %rs8, %rs6, 32767;
        or.b16          %rs9, %rs8, %rs4;
        and.b16         %rs12, %rs1, -32768;
        and.b16         %rs14, %rs5, 32767;
        or.b16          %rs15, %rs14, %rs12;
        mov.b32         %r3, {%rs15, %rs9};
        st.param.b32    [func_retval0+0], %r3;
        ret;
                                        // -- End function
}
```
- With FABS/FNEG: 19 instructions.

```bash
        // .globl       test_copysign           // -- Begin function test_copysign
.visible .func  (.param .align 4 .b8 func_retval0[4]) test_copysign(
        .param .align 4 .b8 test_copysign_param_0[4],
        .param .align 4 .b8 test_copysign_param_1[4]
)                                       // @test_copysign
{
        .reg .pred      %p<3>;
        .reg .b16       %rs<17>;
        .reg .b32       %r<4>;

// %bb.0:
        ld.param.b32    %r1, [test_copysign_param_1];
        ld.param.b32    %r2, [test_copysign_param_0];
        mov.b32         {%rs1, %rs2}, %r2;
        abs.bf16        %rs3, %rs2;
        neg.bf16        %rs4, %rs3;
        mov.b32         {%rs5, %rs6}, %r1;
        shr.u16         %rs8, %rs6, 15;
        and.b16         %rs9, %rs8, 1;
        setp.eq.b16     %p1, %rs9, 1;
        selp.b16        %rs10, %rs4, %rs3, %p1;
        abs.bf16        %rs11, %rs1;
        neg.bf16        %rs12, %rs11;
        shr.u16         %rs14, %rs5, 15;
        and.b16         %rs15, %rs14, 1;
        setp.eq.b16     %p2, %rs15, 1;
        selp.b16        %rs16, %rs12, %rs11, %p2;
        mov.b32         %r3, {%rs16, %rs10};
        st.param.b32    [func_retval0+0], %r3;
        ret;
                                        // -- End function
}

```

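The latter is considerably more cumbersome: for each bf16 element it emits `abs`, `neg`, `shr`, `and`, `setp`, and `selp`, and it needs predicate registers for the selects. The former needs only three simple bit operations per element (two `and`s and one `or`).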
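To make the comparison concrete, here is a minimal C++ sketch of the two strategies applied to a single raw bf16 bit pattern. The function names `copysign_bits` and `copysign_select` are made up for illustration and are not part of LLVM. The sketch also assumes that `abs` and `neg` only clear or set the sign bit, which may not match hardware behavior for NaN inputs.

```cpp
#include <cstdint>

// bf16 layout: 1 sign bit, 8 exponent bits, 7 mantissa bits.
constexpr std::uint16_t kSignMask = 0x8000;  // -32768 in the PTX listing
constexpr std::uint16_t kMagMask  = 0x7FFF;  //  32767 in the PTX listing

// Bit-operation path: keep the magnitude of x and take the sign of y.
// Two ANDs and one OR, matching the 13-instruction listing.
constexpr std::uint16_t copysign_bits(std::uint16_t x, std::uint16_t y) {
    return static_cast<std::uint16_t>((x & kMagMask) | (y & kSignMask));
}

// Select path: form FABS(x) and -FABS(x), then choose by the sign of y.
// abs/neg are modeled as sign-bit operations (an assumption, see above).
constexpr std::uint16_t copysign_select(std::uint16_t x, std::uint16_t y) {
    return (((y >> 15) & 1) == 1)
               ? static_cast<std::uint16_t>((x & kMagMask) | kSignMask)
               : static_cast<std::uint16_t>(x & kMagMask);
}

// bf16 bit patterns: 1.0 is 0x3F80, -1.0 is 0xBF80, -2.0 is 0xC000.
static_assert(copysign_bits(0x3F80, 0xC000) == 0xBF80, "sign taken from y");
static_assert(copysign_select(0x3F80, 0xC000) == 0xBF80, "paths agree");
```

Under that assumption both functions return the same bits, so the difference is purely in how many instructions each form lowers to.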

https://github.com/llvm/llvm-project/pull/69415

