[llvm] [LLVM][NVPTX]Add BF16 vector instruction and fix lowering rules (PR #69415)
Han Shen via llvm-commits
llvm-commits at lists.llvm.org
Fri Oct 20 00:31:17 PDT 2023
================
@@ -1623,8 +1623,10 @@ SDValue SelectionDAGLegalize::ExpandFCOPYSIGN(SDNode *Node) const {
SignMask);
// If FABS is legal transform FCOPYSIGN(x, y) => sign(x) ? -FABS(x) : FABS(X)
+ // We don't do it in bf16 since the other path has less number of instructions
----------------
shenh10 wrote:
- Without FABS/FNEG: 13 instructions.
```bash
.visible .func (.param .align 4 .b8 func_retval0[4]) test_copysign(
.param .align 4 .b8 test_copysign_param_0[4],
.param .align 4 .b8 test_copysign_param_1[4]
) // @test_copysign
{
.reg .b16 %rs<17>;
.reg .b32 %r<4>;
// %bb.0:
ld.param.b32 %r1, [test_copysign_param_0];
ld.param.b32 %r2, [test_copysign_param_1];
mov.b32 {%rs1, %rs2}, %r2;
and.b16 %rs4, %rs2, -32768;
mov.b32 {%rs5, %rs6}, %r1;
and.b16 %rs8, %rs6, 32767;
or.b16 %rs9, %rs8, %rs4;
and.b16 %rs12, %rs1, -32768;
and.b16 %rs14, %rs5, 32767;
or.b16 %rs15, %rs14, %rs12;
mov.b32 %r3, {%rs15, %rs9};
st.param.b32 [func_retval0+0], %r3;
ret;
// -- End function
}
```
- With FABS/FNEG: 19 instructions.
```bash
// .globl test_copysign // -- Begin function test_copysign
.visible .func (.param .align 4 .b8 func_retval0[4]) test_copysign(
.param .align 4 .b8 test_copysign_param_0[4],
.param .align 4 .b8 test_copysign_param_1[4]
) // @test_copysign
{
.reg .pred %p<3>;
.reg .b16 %rs<17>;
.reg .b32 %r<4>;
// %bb.0:
ld.param.b32 %r1, [test_copysign_param_1];
ld.param.b32 %r2, [test_copysign_param_0];
mov.b32 {%rs1, %rs2}, %r2;
abs.bf16 %rs3, %rs2;
neg.bf16 %rs4, %rs3;
mov.b32 {%rs5, %rs6}, %r1;
shr.u16 %rs8, %rs6, 15;
and.b16 %rs9, %rs8, 1;
setp.eq.b16 %p1, %rs9, 1;
selp.b16 %rs10, %rs4, %rs3, %p1;
abs.bf16 %rs11, %rs1;
neg.bf16 %rs12, %rs11;
shr.u16 %rs14, %rs5, 15;
and.b16 %rs15, %rs14, 1;
setp.eq.b16 %p2, %rs15, 1;
selp.b16 %rs16, %rs12, %rs11, %p2;
mov.b32 %r3, {%rs16, %rs10};
st.param.b32 [func_retval0+0], %r3;
ret;
// -- End function
}
```
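The latter is noticeably more cumbersome: for each bf16 lane it needs `abs`, `neg`, a `shr`/`and` pair to extract the sign bit, a `setp`, and a `selp` (six instructions per lane). The former needs only three bitwise operations per lane (`and`, `and`, `or`).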
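To make the comparison concrete, here is a minimal C++ sketch of the two strategies on raw bf16 bit patterns. The function names are hypothetical and exist only for illustration; they are not LLVM APIs. It also assumes `abs.bf16` and `neg.bf16` behave as pure sign-bit operations, which may not hold exactly for NaN inputs on real hardware.

```cpp
#include <cstdint>

// Masks matching the immediates in the PTX above:
// -32768 (0x8000) selects the sign bit, 32767 (0x7FFF) selects the magnitude.
constexpr std::uint16_t kSignMask = 0x8000;
constexpr std::uint16_t kMagMask = 0x7FFF;

// Bitwise path (hypothetical helper): and, and, or.
constexpr std::uint16_t copysign_bf16_bits(std::uint16_t mag,
                                           std::uint16_t sign) {
  return static_cast<std::uint16_t>((mag & kMagMask) | (sign & kSignMask));
}

// Select-based path (hypothetical helper), mirroring
// abs / neg / shr / and / setp / selp.
// Assumption: abs and neg are modeled as plain sign-bit manipulation.
constexpr std::uint16_t copysign_bf16_select(std::uint16_t mag,
                                             std::uint16_t sign) {
  const std::uint16_t abs_val =
      static_cast<std::uint16_t>(mag & kMagMask);       // abs.bf16
  const std::uint16_t neg_val =
      static_cast<std::uint16_t>(abs_val | kSignMask);  // neg.bf16 of abs
  const bool sign_set = ((sign >> 15) & 1) == 1;        // shr, and, setp
  return sign_set ? neg_val : abs_val;                  // selp
}

// 0x3F80 is bf16 1.0 and 0xBF80 is bf16 -1.0.
static_assert(copysign_bf16_bits(0x3F80, 0xBF80) == 0xBF80,
              "sign is taken from the second operand");
static_assert(copysign_bf16_select(0x3F80, 0xBF80) == 0xBF80,
              "select path agrees on this input");
```

Under that assumption both helpers return the same bits for every input pair, so the shorter bitwise sequence loses nothing in this model.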
https://github.com/llvm/llvm-project/pull/69415