[PATCH] D135428: [NVPTX] Support neg{.ftz} for f16 and f16x2
Jakub Chlanda via Phabricator via llvm-commits
llvm-commits at lists.llvm.org
Tue Oct 11 00:13:31 PDT 2022
jchlanda added a comment.
In D135428#3843292 <https://reviews.llvm.org/D135428#3843292>, @tra wrote:
> Just curious -- what prompts this change?
>
> Does it buy us anything performance-wise? AFAICT llvm may be generating better code for GPUs w/o fp16 support -- it does xor on 32-bit value w/o splitting it into 16-bit halves. https://godbolt.org/z/Wjx7ceT75
> Or is it needed to flush fp16 denormals consistently?
In all honesty, I don't know what the motivation for this was; it came to my attention as a DPC++ bug (https://github.com/intel/llvm/issues/6958). I do think that your point about flushing behavior is important and should be preserved.
FWIW, using `neg` directly does not require a bitcast from `Float16x2Regs` to `Int32Regs` or from `Float16Regs` to `Int16Regs`, as seen in the `xor` case.
// .globl test_neg_f16
.visible .func (.param .b32 func_retval0) test_neg_f16(
.param .b32 test_neg_f16_param_0
)
{
.reg .b16 %h<3>;
ld.param.b16 %h1, [test_neg_f16_param_0];
neg.f16 %h2, %h1;
st.param.b16 [func_retval0+0], %h2;
ret;
}
// .globl test_neg_f16x2
.visible .func (.param .align 4 .b8 func_retval0[4]) test_neg_f16x2(
.param .align 4 .b8 test_neg_f16x2_param_0[4]
)
{
.reg .b32 %hh<3>;
ld.param.b32 %hh1, [test_neg_f16x2_param_0];
neg.f16x2 %hh2, %hh1;
st.param.b32 [func_retval0+0], %hh2;
ret;
}
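For comparison, the `xor` approach mentioned in the quoted comment amounts to flipping the sign bit of each 16-bit half. The following is only a rough Python sketch of that bit manipulation, not what the backend emits; the helper names (`f16_bits`, `bits_f16`, `pack_f16x2`, `neg_f16_xor`, `neg_f16x2_xor`) are made up for illustration. It also shows why flushing matters: a plain sign-bit flip leaves a subnormal input as a subnormal, whereas, as far as I understand the PTX ISA, `neg.ftz` would flush it to a sign-preserving zero.

```python
import struct

F16_SIGN = 0x8000        # sign bit of a single IEEE binary16 value
F16X2_SIGN = 0x80008000  # sign bits of two binary16 values packed in 32 bits


def f16_bits(x: float) -> int:
    """Bit pattern of x after rounding to binary16 (hypothetical helper)."""
    return struct.unpack("<H", struct.pack("<e", x))[0]


def bits_f16(b: int) -> float:
    """Interpret a 16-bit pattern as a binary16 value (hypothetical helper)."""
    return struct.unpack("<e", struct.pack("<H", b))[0]


def pack_f16x2(lo: float, hi: float) -> int:
    """Pack two binary16 values into one 32-bit word, low half first."""
    return f16_bits(lo) | (f16_bits(hi) << 16)


def neg_f16_xor(b: int) -> int:
    """Negate one binary16 value by flipping its sign bit."""
    return b ^ F16_SIGN


def neg_f16x2_xor(w: int) -> int:
    """Negate both halves of a packed pair with a single 32-bit xor."""
    return w ^ F16X2_SIGN


if __name__ == "__main__":
    print(hex(neg_f16_xor(f16_bits(1.0))))
    print(hex(neg_f16x2_xor(pack_f16x2(1.0, -2.0))))
    # Smallest positive subnormal: xor keeps it subnormal, no flush to zero.
    print(hex(neg_f16_xor(0x0001)))
```

Note that the packed case needs only one 32-bit `xor`, which is the point made in the quoted comment about not splitting the value into 16-bit halves.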
Repository:
rG LLVM Github Monorepo
CHANGES SINCE LAST ACTION
https://reviews.llvm.org/D135428/new/
https://reviews.llvm.org/D135428