[PATCH] D135428: [NVPTX] Support neg{.ftz} for f16 and f16x2

Tue Oct 11 00:13:31 PDT 2022

jchlanda added a comment.

In D135428#3843292 <https://reviews.llvm.org/D135428#3843292>, @tra wrote:

> Just curious -- what prompts this change?
>
> Does it buy us anything performance-wise? AFAICT llvm may be generating better code for gpus w/o fp16 support -- it does xor on 32-bit value w/o splitting it into 16-bit halfs. https://godbolt.org/z/Wjx7ceT75
> Or is it needed to flush fp16 denormals consistently?

In all honesty, I don't know what the motivation for this was; it came to my attention as a DPC++ bug (https://github.com/intel/llvm/issues/6958). I do think that your point about flushing behavior is important and should be preserved.
FWIW, using `neg` directly does not require a bitcast from `Float16x2Regs` to `Int32Regs` or from `Float16Regs` to `Int16Regs`, unlike the `xor` case.

          // .globl       test_neg_f16
  .visible .func  (.param .b32 func_retval0) test_neg_f16(
          .param .b32 test_neg_f16_param_0
  )
  {
          .reg .b16       %h<3>;

          ld.param.b16    %h1, [test_neg_f16_param_0];
          neg.f16         %h2, %h1;
          st.param.b16    [func_retval0+0], %h2;
          ret;

  }
          // .globl       test_neg_f16x2
  .visible .func  (.param .align 4 .b8 func_retval0[4]) test_neg_f16x2(
          .param .align 4 .b8 test_neg_f16x2_param_0[4]
  )
  {
          .reg .b32       %hh<3>;

          ld.param.b32    %hh1, [test_neg_f16x2_param_0];
          neg.f16x2       %hh2, %hh1;
          st.param.b32    [func_retval0+0], %hh2;
          ret;

  }
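
For comparison, the `xor` approach from the quoted comment amounts to flipping the sign bit of each 16-bit half on the integer view of the value. Below is a minimal Python sketch of that bit manipulation, assuming the IEEE binary16 layout; the helper names are hypothetical and this is not the code LLVM emits. It also shows that a plain sign flip leaves a subnormal input unflushed, which is where it would differ from `neg.ftz`.

```python
import struct

F16_SIGN = 0x8000        # sign bit of a single binary16 value
F16X2_SIGN = 0x80008000  # sign bits of both halves packed into 32 bits


def f16_bits(x: float) -> int:
    """Return the 16-bit pattern of x encoded as IEEE binary16."""
    return struct.unpack("<H", struct.pack("<e", x))[0]


def bits_f16(bits: int) -> float:
    """Decode a 16-bit pattern as an IEEE binary16 value."""
    return struct.unpack("<e", struct.pack("<H", bits))[0]


def pack_f16x2(lo: float, hi: float) -> int:
    """Pack two binary16 values into one 32-bit integer (lo in bits 0..15)."""
    return f16_bits(lo) | (f16_bits(hi) << 16)


def unpack_f16x2(bits: int) -> tuple:
    """Split a 32-bit integer back into its two binary16 values."""
    return bits_f16(bits & 0xFFFF), bits_f16(bits >> 16)


def neg_f16_via_xor(bits: int) -> int:
    """Negate one binary16 value by flipping its sign bit."""
    return bits ^ F16_SIGN


def neg_f16x2_via_xor(bits: int) -> int:
    """Negate both halves with a single 32-bit xor, no splitting needed."""
    return bits ^ F16X2_SIGN


# Hypothetical sample inputs.
single = bits_f16(neg_f16_via_xor(f16_bits(1.5)))
pair = unpack_f16x2(neg_f16x2_via_xor(pack_f16x2(1.0, -2.0)))

# 0x0001 is the smallest positive binary16 subnormal; the sign flip keeps
# its mantissa intact instead of flushing it to zero.
subnormal = neg_f16_via_xor(0x0001)
```

The integer view is what makes the single 32-bit `xor` possible, and it is also why that lowering has to move the value out of the f16 register classes first.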

Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D135428/new/

https://reviews.llvm.org/D135428