[PATCH] D51925: [AMDGPU] Fix issue for zext of f16 to i32

Mon Sep 24 07:38:07 PDT 2018

dstuttard added a comment.

In https://reviews.llvm.org/D51925#1233093, @arsenm wrote:

> In https://reviews.llvm.org/D51925#1230429, @dstuttard wrote:
>
> > Looking again at the code - you're correct that it attempts to only do this transformation if the high bits are zero.
> >  However, the code that checks this has the following telling comment:
> >
> >   // (i32 zext (i16 (bitcast f16:$src))) -> fp16_zext $src
> >   // FIXME: It is not universally true that the high bits are zeroed on gfx9.
> >   if (Src.getOpcode() == ISD::BITCAST) {
> >     SDValue BCSrc = Src.getOperand(0);
> >     if (BCSrc.getValueType() == MVT::f16 &&
> >         fp16SrcZerosHighBits(BCSrc.getOpcode()))
> >       return DCI.DAG.getNode(AMDGPUISD::FP16_ZEXT, SDLoc(N), VT, BCSrc);
> >   }
> >   
> >   
> >
> > In this particular case the BCSrc operation was an fptrunc which passes the fp16SrcZerosHighBits test - but that eventually ends up as v_mad_mixlo_f16 which doesn't ensure that the high bits are zero.
> >
> > Any suggestions on how to proceed? I agree that it seems a shame to have to insert the extra AND operation blindly.
>
>
> I guess you could check the subtarget in fp16SrcZerosHighBits. However that's pretty risky since it's depending on things we can't guarantee. Something could transform any other instruction into something else that won't preserve this. Overall I'm very unhappy this hardware change happened and it's a lot of work to handle all of this properly. I think what we really need is to drop this combine/node, and a separate machine instruction for every operation that preserves the high bits (with a tied source operand) vs. zeros them, and then have a machine pass that tries to clean up the extra ands while dropping this combine. We'll have to do extra work because we will have missed out on combines that this was enabling.

OK - given that something like that is a larger change, how about we commit this (with an appropriate comment) for now and work on something better in the long term?

Repository:
  rL LLVM

https://reviews.llvm.org/D51925