[PATCH] [X86] Updates to X86 backend for f16 promotion

Thu Apr 30 17:42:20 PDT 2015

================
Comment at: lib/Target/X86/X86ISelLowering.cpp:17367-17373
@@ -17356,2 +17366,9 @@
   }
+  case ISD::FP_EXTEND: {
+    // Right now, only MVT::v2f32 has OperationAction for FP_EXTEND.
+    // No other ValueType for FP_EXTEND should reach this point.
+    assert(N->getValueType(0) == MVT::v2f32 &&
+           "Do not know how to legalize this Node");
+    return;
+  }
   case ISD::INTRINSIC_W_CHAIN: {
----------------
ab wrote:
> pirama wrote:
> > ab wrote:
> > > Does v2f32 even hit this?  Without this patch it should trigger the default unreachable, why would that change?
> > Around line 924 in this file:
> > 
> > ```
> > setOperationAction(ISD::FP_EXTEND,          MVT::v2f32, Custom);
> > ```
> > is what necessitates this.  v2f32 will be a valuetype of an FP_EXTEND node only when the source type is smaller (i.e. v2f16).  I am not sure why the OperationAction is set as above, but it triggers the unreachable error for FP_EXTEND from v2f16.  The included test for vec4 extend fails because vec4 is split during legalization.  Now that I think about it, I should add a vec2 test so this test doesn't rot if vec4 is legal in the future.
> How about removing that line, and removing the FP_EXTEND block as well?  Both seem pretty pointless IMO.
Removing the operation action causes test/CodeGen/X86/pr11334.ll to fail.  For this test, DAGTypeLegalizer::WidenVectorOperand calls CustomLowerNode based on the operand's value type which eventually causes the v2f32 to be expanded to a v4f32 and use a VFPEXT (in LowerFP_EXTEND).

I understood that operation action is always defined for the value type of a node and not on its operand's types.  But, looks like that's not the case.

================
Comment at: test/CodeGen/X86/half.ll:103
@@ +102,3 @@
+; CHECK-LIBCALL: cvttss2si
+; CHECK-LIBCALL: cvttss2si
+; CHECK-FP16: vcvtph2ps
----------------
ab wrote:
> pirama wrote:
> > The uitofp generates two cvttss2si instructions along with a bunch of other instructions.  I don't really understand what's going.  Here's the exact sequence for '-f16c':
> > 
> >         .cfi_def_cfa_offset 16
> >         movzwl  (%rdi), %edi
> >         callq   __gnu_h2f_ieee
> >         movss   .LCPI9_0(%rip), %xmm1   # xmm1 = mem[0],zero,zero,zero
> >         movaps  %xmm0, %xmm2
> >         subss   %xmm1, %xmm2
> >         cvttss2si       %xmm2, %rax
> >         movabsq $-9223372036854775808, %rcx # imm = 0x8000000000000000
> >         xorq    %rax, %rcx
> >         cvttss2si       %xmm0, %rax
> >         ucomiss %xmm1, %xmm0
> >         cmovaeq %rcx, %rax
> >         popq    %rdx
> >         retq
> > 
> > If you think this is optimal, I'll add CHECKs for the movabs, xor etc.
> Looks like when we don't have a native FP_TO_UINT, we try to expand it using FP_TO_SINT.  I understand it as, the intuition being, when:
>     i64 fptosi (f32 a)
> fits in 63 bits, we can just return that.  If it doesn't, we can assume it has to be in [2^63; 2^64[ (otherwise the result is undefined), and we can first do:
>     a -= (f32)(2 << 63)
> equivalent in spirit to something like "(a & ~(2 << 63))", if a were an i64.
> Then you can just to:
>     (or (i64 fptosi (f32 a)), (2 << 63))
> 
> So, the code looks good.  The fact that we use cvttss2si can be confusing, so it's probably best to match more of that logic.
Aah, that makes sense.  Thanks for the explanation!

http://reviews.llvm.org/D9092

EMAIL PREFERENCES
  http://reviews.llvm.org/settings/panel/emailpreferences/