[PATCH] [X86] Updates to X86 backend for f16 promotion

Thu Apr 30 15:15:30 PDT 2015

================
Comment at: lib/Target/X86/X86ISelLowering.cpp:17367-17373
@@ -17356,2 +17366,9 @@
   }
+  case ISD::FP_EXTEND: {
+    // Right now, only MVT::v2f32 has OperationAction for FP_EXTEND.
+    // No other ValueType for FP_EXTEND should reach this point.
+    assert(N->getValueType(0) == MVT::v2f32 &&
+           "Do not know how to legalize this Node");
+    return;
+  }
   case ISD::INTRINSIC_W_CHAIN: {
----------------
pirama wrote:
> ab wrote:
> > Does v2f32 even hit this?  Without this patch it should trigger the default unreachable, why would that change?
> Around line 924 in this file:
> 
> ```
> setOperationAction(ISD::FP_EXTEND,          MVT::v2f32, Custom);
> ```
> is what necessitates this.  v2f32 will be a valuetype of an FP_EXTEND node only when the source type is smaller (i.e. v2f16).  I am not sure why the OperationAction is set as above, but it triggers the unreachable error for FP_EXTEND from v2f16.  The included test for vec4 extend fails because vec4 is split during legalization.  Now that I think about it, I should add a vec2 test so this test doesn't rot if vec4 is legal in the future.
How about removing that line, and removing the FP_EXTEND block as well?  Both seem pretty pointless IMO.

================
Comment at: test/CodeGen/X86/half.ll:103
@@ +102,3 @@
+; CHECK-LIBCALL: cvttss2si
+; CHECK-LIBCALL: cvttss2si
+; CHECK-FP16: vcvtph2ps
----------------
pirama wrote:
> The uitofp generates two cvttss2si instructions along with a bunch of other instructions.  I don't really understand what's going.  Here's the exact sequence for '-f16c':
> 
>         .cfi_def_cfa_offset 16
>         movzwl  (%rdi), %edi
>         callq   __gnu_h2f_ieee
>         movss   .LCPI9_0(%rip), %xmm1   # xmm1 = mem[0],zero,zero,zero
>         movaps  %xmm0, %xmm2
>         subss   %xmm1, %xmm2
>         cvttss2si       %xmm2, %rax
>         movabsq $-9223372036854775808, %rcx # imm = 0x8000000000000000
>         xorq    %rax, %rcx
>         cvttss2si       %xmm0, %rax
>         ucomiss %xmm1, %xmm0
>         cmovaeq %rcx, %rax
>         popq    %rdx
>         retq
> 
> If you think this is optimal, I'll add CHECKs for the movabs, xor etc.
Looks like when we don't have a native FP_TO_UINT, we try to expand it using FP_TO_SINT.  I understand it as, the intuition being, when:
    i64 fptosi (f32 a)
fits in 63 bits, we can just return that.  If it doesn't, we can assume it has to be in [2^63; 2^64[ (otherwise the result is undefined), and we can first do:
    a -= (f32)(2 << 63)
equivalent in spirit to something like "(a & ~(2 << 63))", if a were an i64.
Then you can just to:
    (or (i64 fptosi (f32 a)), (2 << 63))

So, the code looks good.  The fact that we use cvttss2si can be confusing, so it's probably best to match more of that logic.

http://reviews.llvm.org/D9092

EMAIL PREFERENCES
  http://reviews.llvm.org/settings/panel/emailpreferences/