[PATCH] [X86] Updates to X86 backend for f16 promotion
Ahmed Bougacha
ahmed.bougacha at gmail.com
Thu Apr 30 15:15:30 PDT 2015
================
Comment at: lib/Target/X86/X86ISelLowering.cpp:17367-17373
@@ -17356,2 +17366,9 @@
}
+ case ISD::FP_EXTEND: {
+ // Right now, only MVT::v2f32 has OperationAction for FP_EXTEND.
+ // No other ValueType for FP_EXTEND should reach this point.
+ assert(N->getValueType(0) == MVT::v2f32 &&
+ "Do not know how to legalize this Node");
+ return;
+ }
case ISD::INTRINSIC_W_CHAIN: {
----------------
pirama wrote:
> ab wrote:
> > Does v2f32 even hit this? Without this patch it should trigger the default unreachable, why would that change?
> Around line 924 in this file:
>
> ```
> setOperationAction(ISD::FP_EXTEND, MVT::v2f32, Custom);
> ```
> is what necessitates this. v2f32 will be a valuetype of an FP_EXTEND node only when the source type is smaller (i.e. v2f16). I am not sure why the OperationAction is set as above, but it triggers the unreachable error for FP_EXTEND from v2f16. The included test for vec4 extend fails because vec4 is split during legalization. Now that I think about it, I should add a vec2 test so this test doesn't rot if vec4 is legal in the future.
How about removing that line, and removing the FP_EXTEND block as well? Both seem pretty pointless IMO.
================
Comment at: test/CodeGen/X86/half.ll:103
@@ +102,3 @@
+; CHECK-LIBCALL: cvttss2si
+; CHECK-LIBCALL: cvttss2si
+; CHECK-FP16: vcvtph2ps
----------------
pirama wrote:
> The uitofp generates two cvttss2si instructions along with a bunch of other instructions. I don't really understand what's going. Here's the exact sequence for '-f16c':
>
> .cfi_def_cfa_offset 16
> movzwl (%rdi), %edi
> callq __gnu_h2f_ieee
> movss .LCPI9_0(%rip), %xmm1 # xmm1 = mem[0],zero,zero,zero
> movaps %xmm0, %xmm2
> subss %xmm1, %xmm2
> cvttss2si %xmm2, %rax
> movabsq $-9223372036854775808, %rcx # imm = 0x8000000000000000
> xorq %rax, %rcx
> cvttss2si %xmm0, %rax
> ucomiss %xmm1, %xmm0
> cmovaeq %rcx, %rax
> popq %rdx
> retq
>
> If you think this is optimal, I'll add CHECKs for the movabs, xor etc.
Looks like when we don't have a native FP_TO_UINT, we try to expand it using FP_TO_SINT. I understand it as, the intuition being, when:
i64 fptosi (f32 a)
fits in 63 bits, we can just return that. If it doesn't, we can assume it has to be in [2^63; 2^64[ (otherwise the result is undefined), and we can first do:
a -= (f32)(2 << 63)
equivalent in spirit to something like "(a & ~(2 << 63))", if a were an i64.
Then you can just to:
(or (i64 fptosi (f32 a)), (2 << 63))
So, the code looks good. The fact that we use cvttss2si can be confusing, so it's probably best to match more of that logic.
http://reviews.llvm.org/D9092
EMAIL PREFERENCES
http://reviews.llvm.org/settings/panel/emailpreferences/
More information about the llvm-commits
mailing list