[llvm-dev] Strange types on x86 vcvtph2ps and vcvtps2ph intrinsics
Ahmed Bougacha via llvm-dev
llvm-dev at lists.llvm.org
Tue Sep 8 16:58:34 PDT 2015
Hi Dan,
On Tue, Sep 8, 2015 at 4:48 PM, Dan Liew via llvm-dev
<llvm-dev at lists.llvm.org> wrote:
> Hi,
>
> I was looking at the x86 vector intrinsics for converting half
> precision floating point numbers and I'm a bit confused as to why
> certain types were chosen. I've gone ahead and used their current
> definition with success but I'd like to understand why the types used
> with these intrinsics are done this way.
>
> For reference see ``include/llvm/IR/IntrinsicsX86.td``. Here are the
> intrinsics of interest.
>
> ```
> let TargetPrefix = "x86" in {  // All intrinsics start with "llvm.x86.".
>   def int_x86_vcvtph2ps_128 : GCCBuiltin<"__builtin_ia32_vcvtph2ps">,
>               Intrinsic<[llvm_v4f32_ty], [llvm_v8i16_ty], [IntrNoMem]>;
>   def int_x86_vcvtph2ps_256 : GCCBuiltin<"__builtin_ia32_vcvtph2ps256">,
>               Intrinsic<[llvm_v8f32_ty], [llvm_v8i16_ty], [IntrNoMem]>;
>   def int_x86_vcvtps2ph_128 : GCCBuiltin<"__builtin_ia32_vcvtps2ph">,
>               Intrinsic<[llvm_v8i16_ty], [llvm_v4f32_ty, llvm_i32_ty],
>                         [IntrNoMem]>;
>   def int_x86_vcvtps2ph_256 : GCCBuiltin<"__builtin_ia32_vcvtps2ph256">,
>               Intrinsic<[llvm_v8i16_ty], [llvm_v8f32_ty, llvm_i32_ty],
>                         [IntrNoMem]>;
> }
>
> ```
>
> Here's what seems weird to me:
>
> * For the 4 wide intrinsics (``_128`` suffix) some of the types are
> wider than they need to be. For example ``int_x86_vcvtph2ps_128``
> takes <8 x i16> as an argument but this intrinsic only uses the first
> four lanes so why is the argument type not <4 x i16>?
> ``int_x86_vcvtps2ph_128`` also has the same oddity but on its return
> type (returns <8 x i16> but only the first four are relevant).
One reason is that <4 x i16> is only 64 bits wide, too small to be a legal
SSE vector type, so the IR intrinsics, much like the Intel C intrinsics and
the instructions themselves, are defined in terms of the widened <8 x i16>
(an __m128i in C, an xmm register in the instruction).
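To make the widening concrete, here is a rough IR sketch (written for illustration, not taken from the tree) of how code holding only four half values as <4 x i16> could call the 128-bit intrinsic as it is defined above. The function name @cvt4 is made up, and the upper four lanes are left undef on the assumption that the instruction only reads the low four:

```llvm
declare <4 x float> @llvm.x86.vcvtph2ps.128(<8 x i16>)

; Hypothetical helper: convert four half values, stored as i16, to float.
define <4 x float> @cvt4(<4 x i16> %h) {
  ; Widen <4 x i16> to the legal <8 x i16>. Mask indices 4..7 select from
  ; the second (undef) operand, so the upper lanes are unspecified.
  %w = shufflevector <4 x i16> %h, <4 x i16> undef,
       <8 x i32> <i32 0, i32 1, i32 2, i32 3, i32 4, i32 5, i32 6, i32 7>
  %f = call <4 x float> @llvm.x86.vcvtph2ps.128(<8 x i16> %w)
  ret <4 x float> %f
}
```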
> * The use of ``i16`` types also seems a little strange given that the
> more semantically correct ``f16`` type and vectorized forms (e.g.
> ``llvm_v4f16_ty``) are available. Sure I can use a bitcast with the
> intrinsics to get the type I want in the IR but why was ``i16``
> chosen over ``f16``?
Until recently, f16 wasn't very well supported. It still has rough
edges on targets, such as X86, that have no native scalar register class
for it. Instead, these targets use i16 and convert to and from other
(native) FP types with the dedicated convert.to/from.fp16 intrinsics.
We match that here and use an i16 element type.
Someday, we'll get rid of these intrinsics and use half everywhere,
but we're not there yet!
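For illustration, a minimal sketch of the i16 convention, assuming the overloaded scalar conversion intrinsics as they are documented in the LangRef. The function names @double_half and @as_i16 are hypothetical:

```llvm
declare float @llvm.convert.from.fp16.f32(i16)
declare i16 @llvm.convert.to.fp16.f32(float)

; A half value travels through the IR as a plain i16 and is only
; interpreted as floating point at the conversion points.
define i16 @double_half(i16 %h) {
  %f = call float @llvm.convert.from.fp16.f32(i16 %h)
  %sum = fadd float %f, %f
  %r = call i16 @llvm.convert.to.fp16.f32(float %sum)
  ret i16 %r
}

; If the surrounding code already uses half vectors, a bitcast bridges
; to the i16 element type that the x86 intrinsics expect.
define <4 x i16> @as_i16(<4 x half> %v) {
  %i = bitcast <4 x half> %v to <4 x i16>
  ret <4 x i16> %i
}
```

The bitcast changes no bits, so it should cost nothing at run time.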
HTH,
-Ahmed
> Any ideas?
>
> Thanks,
> Dan.