[llvm-dev] Strange types on x86 vcvtph2ps and vcvtps2ph intrinsics

Thu Sep 10 12:13:05 PDT 2015

On Tue, Sep 8, 2015 at 5:23 PM, Dan Liew <dan at su-root.co.uk> wrote:
> Hi,
>
>>> Here's what seems weird to me:
>>>
>>> * For the 4 wide intrinsics (``_128`` suffix) some of the types are
>>> wider than they need to be. For example ``int_x86_vcvtph2ps_128``
>>> takes <8 x i16> as an argument but this intrinsic only uses the first
>>> four lanes so why is the argument type not <4 x i16>?
>>> ``int_x86_vcvtps2ph_128`` also has the same oddity but on its return
>>> type (returns <8 x i16> but only the first four are relevant).
>>
>> One reason is that <4 x i16> is too small to be a legal SSE vector
>> type, so the IR intrinsics, much like the Intel C intrinsics and the
>> instructions, are defined in terms of the widened <8 x i16> (with
>> either __m128, or xmm registers).
>
> Ah I see. Makes sense.
>
>>> * The use of ``i16`` types also seems a little strange given that the
>>> more semantically correct ``f16`` type and vectorized forms (e.g.
>>> ``llvm_v4f16_ty``) are available. Sure I can use a bitcast with the
>>> intrinsics to get the type I want in the IR, but why was ``i16``
>>> chosen over ``f16``?
>>
>> f16 wasn't, until recently, very well supported. It still has rough
>> edges on targets without native scalar register classes such as X86.
>
> What do you mean by "register classes"?

Here's an improvised vague definition: a register class is the set of
registers that can be used interchangeably in some specific context.

So, in this case, on X86 (see lib/Target/X86/X86RegisterInfo.td), we
have the 128-bit uses of xmm registers (part of the VR128 register
class), but also the scalar equivalents (in e.g. "addsd %xmm"):
FR32/FR64.

Since we have no way of copying, storing, etc. the lowest 16 bits of an
xmm register, any f16 scalar will need to be legalized, usually to
f32.
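
To make "legalized to f32" a bit more concrete, here is a rough Python
sketch (not LLVM code) of what promoting a half addition amounts to:
widen both operands, do the arithmetic in single precision, then round
the result back to half.  All function names are made up for
illustration, and Python's struct module stands in for the conversion
instructions/libcalls.

```python
import struct

def half_bits_to_float(bits):
    """Reinterpret a 16-bit pattern as IEEE half and widen it (exact)."""
    return struct.unpack('<e', struct.pack('<H', bits))[0]

def float_to_half_bits(value):
    """Round a value to IEEE half and return its 16-bit pattern."""
    return struct.unpack('<H', struct.pack('<e', value))[0]

def to_f32(value):
    """Round a Python float (a double) to single precision."""
    return struct.unpack('<f', struct.pack('<f', value))[0]

def promoted_half_add(a_bits, b_bits):
    """Hypothetical helper: add two halves by going through f32."""
    a = to_f32(half_bits_to_float(a_bits))   # extend half -> f32
    b = to_f32(half_bits_to_float(b_bits))
    total = to_f32(a + b)                    # the add happens in f32
    return float_to_half_bits(total)         # truncate f32 -> half

# 0x3C00 is 1.0 and 0x4000 is 2.0 in half precision.
sum_bits = promoted_half_add(0x3C00, 0x4000)
```

The only half-specific operations left are the two conversions;
everything else is ordinary f32 work.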

Given that clang, for __fp16, only generates loads, stores, and
conversions (via libcalls), it's simpler and usually more efficient to
instead represent half values as i16, since i16 can be
loaded/stored/passed: we do have i16 register classes (%ax, %r9w,
etc., in GR16).
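
As an illustration of that representation, the following Python sketch
(again not LLVM code; the names are hypothetical) keeps half values as
plain 16-bit integers.  Loads, stores, and copies never look at the
bits as floating point; only the explicit conversions do.

```python
import array
import struct

def bits_to_float(bits):
    """Convert a 16-bit half pattern to a float (the 'from fp16' step)."""
    return struct.unpack('<e', struct.pack('<H', bits))[0]

def float_to_bits(value):
    """Convert a float to a 16-bit half pattern (the 'to fp16' step)."""
    return struct.unpack('<H', struct.pack('<e', value))[0]

# Half values stored as unsigned 16-bit integers: 1.0 and -2.0.
storage = array.array('H', [0x3C00, 0xC000])

# Copying is a pure integer operation; no FP knowledge is needed.
copied = array.array('H', storage)

# Conversion happens only where a real FP value is required.
as_floats = [bits_to_float(b) for b in copied]

# Storing a new half value: convert once, then store the integer.
copied.append(float_to_bits(0.5))
```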

Now, one can argue that using i16 instructions for f16 is different
from representing __fp16 as i16 in IR.  That's legitimate, but, until
recently, we haven't needed __fp16 for anything other than the above,
so why bother with the additional complexity?  Again, this will
change, hopefully soon!

> Sorry if this is a dumb question.

There's no such thing ;)

-Ahmed

>> Instead, these targets use i16, and do the conversion with other
>> (native) FP types using the dedicated convert.to/from.fp16 intrinsics.
>> We match that here and use an i16 element type.
>
> I remember seeing that intrinsic in the language reference but
> unfortunately ``convert.to.fp16`` [1] isn't useful
> for what I'm working on because it doesn't specify a rounding mode.
> fp16 has so little precision that the rounding mode
> **really matters**.
>
>> Someday, we'll get rid of these intrinsics and use half everywhere,
>> but we're not there yet!
>
> Okay.
>
>> HTH,
>
> Very helpful, thanks.
>
>
> [1] http://llvm.org/docs/LangRef.html#llvm-convert-to-fp16-intrinsic
>
> Thanks,
> Dan.
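
On the rounding-mode point quoted above: vcvtps2ph takes an immediate
operand that selects the rounding behaviour, which the generic
conversion intrinsic does not expose.  The Python sketch below only
shows why the choice matters; it emulates round-toward-zero by
adjusting a round-to-nearest result, which is an assumption made for
illustration and not how the hardware or LLVM does it.  The function
names are hypothetical, and the helper is meant for finite values
within half range.

```python
import struct

def half_bits_to_float(bits):
    """Reinterpret a 16-bit pattern as IEEE half and widen it (exact)."""
    return struct.unpack('<e', struct.pack('<H', bits))[0]

def to_half_nearest(value):
    """Round to half using struct's round-to-nearest conversion."""
    return struct.unpack('<H', struct.pack('<e', value))[0]

def to_half_toward_zero(value):
    """Emulated truncation: step back one ulp if nearest rounded away."""
    bits = to_half_nearest(value)
    if abs(half_bits_to_float(bits)) > abs(value):
        bits -= 1  # sign-magnitude encoding: bits - 1 has smaller magnitude
    return bits

# Half has a 10-bit fraction, so near 1.0 the spacing is 2**-10.
# This value sits three quarters of the way from 1.0 to 1.0 + 2**-10.
x = 1.0 + 3 * 2**-12

nearest_bits = to_half_nearest(x)
truncated_bits = to_half_toward_zero(x)
```

The two modes give different bit patterns for the same input, and the
gap between them is a full half-precision ulp.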