[llvm-dev] Question about llvm vectors

Wed Aug 19 07:11:30 PDT 2020

Hi,

I love LLVM vectors, yet I wonder why some advanced vector operations are
only available as builtins specific to particular CPU targets.

Let me take an example:

/// Horizontally adds the adjacent pairs of values contained in two
///    128-bit vectors of [4 x float].
///
/// \headerfile <x86intrin.h>
///
/// This intrinsic corresponds to the <c> VHADDPS </c> instruction.
///
/// \param __a
///    A 128-bit vector of [4 x float] containing one of the source operands.
///    The horizontal sums of the values are stored in the lower bits of the
///    destination.
/// \param __b
///    A 128-bit vector of [4 x float] containing one of the source operands.
///    The horizontal sums of the values are stored in the upper bits of the
///    destination.
/// \returns A 128-bit vector of [4 x float] containing the horizontal sums of
///    both operands.
static __inline__ __m128 __DEFAULT_FN_ATTRS
_mm_hadd_ps(__m128 __a, __m128 __b)
{
  return __builtin_ia32_haddps((__v4sf)__a, (__v4sf)__b);
}

Here clang translates _mm_hadd_ps into the x86-specific builtin
__builtin_ia32_haddps. Why not provide a target-independent builtin, say
__builtin_vector_hadd(a, b), that selects the CPU-specific instruction where
one exists and otherwise falls back to a generic implementation?
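
To make the idea concrete, here is a minimal sketch of what such a generic
fallback might look like, written with the GCC/Clang vector_size extension
rather than any x86 header. The names v4f and generic_hadd_ps are hypothetical
and chosen only for illustration; __builtin_vector_hadd itself does not exist.
Whether a given backend turns this into a single horizontal-add instruction is
up to the optimizer and is not guaranteed.

```c
/* Hypothetical 4 x float vector type using the GCC/Clang vector extension. */
typedef float v4f __attribute__((vector_size(16)));

/* Hypothetical target-independent horizontal add (illustration only).
 * Mirrors the lane layout documented for _mm_hadd_ps:
 *   result = { a0 + a1, a2 + a3, b0 + b1, b2 + b3 }
 * The sums of a land in the lower half, the sums of b in the upper half. */
static inline v4f generic_hadd_ps(v4f a, v4f b)
{
  v4f r = { a[0] + a[1], a[2] + a[3], b[0] + b[1], b[2] + b[3] };
  return r;
}
```

Called with a = {1, 2, 3, 4} and b = {10, 20, 30, 40}, this returns
{3, 7, 30, 70}, the same lane layout that the comment above documents for
_mm_hadd_ps.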

Many thanks,
Alex