[clang] [llvm] Clang: convert `__m64` intrinsics to unconditionally use SSE2 instead of MMX. (PR #96540)
Simon Pilgrim via cfe-commits
cfe-commits at lists.llvm.org
Tue Jun 25 09:45:56 PDT 2024
================
@@ -614,12 +623,15 @@ _mm_shuffle_epi8(__m128i __a, __m128i __b)
/// 1: Clear the corresponding byte in the destination. \n
/// 0: Copy the selected source byte to the corresponding byte in the
/// destination. \n
-/// Bits [3:0] select the source byte to be copied.
+/// Bits [2:0] select the source byte to be copied.
/// \returns A 64-bit integer vector containing the copied or cleared values.
-static __inline__ __m64 __DEFAULT_FN_ATTRS_MMX
+static __inline__ __m64 __DEFAULT_FN_ATTRS
_mm_shuffle_pi8(__m64 __a, __m64 __b)
{
- return (__m64)__builtin_ia32_pshufb((__v8qi)__a, (__v8qi)__b);
+ return __trunc64(__builtin_ia32_pshufb128(
+ (__v16qi)__builtin_shufflevector(
+ (__v2si)(__a), __extension__ (__v2si){}, 0, 1, 0, 1),
----------------
RKSimon wrote:
MMX pshufb only uses the first 3-bits for index while SSE uses 4-bits - by splatting the subvector we avoid having to AND the mask input as it can reference either lo/hi subvector.
https://github.com/llvm/llvm-project/pull/96540
More information about the cfe-commits
mailing list