[clang] [llvm] Clang: convert `__m64` intrinsics to unconditionally use SSE2 instead of MMX. (PR #96540)

Phoebe Wang via cfe-commits cfe-commits at lists.llvm.org
Mon Jun 24 23:28:56 PDT 2024


================
@@ -242,10 +243,11 @@ _mm_hadd_epi32(__m128i __a, __m128i __b)
 ///    destination.
 /// \returns A 64-bit vector of [4 x i16] containing the horizontal sums of both
 ///    operands.
-static __inline__ __m64 __DEFAULT_FN_ATTRS_MMX
+static __inline__ __m64 __DEFAULT_FN_ATTRS
 _mm_hadd_pi16(__m64 __a, __m64 __b)
 {
-    return (__m64)__builtin_ia32_phaddw((__v4hi)__a, (__v4hi)__b);
+    return __extract2_32(__builtin_ia32_phaddw128((__v8hi)__anyext128(__a),
----------------
phoebewang wrote:

Shuffle first and then __trunc64? The same below.
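
A minimal sketch of what that ordering might look like for _mm_hadd_pi16, assuming __anyext128 widens a __m64 into the low half of a __m128i and __trunc64 keeps the low 64 bits of a 128-bit vector (both are helpers introduced by this patch; the exact shuffle indices below are illustrative, not taken from the PR):

    static __inline__ __m64 __DEFAULT_FN_ATTRS
    _mm_hadd_pi16(__m64 __a, __m64 __b)
    {
        /* phaddw128 puts the pairwise sums of __a in 16-bit lanes 0-1 and the
           sums of __b in lanes 4-5, i.e. 32-bit lanes 0 and 2 of the result. */
        __v4si __sums = (__v4si)__builtin_ia32_phaddw128((__v8hi)__anyext128(__a),
                                                         (__v8hi)__anyext128(__b));
        /* Shuffle the two meaningful 32-bit lanes into the low 64 bits, then
           truncate to a __m64. */
        return __trunc64((__m128i)__builtin_shufflevector(__sums, __sums,
                                                          0, 2, 2, 3));
    }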

https://github.com/llvm/llvm-project/pull/96540

