[PATCH] D86855: Convert __m64 intrinsics to unconditionally use SSE2 instead of MMX instructions.
Craig Topper via Phabricator via cfe-commits
cfe-commits at lists.llvm.org
Sun Aug 30 22:44:46 PDT 2020
craig.topper added inline comments.
================
Comment at: clang/lib/Headers/mmintrin.h:367
{
- return (__m64)__builtin_ia32_paddb((__v8qi)__m1, (__v8qi)__m2);
+ return (__m64)(((__v8qi)__m1) + ((__v8qi)__m2));
}
----------------
I think we should use __v8qu to match what we do in emmintrin.h. We don't currently set nsw on signed vector arithmetic, but we should be careful in case that changes in the future.
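For reference, a sketch of what I mean, assuming an __v8qu typedef (unsigned char, vector_size 8) analogous to __v16qu in emmintrin.h:

  /* hypothetical typedef if mmintrin.h doesn't already have one */
  typedef unsigned char __v8qu __attribute__((__vector_size__(8)));
  ...
  return (__m64)(((__v8qu)__m1) + ((__v8qu)__m2));

Unsigned wraparound is always well defined, so the codegen stays safe even if we start tagging signed vector arithmetic with nsw later.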
================
Comment at: clang/lib/Headers/mmintrin.h:1097
{
- return __builtin_ia32_pand((__v1di)__m1, (__v1di)__m2);
+ return (__m64)(((__v1di)__m1) & ((__v1di)__m2));
}
----------------
I think we probably want to use a __v2su or __v2si here. Using __v1di gets scalarized and split on 32-bit targets; on 64-bit targets it emits GPR code.
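Something along these lines is what I have in mind (a sketch using the existing __v2si typedef; __v2su would behave the same for a bitwise op):

  return (__m64)(((__v2si)__m1) & ((__v2si)__m2));

That keeps the operation as a 2 x i32 vector op on both 32-bit and 64-bit targets.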
================
Comment at: clang/lib/Headers/mmintrin.h:1242
{
- return (__m64)__builtin_ia32_pcmpgtb((__v8qi)__m1, (__v8qi)__m2);
+ return (__m64)((__v8qi)__m1 > (__v8qi)__m2);
}
----------------
We need to use __v8qs here to force "signed char" elements. __v8qi uses plain "char", whose signedness is platform-dependent and can also be changed with a command-line flag.
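Sketch, assuming a __v8qs typedef (signed char, vector_size 8) matching __v16qs in emmintrin.h:

  /* hypothetical typedef if mmintrin.h doesn't already have one */
  typedef signed char __v8qs __attribute__((__vector_size__(8)));
  ...
  return (__m64)((__v8qs)__m1 > (__v8qs)__m2);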
================
Comment at: clang/lib/Headers/mmintrin.h:1264
{
- return (__m64)__builtin_ia32_pcmpgtw((__v4hi)__m1, (__v4hi)__m2);
+ return (__m64)((__v4hi)__m1 > (__v4hi)__m2);
}
----------------
Same here
================
Comment at: clang/lib/Headers/mmintrin.h:1394
{
- return _mm_set_pi32(__i, __i);
+ return __extension__ (__m64)(__v2si){__i, __i};
}
----------------
Is this needed?
================
Comment at: clang/lib/Headers/mmintrin.h:1452
{
- return _mm_set_pi32(__i1, __i0);
+ return __extension__ (__m64)(__v2si){__i1, __i0};
}
----------------
I don't think this change is needed, and I think the operands are in the wrong order.
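If the vector-literal form does stay, element 0 should be __i0 to match _mm_set_pi32(__i1, __i0), i.e. something like:

  /* __i0 is the low (element 0) lane, __i1 the high lane */
  return __extension__ (__m64)(__v2si){__i0, __i1};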
================
Comment at: clang/lib/Headers/tmmintrin.h:17
#define __DEFAULT_FN_ATTRS __attribute__((__always_inline__, __nodebug__, __target__("ssse3"), __min_vector_width__(64)))
-#define __DEFAULT_FN_ATTRS_MMX __attribute__((__always_inline__, __nodebug__, __target__("mmx,ssse3"), __min_vector_width__(64)))
+#define __trunc64(x) (__m64)__builtin_shufflevector((__v2di)(x), __extension__ (__v2di){}, 0)
+#define __anyext128(x) (__m128i)__builtin_shufflevector((__v1di)(x), __extension__ (__v1di){}, 0, -1)
----------------
I'm worried that using v1di with the shuffles will lead to scalarization in the type legalizer. Should we use v2si instead?
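A sketch of the 32-bit-element form I'm suggesting (same semantics, just expressed on __v2si/__v4si so no v1i64 type is created):

  #define __trunc64(x) \
    (__m64)__builtin_shufflevector((__v4si)(x), __extension__ (__v4si){}, 0, 1)
  #define __anyext128(x) \
    (__m128i)__builtin_shufflevector((__v2si)(x), __extension__ (__v2si){}, 0, 1, -1, -1)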
================
Comment at: clang/lib/Headers/xmmintrin.h:2316
{
- return __builtin_ia32_pmovmskb((__v8qi)__a);
+ return __builtin_ia32_pmovmskb128((__v16qi)__anyext128(__a));
}
----------------
This doesn't guarantee zeroes in bits 15:8, does it?
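If the upper lanes coming out of __anyext128 really are undef, I'd expect something like this sketch to be needed to pin bits 15:8 to zero (or use __zext128 instead of __anyext128 here):

  /* mask off the bits contributed by the widened, possibly-undef upper 8 bytes */
  return __builtin_ia32_pmovmskb128((__v16qi)__anyext128(__a)) & 0xff;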
================
Comment at: clang/lib/Headers/xmmintrin.h:2404
+ __m128i __n128 = __zext128(__n);
+ if (((__SIZE_TYPE__)__p & 0xfff) > 4096-15 && ((__SIZE_TYPE__)__p & 0xfff) < 4096-8) {
+ __p -= 8;
----------------
Does this work with large pages?
Repository:
rG LLVM Github Monorepo
CHANGES SINCE LAST ACTION
https://reviews.llvm.org/D86855/new/
https://reviews.llvm.org/D86855