[PATCH] D28455: [X86] Fix PR30926 - Add patterns for optimizing cvtsi2ss, cvtsi2sd, and cvtss2sd clang intrinsic sequences
Elad Cohen via Phabricator via llvm-commits
llvm-commits at lists.llvm.org
Sun Jan 8 22:44:18 PST 2017
eladcohen added a comment.
In https://reviews.llvm.org/D28455#639276, @RKSimon wrote:
> Doesn't (v)cvtsd2ss suffer from the same issue as well?
The Clang intrinsic for (v)cvtsd2ss is implemented with a builtin (__builtin_ia32_cvtsd2ss) rather than being lowered to generic IR, so we don't see this happening. However, looking at the semantics, IIUC it seems that:
static __inline__ __m128 __DEFAULT_FN_ATTRS
_mm_cvtsd_ss(__m128 __a, __m128d __b) {
  return (__m128)__builtin_ia32_cvtsd2ss((__v4sf)__a, (__v2df)__b);
}
could also be implemented as:
static __inline__ __m128 __DEFAULT_FN_ATTRS
_mm_cvtsd_ss(__m128 __a, __m128d __b)
{
  __a[0] = __b[0];
  return __a;
}
Bottom line, I think you are right, because the above code doesn't have to come from an intrinsic. I'll add the pattern. (I should probably also open a Bugzilla entry for lowering _mm_cvtsd_ss() to generic IR, right?)
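To illustrate the point that this sequence can appear without any intrinsic, here is a minimal sketch of user code that has the same shape. It assumes the GCC/Clang vector_size extension; the type names v4f and v2d and the function cvtsd_ss_generic are hypothetical and are not part of the patch or of Clang's headers.

```c
/* Vector types built with the GCC/Clang vector_size extension.
   The names v4f and v2d are made up for this sketch. */
typedef float  v4f __attribute__((vector_size(16)));
typedef double v2d __attribute__((vector_size(16)));

/* Hypothetical helper with the same shape as the generic form of
   _mm_cvtsd_ss: narrow the low double of b to float, write it into
   lane 0 of a, and leave lanes 1..3 of a untouched. */
static inline v4f cvtsd_ss_generic(v4f a, v2d b) {
    a[0] = (float)b[0]; /* double -> float conversion of the low element */
    return a;           /* upper three lanes are passed through from a */
}
```

Calling it with a = {1, 2, 3, 4} and b = {0.5, 9.0} should give {0.5, 2, 3, 4}. Whether the backend selects a single (v)cvtsd2ss for this depends on the patterns under discussion.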
Thanks for the catch!
https://reviews.llvm.org/D28455