[PATCH] D28455: [X86] Fix PR30926 - Add patterns for optimizing cvtsi2ss, cvtsi2sd, and cvtss2sd clang intrinsic sequences

Elad Cohen via Phabricator via llvm-commits llvm-commits at lists.llvm.org
Sun Jan 8 22:44:18 PST 2017


eladcohen added a comment.

In https://reviews.llvm.org/D28455#639276, @RKSimon wrote:

> Doesn't (v)cvtsd2ss suffer from the same issue as well?


The Clang intrinsic for (v)cvtsd2ss is implemented using a builtin (__builtin_ia32_cvtsd2ss) rather than being lowered to generic IR, so we don't see this issue there. However, looking at the semantics, IIUC it seems that:

  static __inline__ __m128 __DEFAULT_FN_ATTRS
  _mm_cvtsd_ss(__m128 __a, __m128d __b) {
    return (__m128)__builtin_ia32_cvtsd2ss((__v4sf)__a, (__v2df)__b);
  }

could also be implemented with:

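  static __inline__ __m128 __DEFAULT_FN_ATTRS
  _mm_cvtsd_ss(__m128 __a, __m128d __b) {
    /* Convert the low double of __b to float and insert it into lane 0;
       lanes 1..3 of __a pass through unchanged. */
    __a[0] = __b[0];
    return __a;
  }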

Bottom line: I think you are right, because the above code doesn't have to come from an intrinsic; plain user code can produce the same "convert the low element and insert it into lane 0" sequence. I'll add the pattern. (I should probably also open a bugzilla for lowering _mm_cvtsd_ss() to generic IR, right?)
Thanks for the catch!
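For illustration, a minimal sketch (not part of the patch) of the equivalence claimed above, assuming GCC/Clang vector extensions and, where available, SSE2. The typedefs `v4sf`/`v2df` and the helpers `cvtsd_ss_generic` and `matches_intrinsic` are hypothetical names chosen for this example; `matches_intrinsic` compares the generic form bit-for-bit against `_mm_cvtsd_ss` only when `__SSE2__` is defined.

```c
#include <stdbool.h>
#include <string.h>
#if defined(__SSE2__)
#include <emmintrin.h> /* _mm_cvtsd_ss (SSE2) */
#endif

/* Hypothetical local vector types with the same layout as __m128 / __m128d
   (GCC/Clang vector extension). */
typedef float v4sf __attribute__((vector_size(16)));
typedef double v2df __attribute__((vector_size(16)));

/* Intrinsic-free form of cvtsd2ss: convert the low double of b to float
   and insert it into lane 0 of a; lanes 1..3 of a are passed through. */
static v4sf cvtsd_ss_generic(v4sf a, v2df b) {
  a[0] = (float)b[0];
  return a;
}

/* On SSE2 targets, returns true if the generic form produces exactly the
   same 16 bytes as _mm_cvtsd_ss. Without SSE2 there is no intrinsic to
   compare against, so it simply reports a match. */
static bool matches_intrinsic(v4sf a, v2df b) {
#if defined(__SSE2__)
  __m128 want = _mm_cvtsd_ss((__m128)a, (__m128d)b); /* vector casts are bitcasts */
  v4sf got = cvtsd_ss_generic(a, b);
  return memcmp(&want, &got, sizeof got) == 0;
#else
  (void)a;
  (void)b;
  return true;
#endif
}
```

With a = {1, 2, 3, 4} and b = {0.5, 100.0}, `cvtsd_ss_generic` returns {0.5, 2, 3, 4}: only lane 0 changes, and the high double of b is ignored, which is the shape of sequence a backend pattern would need to match whether or not it came from an intrinsic.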


https://reviews.llvm.org/D28455
