[PATCH] [X86] Replace avx2.pbroadcast intrinsics with native IR.

Mon Jun 22 15:36:40 PDT 2015

In http://reviews.llvm.org/D10555#191295, @silvas wrote:

> In http://reviews.llvm.org/D10555#191178, @spatel wrote:
>
> > In http://reviews.llvm.org/D10555#191124, @ab wrote:
> >
> > > To make sure I understand: this is only a problem because of DAGCombines running at -O0, right?  (and perhaps some of the lowering being too smart? though without combines I'd find that surprising)
> > >  And this in turn is only a problem because the C intrinsics (_mm_*) are always inlined, and thus can be combined, right?
> >
> >
> > I think the problem is independent of inlining and DAGCombines. As an example, consider this:
> >
> >   __m128 foo(__m256 a) {
> >     return _mm256_extractf128_ps(a, 0);
> >   }
>
>
> If _mm256_extractf128_ps is a proper function instead of a macro (using the enable_if trick if necessary), would Ahmed's suggestion work for keeping these debuggable?

I tried an experiment with:

  __m128i foo(__m128i x) {
    return _mm_add_epi32(x, _mm_set1_epi32(0));  // so easy to optimize, but...must...resist!
  }

...because that's defined as a proper function:

  static __inline__ __m128i DEFAULT_FN_ATTRS
  _mm_add_epi32(__m128i __a, __m128i __b)
  {
    return (__m128i)((__v4si)__a + (__v4si)__b);
  }

The add is present in the unoptimized IR, but it's gone in the asm. Removing the '__inline__' didn't appear to change anything in this example.
Removing '__inline__' could also cause a different problem: vector coders really don't want those header files showing up in profiles, or to step in and out of them while debugging. IIRC, that happened for some reason with gcc about 10 years ago and had to be worked around.
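
For anyone who wants to repeat the experiment, here is a minimal self-contained sketch. The file name (foo.c) and the helpers foo_lane0 and check_foo_is_identity are my own additions for illustration, not anything from the headers or the patch; they only confirm that the add-zero is a semantic no-op, which is why it is so easy to fold away. It assumes an x86 target with SSE2.

```c
#include <assert.h>
#include <emmintrin.h>  /* SSE2: __m128i, _mm_add_epi32, _mm_set1_epi32 */

/* The function from the experiment: adds an all-zero vector to x. */
__m128i foo(__m128i x) {
  return _mm_add_epi32(x, _mm_set1_epi32(0));
}

/* Hypothetical helper: splat v into all four lanes, run foo,
   and return the low 32-bit lane of the result. */
int foo_lane0(int v) {
  __m128i r = foo(_mm_set1_epi32(v));
  return _mm_cvtsi128_si32(r);
}

/* Hypothetical check: adding zero must leave the value unchanged,
   whether or not the backend keeps the add instruction. */
void check_foo_is_identity(void) {
  assert(foo_lane0(0) == 0);
  assert(foo_lane0(42) == 42);
  assert(foo_lane0(-7) == -7);
}
```

To compare the two outputs, something like "clang -O0 -S -emit-llvm foo.c -o foo.ll" and "clang -O0 -S foo.c -o foo.s" should do. What you see in each will depend on the compiler version, so treat the result above as what I observed rather than a guarantee.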

http://reviews.llvm.org/D10555
