[PATCH] [X86] Replace avx2.pbroadcast intrinsics with native IR.
Sanjay Patel
spatel at rotateright.com
Mon Jun 22 15:36:40 PDT 2015
In http://reviews.llvm.org/D10555#191295, @silvas wrote:
> In http://reviews.llvm.org/D10555#191178, @spatel wrote:
>
> > In http://reviews.llvm.org/D10555#191124, @ab wrote:
> >
> > > To make sure I understand: this is only a problem because of DAGCombines running at -O0, right? (and perhaps some of the lowering being too smart? though without combines I'd find that surprising)
> > > And this in turn is only a problem because the C intrinsics (_mm_*) are always inlined, and thus can be combined, right?
> >
> >
> > I think the problem is independent of inlining and DAGCombines. As an example, consider this:
> >
> > __m128 foo(__m256 a) {
> >   return _mm256_extractf128_ps(a, 0);
> > }
>
>
> If _mm256_extractf128_ps is a proper function instead of a macro (using the enable_if trick if necessary), would Ahmed's suggestion work for keeping these debuggable?
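Why the quoted foo() is so easy to fold: extracting 128-bit lane 0 of a 256-bit vector just reads the low half of the value, so codegen may not need any instruction for it at all. The following is a minimal sketch, assuming the GCC/Clang vector extensions. The v8sf/v4sf typedefs stand in for __m256/__m128, and extract_low128() is a hypothetical model of the operation, not the real intrinsic.

```c
#include <string.h>

/* Assumption: GCC/Clang vector extension types standing in for
   __m256 (8 x float) and __m128 (4 x float). */
typedef float v8sf __attribute__((vector_size(32)));
typedef float v4sf __attribute__((vector_size(16)));

/* Hypothetical model of _mm256_extractf128_ps(a, 0): lane 0 is simply
   the first 16 bytes (the first 4 floats) of the 256-bit value. */
static v4sf extract_low128(v8sf a) {
    v4sf lo;
    memcpy(&lo, &a, sizeof lo); /* copy elements 0..3 of a */
    return lo;
}
```

Because the result is just the low half of the input, a backend can treat the extract as a subregister read rather than emitting vextractf128.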
I tried an experiment with:
__m128i foo(__m128i x) {
  return _mm_add_epi32(x, _mm_set1_epi32(0)); // so easy to optimize, but...must...resist!
}
...because that's defined as a proper function:
static __inline__ __m128i DEFAULT_FN_ATTRS
_mm_add_epi32(__m128i __a, __m128i __b)
{
  return (__m128i)((__v4si)__a + (__v4si)__b);
}
The add is present in the unoptimized IR, but it's gone in the asm. Removing '__inline__' didn't appear to change anything in this example.
Removing 'inline' could also cause a different problem: vector coders really don't want those header files showing up in profiles, or stepping into and out of them while debugging. IIRC, that happened for some reason with gcc about 10 years ago and had to be worked around.
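For reference, the experiment above reduces to a lane-wise vector add with a zero vector, which is an identity; that is what lets codegen drop the add even though it survives in the unoptimized IR. Below is a minimal sketch, assuming the GCC/Clang vector extensions. The v4si typedef mirrors the headers' __v4si, and add_epi32()/set1_epi32() are hypothetical stand-ins for the intrinsics, not the real header definitions.

```c
/* Assumption: same 4 x int32 vector type the headers cast to (__v4si). */
typedef int v4si __attribute__((vector_size(16)));

/* Hypothetical stand-in for _mm_add_epi32: lane-wise 32-bit add,
   mirroring the header body shown above. */
static v4si add_epi32(v4si a, v4si b) {
    return a + b;
}

/* Hypothetical stand-in for _mm_set1_epi32: broadcast v to all 4 lanes. */
static v4si set1_epi32(int v) {
    v4si r = {v, v, v, v};
    return r;
}

/* The experiment's foo(): x + 0 in every lane, i.e. an identity. */
static v4si foo(v4si x) {
    return add_epi32(x, set1_epi32(0));
}
```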
http://reviews.llvm.org/D10555
More information about the llvm-commits mailing list