[PATCH] [X86] Skip concat_vectors when lowering vector broadcast

Wed Dec 11 13:01:27 PST 2013

Hi Robert, 

Thanks for working on this.  I am okay with your patch, but I would prefer that we canonicalize the following DAG:

%2 = v4f32 BUILD_VECTOR %1, %1, %1, %1
%3 = v8f32 concat_vectors %2, undef
%4 = v8f32 vector_shuffle %3, undef, <0,0,0,0,0,0,0,0>

into either a single BUILD_VECTOR or a broadcast vector_shuffle.  This should happen as a DAGCombine optimization any time before operation legalization.  
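A minimal sketch of the first option (folding into a single BUILD_VECTOR). It uses a toy node model in C, not LLVM's real SelectionDAG classes. The `Node` struct, the `OP_*` opcodes and `combine_splat_of_concat()` are hypothetical names chosen for illustration. The sketch assumes the splat mask is all zeros (no undef lanes) and rewrites the shuffle into one wider BUILD_VECTOR of the splatted scalar.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy stand-ins for SelectionDAG nodes (hypothetical; not LLVM's SDNode API). */
typedef enum {
    OP_LOAD, OP_UNDEF, OP_BUILD_VECTOR, OP_CONCAT_VECTORS, OP_VECTOR_SHUFFLE
} Opcode;

#define MAX_OPS 16

typedef struct Node {
    Opcode op;
    int num_elts;              /* result vector width; 1 for a scalar */
    int num_ops;
    struct Node *ops[MAX_OPS];
    int mask[MAX_OPS];         /* shuffle mask (VECTOR_SHUFFLE only) */
} Node;

/* Returns x if n is BUILD_VECTOR x, x, ..., x; otherwise NULL. */
static Node *splat_value(const Node *n) {
    if (n->op != OP_BUILD_VECTOR || n->num_ops == 0)
        return NULL;
    for (int i = 1; i < n->num_ops; ++i)
        if (n->ops[i] != n->ops[0])
            return NULL;
    return n->ops[0];
}

/* Hypothetical combine that rewrites, in place,
 *   vector_shuffle (concat_vectors (BUILD_VECTOR x, ..., x), ...), undef, <0, ..., 0>
 * into BUILD_VECTOR x, ..., x with the shuffle's (wider) element count.
 * Returns true if the node was rewritten. */
static bool combine_splat_of_concat(Node *shuf) {
    assert(shuf != NULL);
    if (shuf->op != OP_VECTOR_SHUFFLE || shuf->num_elts > MAX_OPS)
        return false;
    for (int i = 0; i < shuf->num_elts; ++i)
        if (shuf->mask[i] != 0)
            return false;               /* only handle a splat of lane 0 */
    Node *concat = shuf->ops[0];
    if (concat->op != OP_CONCAT_VECTORS)
        return false;
    /* Lane 0 of concat_vectors always comes from its first operand, so the
       remaining operands (undef here) never need to be inspected. */
    Node *x = splat_value(concat->ops[0]);
    if (x == NULL)
        return false;
    shuf->op = OP_BUILD_VECTOR;
    shuf->num_ops = shuf->num_elts;
    for (int i = 0; i < shuf->num_elts; ++i)
        shuf->ops[i] = x;
    return true;
}
```

In LLVM itself such a rewrite would presumably live in the DAG combiner and run before operation legalization, as suggested above. The toy version only shows the shape of the pattern match. Emitting a broadcast vector_shuffle of the original BUILD_VECTOR would be the other option.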

Thanks,
Nadav

On Dec 11, 2013, at 9:54 AM, Robert Lougher <rob.lougher at gmail.com> wrote:

> Hi all,
> 
> With AVX, the following two functions can be optimized using a
> vbroadcast instruction that loads directly from memory (either a
> single- or double-precision value):
> 
> #include <immintrin.h>
> 
> __m256 loadSplat4x( const float *p ) {
>  __m128 r = _mm_load1_ps( p );
>  return (__m256) __builtin_shufflevector( r, r, 0, 0, 0, 0, 0, 0, 0, 0 );
> }
> 
> __m256d loadSplat8x( const double *p ) {
>  __m128d r = _mm_load1_pd( p );
>  return (__m256d) __builtin_shufflevector( r, r, 0, 0, 0, 0 );
> }
> 
> Current output (without AVX2):
> 
> loadSplat4x:
>  vmovss (%rdi), %xmm0
>  vpshufd $0, %xmm0, %xmm0 # xmm0 = xmm0[0,0,0,0]
>  vinsertf128 $1, %xmm0, %ymm0, %ymm0
>  ret
> 
> loadSplat8x:
>  vmovsd (%rdi), %xmm0
>  vpermilpd $0, %xmm0, %xmm0 # xmm0 = xmm0[0,0]
>  vinsertf128 $1, %xmm0, %ymm0, %ymm0
>  ret
> 
> Optimized output:
> 
> loadSplat4x:
>  vbroadcastss (%rdi), %ymm0
>  ret
> 
> loadSplat8x:
>  vbroadcastsd (%rdi), %ymm0
>  ret
> 
> In investigating this, I discovered that the x86 backend already tries
> to use a vbroadcast instruction for a splat (LowerVectorBroadcast in
> X86ISelLowering.cpp).  However, it fails to optimize the above cases
> because we end up with a concat_vectors in the selection DAG.
> 
> loadSplat4x generates the following IR (simplified):
> 
> define <8 x float> @loadSplat4x(float* %p) {
>  %1 = load float* %p
>  %2 = insertelement <4 x float> undef, float %1, i32 0
>  %3 = shufflevector <4 x float> %2, <4 x float> undef, <8 x i32> zeroinitializer
>  ret <8 x float> %3
> }
> 
> This generates the following selection DAG:
> 
> %1 = f32 load %p
> %2 = v4f32 BUILD_VECTOR %1, %1, %1, %1
> %3 = v8f32 concat_vectors %2, undef
> %4 = v8f32 vector_shuffle %3, undef, <0,0,0,0,0,0,0,0>
> ...
> 
> LowerVectorBroadcast() is called on the vector_shuffle. For the
> vbroadcast-from-memory pattern it expects to find a shuffle of either
> a scalar_to_vector or a BUILD_VECTOR, but it finds a concat_vectors
> instead.  However, as we're splatting, both the concat_vectors and the
> BUILD_VECTOR can be replaced by a splat of the single loaded f32
> value.
> 
> The attached patch fixes this by skipping the concat_vectors during
> pattern recognition.  In this case, once the concat_vectors is
> skipped, we get a BUILD_VECTOR, and the pattern matches.
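The look-through described above can be sketched as follows. This is a hypothetical C model of the matching step, not the actual LowerVectorBroadcast() code or the attached patch: the node layout, the opcode names and `match_broadcast_load()` are assumptions, and the `skip_concat` flag toggles the proposed behaviour so the before/after difference is visible.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy node model (hypothetical; not LLVM's SDNode/SDValue classes). */
typedef enum {
    OP_LOAD, OP_UNDEF, OP_SCALAR_TO_VECTOR, OP_BUILD_VECTOR, OP_CONCAT_VECTORS
} Opcode;

typedef struct Node {
    Opcode op;
    int num_ops;
    struct Node *ops[8];
} Node;

/* Given the source vector of a splat-of-lane-0 shuffle, return the load that
 * could feed a vbroadcast from memory, or NULL if the pattern does not match. */
static Node *match_broadcast_load(Node *src, bool skip_concat) {
    assert(src != NULL);
    /* Proposed change: lane 0 of concat_vectors(A, B) is lane 0 of A, so for
       a splat of lane 0 the concat can simply be looked through. */
    if (skip_concat && src->op == OP_CONCAT_VECTORS)
        src = src->ops[0];

    Node *scalar;
    if (src->op == OP_SCALAR_TO_VECTOR) {
        scalar = src->ops[0];
    } else if (src->op == OP_BUILD_VECTOR && src->num_ops > 0) {
        scalar = src->ops[0];
        for (int i = 1; i < src->num_ops; ++i)
            if (src->ops[i] != scalar)
                return NULL;     /* not a splat BUILD_VECTOR */
    } else {
        return NULL;             /* e.g. a concat_vectors when not skipped */
    }
    return scalar->op == OP_LOAD ? scalar : NULL;
}
```

With `skip_concat` false, the concat_vectors from the DAG above is rejected. With it true, the match falls through to the splat BUILD_VECTOR (or a scalar_to_vector) and returns the load.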
> 
> The alternative is to try to combine the BUILD_VECTOR/concat_vectors
> into a larger BUILD_VECTOR at an earlier stage (e.g.
> DAGCombiner::visitCONCAT_VECTORS).  However, unless we only handle the
> specific example above, the general case will be much more complex.
> The advantage of simply skipping the concat_vectors in the broadcast
> lowering is that we don't need to examine the operands at all (it's just
> 2 lines of code).  We will also handle scalar_to_vector/concat_vectors
> (if that can ever happen), plus the AVX2 fallback to a register-to-register
> vbroadcast will be simplified as we do not need to do an
> Extract128BitVector.
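The claim that no operands need to be examined follows from simple lane arithmetic. Lane `i` of a concat_vectors whose operands each have `n` elements comes from operand `i / n`, lane `i % n`. For a splat of lane 0 that is always operand 0, lane 0, whatever the other operands are. A small sketch with a hypothetical helper, `concat_source_lane()`:

```c
#include <assert.h>

/* Hypothetical helper: where does lane `lane` of
 *   concat_vectors(op0, op1, ..., opN-1)
 * come from, when every operand has `sub_elts` elements? */
typedef struct {
    int operand;   /* which concat_vectors operand */
    int lane;      /* lane within that operand */
} ConcatLane;

static ConcatLane concat_source_lane(int lane, int sub_elts) {
    assert(sub_elts > 0 && lane >= 0);
    ConcatLane r = { lane / sub_elts, lane % sub_elts };
    return r;
}
```

For the v8f32 case above (`n` = 4), lane 5 would come from operand 1, lane 1, while lane 0 always maps to the 128-bit first operand. That is presumably also why the AVX2 register fallback would no longer need an Extract128BitVector: the source is already the 128-bit operand.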
> 
> Having said that, I'm very new to LLVM development.  At the moment,
> simple fixes are preferred to big, risky changes, but I understand that
> generic solutions are more useful in the long run than small, specific
> fixes.  Opinions and advice welcome!
> 
> I do not have commit access so if you think the patch is acceptable
> please commit it for me.
> 
> Thanks,
> Rob.
> 
> --
> Robert Lougher
> SN Systems - Sony Computer Entertainment Group
> <patch.diff>
> _______________________________________________
> llvm-commits mailing list
> llvm-commits at cs.uiuc.edu
> http://lists.cs.uiuc.edu/mailman/listinfo/llvm-commits