[PATCH] [X86] Skip concat_vectors when lowering vector broadcast

Tue Jan 7 11:52:25 PST 2014

ping.

On 13 December 2013 20:29, Robert Lougher <rob.lougher at gmail.com> wrote:
> Hi Nadav,
>
> On 11 December 2013 21:23, Nadav Rotem <nrotem at apple.com> wrote:
>>>
>>> Yes, that was my alternative (handle it in DAGCombiner::visitCONCAT_
>>> VECTORS).  My fear was that the general combine could get quite
>>> complex and as I'm new to LLVM I'd get stuck.  Would it be acceptable
>>> to add a simple combine that just handled the case above in the first
>>> instance?
>>
>>
>> Sure, lets start by handling the simple case first. :)
>
> So I had a go at implementing a combine - the simple case of
> concat_vectors(BUILD_VECTOR(X, undef)) canonicalized into a larger
> BUILD_VECTOR.
>
> This was looking good, and it fixed the initial vbroadcast issue
> without the need to change the x86 backend code.  However, one of the
> LLVM tests failed (CodeGen/X86/avx-shuffle.ll).  Analyzing this showed
> the swap8doubles test failed indirectly due to a change in the
> expected output.
>
> There's an x86 specific transform in PerformShuffleCombine256
> (X86ISelLowering.cpp) that recognizes a vector zero extend.  This
> involves a complex pattern of concat_vectors, BUILD_VECTOR and undef
> in operands 1 and 2 of a shuffle.  Of course, this is now
> canonicalized and the optimization no longer triggers.
>
> I had a go at "fixing" the optimization to handle the canonicalized
> pattern instead.  This is fairly trivial for the second operand to the
> shuffle (the vector built from zeroes and undef).  Instead of
> concat_vectors(BUILD_VECTOR(0,0,0,0), undef)), this is now
> BUILD_VECTOR(0,0,0,0,...).  This fixes the failure in the LLVM test.
>
> However, the problem is there is still a case which won't be
> recognized.  This is when V (referring to the diagram in the comments)
> is itself a BUILD_VECTOR.  In this case, the first operand to the
> shuffle will also have been canonicalized.  This is rather more
> difficult to handle as you'd need to recognize a BUILD_VECTOR in which
> the last half was undef.
>
> Before trying to handle this case I was going to come up with a
> testcase to trigger it (the hope being that with both operands to the
> shuffle canonicalized, the resultant code without triggering the
> optimization may be as good, so I don't need to handle it anyway).
>
> However, this testcase is proving tricky, so I thought I'd ask for
> advice before going any further.  Does the approach still sound
> sensible?  Is "fixing" PerformShuffleCombine256 the right way, or does
> it indicate that there may be other optimizations that will break, and
> ultimately that canonicalizing the concat_vectors/BUILD_VECTOR is a
> bad idea?
>
> The attached patch contains my modifications as described above...
>
> Thanks,
> Rob.