[PATCH] [X86] Skip concat_vectors when lowering vector broadcast

Nadav Rotem nrotem at apple.com
Fri Jan 10 13:12:51 PST 2014


Hi Robert, 

Sorry for the delay in review. 

> 
> So I had a go at implementing a combine - the simple case of
> concat_vectors(BUILD_VECTOR(X, undef)) canonicalized into a larger
> BUILD_VECTOR.
> 
> This was looking good, and it fixed the initial vbroadcast issue
> without the need to change the x86 backend code.

Excellent!
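To make the transform concrete, here is a value-level sketch in plain C++ (this is a model, not actual SelectionDAG code, and the names are made up): concat_vectors(BUILD_VECTOR(X...), undef) folds into a single wider BUILD_VECTOR whose upper lanes are undef.

```cpp
#include <optional>
#include <vector>

// Model a vector value as a list of lanes; std::nullopt stands in for undef.
using Lane = std::optional<int>;
using Vec = std::vector<Lane>;

// Hypothetical model of the combine: concat_vectors(BUILD_VECTOR(Lo), undef)
// becomes one wider BUILD_VECTOR whose upper lanes are all undef.
Vec concatWithUndef(const Vec &Lo, unsigned NumUndefLanes) {
  Vec Wide = Lo;
  Wide.insert(Wide.end(), NumUndefLanes, std::nullopt); // pad with undef lanes
  return Wide;
}
```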


>  However, one of the
> LLVM tests failed (CodeGen/X86/avx-shuffle.ll).  Analyzing this showed
> the swap8doubles test failed indirectly due to a change in the
> expected output.

Is the assembly output better now?

> 
> There's an x86 specific transform in PerformShuffleCombine256
> (X86ISelLowering.cpp) that recognizes a vector zero extend.  This
> involves a complex pattern of concat_vectors, BUILD_VECTOR and undef
> in operands 1 and 2 of a shuffle.  Of course, this is now
> canonicalized and the optimization no longer triggers.

We need to make sure that the lowering code can handle the new canonicalization. 

> I had a go at "fixing" the optimization to handle the canonicalized
> pattern instead.  This is fairly trivial for the second operand to the
> shuffle (the vector built from zeroes and undef).  Instead of
> concat_vectors(BUILD_VECTOR(0,0,0,0), undef)), this is now
> BUILD_VECTOR(0,0,0,0,...).  This fixes the failure in the LLVM test.
> 
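At the value level, the fixed check on the second shuffle operand reduces to scanning a single BUILD_VECTOR for zero and undef lanes, roughly as below (a plain C++ model with hypothetical names; I believe the in-tree code can lean on a helper along the lines of ISD::isBuildVectorAllZeros for this):

```cpp
#include <optional>
#include <vector>

// std::nullopt models an undef lane.
using Lane = std::optional<int>;

// Sketch of the fixed predicate: every lane is either zero or undef,
// matching the canonicalized BUILD_VECTOR(0,0,0,0,...) form.
bool isZerosOrUndef(const std::vector<Lane> &V) {
  for (const Lane &L : V)
    if (L.has_value() && *L != 0)
      return false;
  return true;
}
```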

> However, the problem is there is still a case which won't be
> recognized.

>  This is when V (referring to the diagram in the comments)
> is itself a BUILD_VECTOR.  In this case, the first operand to the
> shuffle will also have been canonicalized.  This is rather more
> difficult to handle as you'd need to recognize a BUILD_VECTOR in which
> the last half was undef.

Why is this difficult to handle? We just need to scan the last elements of the build_vector.
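Roughly, in a value-level C++ model where std::nullopt stands in for an undef lane (a sketch with hypothetical names, not the in-tree code):

```cpp
#include <optional>
#include <vector>

// std::nullopt models an undef lane.
using Lane = std::optional<int>;

// Recognize a canonicalized BUILD_VECTOR whose upper half is entirely undef,
// i.e. what concat_vectors(BUILD_VECTOR(V), undef) looks like after the fold.
bool upperHalfIsUndef(const std::vector<Lane> &V) {
  for (size_t i = V.size() / 2; i < V.size(); ++i)
    if (V[i].has_value())
      return false;
  return true;
}
```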

> 
> Before trying to handle this case I was going to come up with a
> testcase to trigger it (the hope being that with both operands to the
> shuffle canonicalized, the resultant code without triggering the
> optimization may be as good, so I don't need to handle it anyway).
> 

One way to produce test cases is to add an assertion that fires on the interesting pattern. Next, you can use llvm-stress to fuzz until you find a random input that triggers it.
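For example, something along these lines (the flags and the AVX attribute are illustrative; adjust them for your build):

```shell
# Generate random IR modules with llvm-stress and feed them to llc until the
# added assertion fires. Seeds, sizes, and target flags are illustrative.
for seed in $(seq 1 10000); do
  llvm-stress -seed=$seed -size=200 -o stress.ll
  llc -mattr=+avx stress.ll -o /dev/null || { echo "seed $seed crashed"; break; }
done
```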

> However, this testcase is proving tricky, so I thought I'd ask for
> advice before going any further.  Does the approach still sound
> sensible?  Is "fixing" PerformShuffleCombine256 the right way, or does
> it indicate that there may be other optimizations that will break, and
> ultimately that canonicalizing the concat_vectors/BUILD_VECTOR is a
> bad idea?

I think that fixing the lowering code is the right thing to do.  You can also CC Elena, who is actively working in these parts of the code. 

Thanks,
Nadav



