[PATCH] [X86] Skip concat_vectors when lowering vector broadcast

Nadav Rotem nrotem at apple.com
Fri Jan 10 13:12:51 PST 2014


Hi Robert, 

Sorry for the delay in review. 

> 
> So I had a go at implementing a combine - the simple case of
> concat_vectors(BUILD_VECTOR(X, undef)) canonicalized into a larger
> BUILD_VECTOR.
> 
> This was looking good, and it fixed the initial vbroadcast issue
> without the need to change the x86 backend code.

Excellent!
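To make the transform concrete, here is a value-level sketch in plain C++ (this is a model, not actual SelectionDAG code, and the names are made up): concat_vectors(BUILD_VECTOR(X...), undef) folds into a single wider BUILD_VECTOR whose upper lanes are undef.

```cpp
#include <optional>
#include <vector>

// Model a vector value as a list of lanes; std::nullopt stands in for undef.
using Lane = std::optional<int>;
using Vec = std::vector<Lane>;

// Hypothetical model of the combine: concat_vectors(BUILD_VECTOR(Lo), undef)
// becomes one wider BUILD_VECTOR whose upper lanes are all undef.
Vec concatWithUndef(const Vec &Lo, unsigned NumUndefLanes) {
  Vec Wide = Lo;
  Wide.insert(Wide.end(), NumUndefLanes, std::nullopt); // pad with undef lanes
  return Wide;
}
```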


>  However, one of the
> LLVM tests failed (CodeGen/X86/avx-shuffle.ll).  Analyzing this showed
> the swap8doubles test failed indirectly due to a change in the
> expected output.

Is the assembly output better now?

> 
> There's an x86 specific transform in PerformShuffleCombine256
> (X86ISelLowering.cpp) that recognizes a vector zero extend.  This
> involves a complex pattern of concat_vectors, BUILD_VECTOR and undef
> in operands 1 and 2 of a shuffle.  Of course, this is now
> canonicalized and the optimization no longer triggers.

We need to make sure that the lowering code can handle the new canonicalization. 

> I had a go at "fixing" the optimization to handle the canonicalized
> pattern instead.  This is fairly trivial for the second operand to the
> shuffle (the vector built from zeroes and undef).  Instead of
> concat_vectors(BUILD_VECTOR(0,0,0,0), undef)), this is now
> BUILD_VECTOR(0,0,0,0,...).  This fixes the failure in the LLVM test.
> 
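At the value level, the fixed check on the second shuffle operand reduces to scanning a single BUILD_VECTOR for zero and undef lanes, roughly as below (a plain C++ model with hypothetical names; I believe the in-tree code can lean on a helper along the lines of ISD::isBuildVectorAllZeros for this):

```cpp
#include <optional>
#include <vector>

// std::nullopt models an undef lane.
using Lane = std::optional<int>;

// Sketch of the fixed predicate: every lane is either zero or undef,
// matching the canonicalized BUILD_VECTOR(0,0,0,0,...) form.
bool isZerosOrUndef(const std::vector<Lane> &V) {
  for (const Lane &L : V)
    if (L.has_value() && *L != 0)
      return false;
  return true;
}
```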

> However, the problem is there is still a case which won't be
> recognized.

>  This is when V (referring to the diagram in the comments)
> is itself a BUILD_VECTOR.  In this case, the first operand to the
> shuffle will also have been canonicalized.  This is rather more
> difficult to handle as you'd need to recognize a BUILD_VECTOR in which
> the last half was undef.

Why is this difficult to handle? We just need to scan the last elements of the build_vector.
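Roughly, in a value-level C++ model where std::nullopt stands in for an undef lane (a sketch with hypothetical names, not the in-tree code):

```cpp
#include <optional>
#include <vector>

// std::nullopt models an undef lane.
using Lane = std::optional<int>;

// Recognize a canonicalized BUILD_VECTOR whose upper half is entirely undef,
// i.e. what concat_vectors(BUILD_VECTOR(V), undef) looks like after the fold.
bool upperHalfIsUndef(const std::vector<Lane> &V) {
  for (size_t i = V.size() / 2; i < V.size(); ++i)
    if (V[i].has_value())
      return false;
  return true;
}
```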

> 
> Before trying to handle this case I was going to come up with a
> testcase to trigger it (the hope being that with both operands to the
> shuffle canonicalized, the resultant code without triggering the
> optimization may be as good, so I don't need to handle it anyway).
> 

One way to produce test cases is to add an assertion that fires on the interesting pattern. Next, you can use llvm-stress to fuzz until you find a random input that triggers it.
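For example, something along these lines (the flags and the AVX attribute are illustrative; adjust them for your build):

```shell
# Generate random IR modules with llvm-stress and feed them to llc until the
# added assertion fires. Seeds, sizes, and target flags are illustrative.
for seed in $(seq 1 10000); do
  llvm-stress -seed=$seed -size=200 -o stress.ll
  llc -mattr=+avx stress.ll -o /dev/null || { echo "seed $seed crashed"; break; }
done
```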

> However, this testcase is proving tricky, so I thought I'd ask for
> advice before going any further.  Does the approach still sound
> sensible?  Is "fixing" PerformShuffleCombine256 the right way, or does
> it indicate that there may be other optimizations that will break, and
> ultimately that canonicalizing the concat_vectors/BUILD_VECTOR is a
> bad idea?

I think that fixing the lowering code is the right thing to do.  You can also CC Elena, who is actively working in these parts of the code. 

Thanks,
Nadav



