[PATCH][x86] Teach the backend how to combine a vselect into a movss/movsd.

Fri Jan 17 12:16:26 PST 2014

Thanks for working on this, Andrea. The transformation itself is okay, but I am worried about problems that may show up if this optimization fires too early, before other optimizations have had a chance to optimize this select. This is really a lowering transformation. I mention this because very few optimizations can (or should have to) optimize x86-specific nodes. For example, A and B might be optimized into constants at some point, but this optimization would prevent us from doing anything about it. I suggest that you make sure this optimization only runs after the operations are legalized.
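A toy model of the ordering problem may help. The sketch below is hypothetical: Phase, Kind, Node, genericFold, x86Combine and runPipeline are illustrative names, not LLVM APIs, and the real DAG combiner is far more involved. It only shows how folding early into an x86-specific node can block a later target-independent constant fold, and how a "run only after legalization" guard avoids that.

```cpp
#include <cassert>

// Hypothetical, heavily simplified model; none of these names are LLVM APIs.
enum class Phase { BeforeLegalizeOps, AfterLegalizeOps };
enum class Kind { VSelect, Movss, Constant };

struct Node {
  Kind K;
  bool OperandsAreConstant;  // true once A and B are known constants
};

// A target-independent fold: it understands vselect but treats the
// x86-specific Movss node as opaque.
Node genericFold(Node N) {
  if (N.K == Kind::VSelect && N.OperandsAreConstant)
    N.K = Kind::Constant;
  return N;
}

// The x86 combine. With Guarded set, it only fires after legalization.
Node x86Combine(Node N, Phase P, bool Guarded) {
  if (Guarded && P != Phase::AfterLegalizeOps)
    return N;
  if (N.K == Kind::VSelect)
    N.K = Kind::Movss;
  return N;
}

// Two combine rounds. During the first round, after the x86 combine has had
// its chance, some other optimization turns A and B into constants.
Kind runPipeline(bool Guarded) {
  Node N{Kind::VSelect, /*OperandsAreConstant=*/false};
  // Round 1: before operation legalization.
  N = x86Combine(N, Phase::BeforeLegalizeOps, Guarded);
  N.OperandsAreConstant = true;
  N = genericFold(N);
  // Round 2: after operation legalization.
  N = x86Combine(N, Phase::AfterLegalizeOps, Guarded);
  N = genericFold(N);
  return N.K;
}
```

With the guard, runPipeline(true) ends in Kind::Constant; without it, runPipeline(false) is left with a Kind::Movss node that the generic fold cannot see through.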

On Jan 16, 2014, at 5:42 AM, Andrea Di Biagio <andrea.dibiagio at gmail.com> wrote:

> Hi,
> 
> this patch teaches the x86 backend how to combine vselect DAG nodes
> into movss/movsd when possible.
> 
> If the vector type of the operands of the vselect is either
> MVT::v4i32 or MVT::v4f32, then we can fold according to the following rules:
> 
> 1.  fold (vselect (build_vector (0, -1, -1, -1)), A, B) -> (movss A, B)
> 2.  fold (vselect (build_vector (-1, 0, 0, 0)), A, B) -> (movss B, A)
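Rules 1 and 2 can be checked lane by lane. The sketch below is a hypothetical scalar model, not backend code: it assumes vselect takes lane i from A when mask lane i is -1 and from B when it is 0, and that (movss V1, V2) takes lane 0 from V2 and lanes 1 to 3 from V1. That operand order is what the rules imply, and it matches the _mm_move_ss(a, b) intrinsic.

```cpp
#include <array>
#include <cassert>
#include <cstdint>

using V4 = std::array<float, 4>;
using Mask4 = std::array<int32_t, 4>;

// vselect: lane I comes from A when mask lane I is all-ones (-1) and from B
// when it is zero. Mask lanes are assumed to be either 0 or -1.
V4 vselect(const Mask4 &M, const V4 &A, const V4 &B) {
  V4 R{};
  for (int I = 0; I < 4; ++I)
    R[I] = (M[I] != 0) ? A[I] : B[I];
  return R;
}

// movss V1, V2: lane 0 comes from V2, lanes 1-3 come from V1.
V4 movss(const V4 &V1, const V4 &V2) {
  V4 R = V1;
  R[0] = V2[0];
  return R;
}
```

For A = {1, 2, 3, 4} and B = {5, 6, 7, 8}, both sides of rule 1 give {5, 2, 3, 4} and both sides of rule 2 give {1, 6, 7, 8}.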
> 
> If the vector type of the operands of the vselect is either
> MVT::v2i64 or MVT::v2f64 (and we have SSE2), then we can fold
> according to the following rules:
> 
> 3.  fold (vselect (build_vector (0, -1)), A, B) -> (movsd A, B)
> 4.  fold (vselect (build_vector (-1, 0)), A, B) -> (movsd B, A)
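The matching side of rules 1 to 4 can be sketched as a small classifier over the constant mask. This is a hypothetical standalone model: classifyMask and MovsFold are illustrative names, the mask is passed as plain integers rather than as a build_vector node, and the lane-count/element-width check stands in for the MVT::v4i32/v4f32 and MVT::v2i64/v2f64 type checks.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical result of matching a constant vselect mask.
enum class MovsFold { None, MovsAB, MovsBA };

// Mask lanes are assumed to be 0 or -1; EltBits is the element width.
// Returns MovsAB for (0, -1, ..., -1) (rules 1 and 3), MovsBA for
// (-1, 0, ..., 0) (rules 2 and 4), and None otherwise.
MovsFold classifyMask(const std::vector<int64_t> &Mask, unsigned EltBits,
                      bool HasSSE2) {
  const std::size_t N = Mask.size();
  const bool IsV4x32 = N == 4 && EltBits == 32;  // v4i32 / v4f32: movss
  const bool IsV2x64 = N == 2 && EltBits == 64;  // v2i64 / v2f64: movsd
  if (!IsV4x32 && !(IsV2x64 && HasSSE2))
    return MovsFold::None;
  bool AB = Mask[0] == 0;
  bool BA = Mask[0] == -1;
  for (std::size_t I = 1; I < N; ++I) {
    AB = AB && Mask[I] == -1;
    BA = BA && Mask[I] == 0;
  }
  if (AB)
    return MovsFold::MovsAB;
  if (BA)
    return MovsFold::MovsBA;
  return MovsFold::None;
}
```

Checking the element width as well as the lane count keeps the sketch from matching 256-bit types such as a four-lane vector of 64-bit elements.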
> 
> I added extra test cases to 'test/CodeGen/X86/vselect.ll' to verify
> that we correctly select movss/movsd instructions.
> 
> Before this change, the backend only knew how to lower a shufflevector
> into an X86Movss/X86Movsd, but not how to do the same with vselect DAG
> nodes.
> For that reason, all the ISel patterns introduced in r197145
> http://llvm.org/viewvc/llvm-project?view=revision&revision=197145
> were only matched if the X86Movss/X86Movsd nodes were obtained from the
> custom lowering of a shufflevector.
> 
> With this change, the backend can combine a vselect into an
> X86Movss/X86Movsd and can therefore reuse the patterns from revision
> 197145 to further simplify packed vector arithmetic operations.
> 
> I added new test cases to 'test/CodeGen/X86/sse-scalar-fp-arith-2.ll'
> to verify that we now correctly select SSE/AVX scalar fp instructions
> from a packed arithmetic instruction followed by a vselect.
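As an illustration of why the r197145 patterns become reachable, consider a packed add whose result is merged back into A by a vselect with mask (0, -1, -1, -1). By rule 1 that vselect becomes (movss A, (fadd A, B)), and lane by lane this equals a scalar addss of A and B, which keeps lanes 1 to 3 of its first operand. The sketch below is a hypothetical scalar model (addps, addss, movss and selectAfterAdd are illustrative names); it does not reproduce the actual ISel patterns or the new test cases.

```cpp
#include <array>
#include <cassert>

using V4 = std::array<float, 4>;

// Packed add (addps): every lane is A[I] + B[I].
V4 addps(const V4 &A, const V4 &B) {
  V4 R{};
  for (int I = 0; I < 4; ++I)
    R[I] = A[I] + B[I];
  return R;
}

// Scalar add (addss): lane 0 is A[0] + B[0]; lanes 1-3 are copied from A.
V4 addss(const V4 &A, const V4 &B) {
  V4 R = A;
  R[0] = A[0] + B[0];
  return R;
}

// movss V1, V2: lane 0 from V2, lanes 1-3 from V1.
V4 movss(const V4 &V1, const V4 &V2) {
  V4 R = V1;
  R[0] = V2[0];
  return R;
}

// vselect((0, -1, -1, -1), A, addps(A, B)) after rule 1 has been applied.
V4 selectAfterAdd(const V4 &A, const V4 &B) {
  return movss(A, addps(A, B));
}
```

For A = {1, 2, 3, 4} and B = {10, 20, 30, 40}, both selectAfterAdd and addss produce {11, 2, 3, 4}, so in this case a single addss can stand in for the addps plus the blend.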
> 
> After this change, the following tests started failing because they
> always expected blendvps/blendvpd instructions in the output assembly:
>  test/CodeGen/X86/sse2-blend.ll
>  test/CodeGen/X86/avx-blend.ll
>  test/CodeGen/X86/blend-msb.ll
>  test/CodeGen/X86/sse41-blend.ll
> 
> Since the backend now knows how to efficiently emit movss/movsd, all of
> these failures are expected: the backend can now select movss/movsd
> instead of always selecting blendvps/blendvpd.
> 
> I modified those failing tests so that, when possible, the generated
> assembly still contains the expected blendvps/blendvpd (see, for
> example, how I changed avx-blend.ll).
> In all other cases I just changed the CHECK lines to verify that we
> produce a movss/movsd.
> 
> Please let me know if it is OK to submit.
> 
> Thanks,
> Andrea Di Biagio
> SN Systems - Sony Computer Entertainment Group.
> <patch-vselect.diff>