[PATCH][x86] Teach how to combine a vselect into a movss/movsd.

Fri Jan 17 13:19:44 PST 2014

Hi Nadav and Juergen,

On Fri, Jan 17, 2014 at 8:16 PM, Nadav Rotem <nrotem at apple.com> wrote:
> Thanks for working on this, Andrea. The transformation itself is okay, but I am worried about problems that may show up if this optimization fires too early, before other optimizations have had a chance to optimize this select. This is really a lowering transformation. I mention this because very few optimizations can (or should have to) optimize x86-specific nodes. For example, maybe A and B could be optimized into constants at some point, but this optimization would prevent us from doing anything about it. I suggest that you make sure that this optimization only runs after the operations are legalized.

True, it is safer to run this after nodes are legalized.

I'll change the patch so that the optimization runs after legalization.
(I will also introduce the std::swap as suggested by Juergen).
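
A rough sketch of how the swap could cover both mask shapes with one
code path. This is a standalone model, not the actual patch code and not
the LLVM API: matchMovssMask, the int-array mask, and the string operands
are made-up names for illustration only.

```cpp
#include <array>
#include <cassert>
#include <string>
#include <utility>

// Hypothetical helper, not part of the patch. Mask lanes are -1 (take the
// lane from the first vselect operand) or 0 (take it from the second).
// On success, Op0 and Op1 are left in the order (movss Op0, Op1).
bool matchMovssMask(const std::array<int, 4> &Mask, std::string &Op0,
                    std::string &Op1) {
  for (int Lane : Mask)
    if (Lane != 0 && Lane != -1)
      return false; // not a constant 0 / -1 mask
  // Lanes 1..3 must agree and lane 0 must differ from them.
  if (Mask[1] != Mask[2] || Mask[2] != Mask[3] || Mask[0] == Mask[1])
    return false;
  // (-1, 0, 0, 0) is the (0, -1, -1, -1) case with the operands exchanged.
  if (Mask[0] == -1)
    std::swap(Op0, Op1);
  return true;
}

// Usage: the (-1, 0, 0, 0) mask yields the swapped operand order.
void demoMatchMovssMask() {
  std::string Op0 = "A", Op1 = "B";
  bool Matched =
      matchMovssMask(std::array<int, 4>{{-1, 0, 0, 0}}, Op0, Op1);
  assert(Matched && Op0 == "B" && Op1 == "A");
  (void)Matched; // silence unused-variable warnings when NDEBUG is set
}
```

The v2i64/v2f64 case would presumably have the same shape with a
two-lane mask.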

Thanks for the reviews!
Andrea

>
> On Jan 16, 2014, at 5:42 AM, Andrea Di Biagio <andrea.dibiagio at gmail.com> wrote:
>
>> Hi,
>>
>> this patch teaches the x86 backend how to combine vselect dag nodes
>> into movss/movsd when possible.
>>
>> If the vector type of the operands of the vselect is either
>> MVT::v4i32 or MVT::v4f32, then we can fold according to the following rules:
>>
>> 1.  fold (vselect (build_vector (0, -1, -1, -1)), A, B) -> (movss A, B)
>> 2.  fold (vselect (build_vector (-1, 0, 0, 0)), A, B) -> (movss B, A)
>>
>> If the vector type of the operands of the vselect is either
>> MVT::v2i64 or MVT::v2f64 (and we have SSE2), then we can fold
>> according to the following rules:
>>
>> 3.  fold (vselect (build_vector (0, -1)), A, B) -> (movsd A, B)
>> 4.  fold (vselect (build_vector (-1, 0)), A, B) -> (movsd B, A)
>>
>> I added extra test cases to file 'test/CodeGen/X86/vselect.ll' in
>> order to verify that we correctly select movss/movsd instructions.
>>
>> Before this change, the backend only knew how to lower a shufflevector
>> into an X86Movss/X86Movsd, but not how to do the same with vselect dag
>> nodes.
>> For that reason, all the ISel patterns introduced at r197145
>> http://llvm.org/viewvc/llvm-project?view=revision&revision=197145
>> were only matched if the X86Movss/X86Movsd were obtained from the
>> custom lowering of a shufflevector.
>>
>> With this change, the backend is now able to combine vselect into
>> X86Movss/X86Movsd, and can therefore reuse the patterns from revision
>> 197145 to further simplify packed vector arithmetic operations.
>>
>> I added new test-cases in 'test/CodeGen/X86/sse-scalar-fp-arith-2.ll'
>> to verify that we now correctly select SSE/AVX scalar fp instructions
>> from a packed arithmetic instruction followed by a vselect.
>>
>> After this change, the following tests started failing because they
>> always expected blendvps/blendvpd instructions in the output assembly:
>>  test/CodeGen/X86/sse2-blend.ll
>>  test/CodeGen/X86/avx-blend.ll
>>  test/CodeGen/X86/blend-msb.ll
>>  test/CodeGen/X86/sse41-blend.ll
>>
>> These are all expected failures: the backend now knows how to select
>> movss/movsd for these vselect nodes, and no longer only
>> blendvps/blendvpd.
>>
>> I modified those failing tests so that - when possible - the generated
>> assembly still contains the expected blendvps/blendvpd (see for example
>> how I changed avx-blend.ll).
>> In all other cases I just changed the CHECK lines to verify that we
>> produce a movss/movsd.
>>
>> Please let me know if ok to submit.
>>
>> Thanks,
>> Andrea Di Biagio
>> SN Systems - Sony Computer Entertainment Group.
>> <patch-vselect.diff>
>
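
The two v4 fold rules quoted above can be checked with a small
lane-level model. This is only a sketch under stated assumptions: V4,
vselect, movss, and checkMovssRules are illustrative names rather than
LLVM code, lanes are plain ints, and movss is modeled the way the rules
use it (lane 0 from the second operand, lanes 1..3 from the first).

```cpp
#include <array>
#include <cassert>
#include <cstddef>

using V4 = std::array<int, 4>; // illustrative 4-lane vector

// vselect: lane I comes from A when the mask lane is non-zero, else from B.
V4 vselect(const V4 &Mask, const V4 &A, const V4 &B) {
  V4 R{};
  for (std::size_t I = 0; I < R.size(); ++I)
    R[I] = Mask[I] ? A[I] : B[I];
  return R;
}

// Model of (movss A, B): lane 0 from B, lanes 1..3 from A.
V4 movss(const V4 &A, const V4 &B) {
  V4 R = A;
  R[0] = B[0];
  return R;
}

// Checks rules 1 and 2 for one pair of inputs.
void checkMovssRules(const V4 &A, const V4 &B) {
  // Rule 1: mask (0, -1, -1, -1) behaves like (movss A, B).
  assert(vselect(V4{{0, -1, -1, -1}}, A, B) == movss(A, B));
  // Rule 2: mask (-1, 0, 0, 0) behaves like (movss B, A).
  assert(vselect(V4{{-1, 0, 0, 0}}, A, B) == movss(B, A));
}
```

Rules 3 and 4 would follow the same pattern with two lanes and movsd.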