[PATCH][x86] Teach how to combine a vselect into a movss/movsd.
Nadav Rotem
nrotem at apple.com
Fri Jan 17 12:16:26 PST 2014
Thanks for working on this Andrea. The transformation itself is okay, but I am worried about problems that may show up if this optimization were to fire up too early before other optimizations have a chance to optimize this select. This is really a lowering transformation I mention this because very few optimizations can (or should have to) optimize x86 specific nodes. For example, maybe A and B could be optimized into constants at some point but this optimization would prevent us from doing anything about it. I suggest that you make sure that this optimization only runs after the operations are legalized.
On Jan 16, 2014, at 5:42 AM, Andrea Di Biagio <andrea.dibiagio at gmail.com> wrote:
> Hi,
>
> this patch teaches the x86 backend how to combine vselect dag nodes
> into movss/movsd when possible.
>
> If the vector type of the operands of the vselect is either
> MVT::v4i32 or MVT::v4f32, then we can fold according to the following rules:
>
> 1. fold (vselect (build_vector (0, -1, -1, -1)), A, B) -> (movss A, B);
> 2. fold (vselect (build_vector (-1, 0, 0, 0)), A, B) -> (movss B, A)
>
> If the vector type of the operands of the vselect is either
> MVT::v2i64 or MVT::v2f64 (and we have SSE2) , then we can fold
> according to the following rules:
>
> 3. fold (vselect (build_vector (0, -1)), A, B) -> (movsd A, B)
> 4. fold (vselect (build_vector (-1, 0)), A, B) -> (movsd B, A)
>
> I added extra test cases to file 'test/CodeGen/X86/vselect.ll' in
> order to verify that we correctly select movss/movsd instructions.
>
> Before this change, the backend only knew how to lower a shufflevector
> into a X86Movss/X86Movsd, but not how to do the same with vselect dag
> nodes.
> For that reason, all the ISel patterns introduced at r197145
> http://llvm.org/viewvc/llvm-project?view=revision&revision=197145
> were only matched if the X86Movss/X86Movsd were obtained from the
> custom lowering of a shufflevector.
>
> With this change, the backend is now able to combine vselect into
> X86Movss and therefore it can reuse the patterns from revision 197145
> to further simplify packed vector arithmetic operations.
>
> I added new test-cases in 'test/CodeGen/X86/sse-scalar-fp-arith-2.ll'
> to verify that now we correctly select SSE/AVX scalar fp instructions
> from a packed arithmetic instruction followed by a vselect.
>
> After this change, the following tests started failing because they
> always expected blendvps/blendvpd instructions in the output assembly:
> test/CodeGen/X86/sse2-blend.ll
> test/CodeGen/X86/avx-blend.ll
> test/CodeGen/X86/blend-msb.ll
> test/CodeGen/X86/sse41-blend.ll
>
> Now the backend knows how to efficiently emit movss/movsd and
> therefore all the failing cases are expected failures (that is because
> the backend knows how to select movss/movsd and not only
> blendvps/blendvpd).
>
> I modified those failing tests so that - when possible - the generated
> assembly still contains the expected blendvps/blendvpd(see for example
> how I changed avx-blend.ll).
> In all other cases I just changed the CHECK lines to verify that we
> produce a movss/movsd.
>
> Please let me know if ok to submit.
>
> Thanks,
> Andrea Di Biagio
> SN Systems - Sony Computer Entertainment Group.
> <patch-vselect.diff>
More information about the llvm-commits
mailing list