[PATCH][x86] Add more rules for combining vselect dag nodes.

Tue Jan 28 08:13:33 PST 2014

Ping.

On Tue, Jan 21, 2014 at 10:38 PM, Andrea Di Biagio
<andrea.dibiagio at gmail.com> wrote:
> Hi,
>
> This patch adds extra rules for combining vselect DAG nodes into movsd.
> It improves on the fix committed in r199683 by adding the
> following new target-specific combine rules:
>
> 1)   fold (v4i32: vselect <0,0,-1,-1>, A, B) ->
>             (v4i32 (bitcast (movsd (v2i64 (bitcast A)), (v2i64 (bitcast B))) ))
>
> 2)   fold (v4f32: vselect <0,0,-1,-1>, A, B) ->
>             (v4f32 (bitcast (movsd (v2f64 (bitcast A)), (v2f64 (bitcast B))) ))
>
> 3)   fold (v4i32: vselect <-1,-1,0,0>, A, B) ->
>             (v4i32 (bitcast (movsd (v2i64 (bitcast B)), (v2i64 (bitcast A))) ))
>
> 4)   fold (v4f32: vselect <-1,-1,0,0>, A, B) ->
>             (v4f32 (bitcast (movsd (v2f64 (bitcast B)), (v2f64 (bitcast A))) ))
>
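To make the operand order in the four rules easier to check, here is a minimal scalar sketch of the lane semantics. It is not code from the patch: the names V4, V2, vselect, movsd, kHighFromA, kLowFromA and rules_hold are hypothetical, and the model assumes that the movsd node takes its low 64 bits from the second operand and its high 64 bits from the first. Rules 2 and 4 act on the same bits as rules 1 and 3, so the integer model covers them too.

```cpp
#include <array>
#include <cassert>
#include <cstdint>
#include <cstring>

using V4   = std::array<std::uint32_t, 4>;  // bits of a v4i32 / v4f32 value
using V2   = std::array<std::uint64_t, 2>;  // bits of a v2i64 / v2f64 value
using Mask = std::array<std::int32_t, 4>;   // constant vselect condition

const Mask kHighFromA{{0, 0, -1, -1}};  // mask used by rules 1 and 2
const Mask kLowFromA{{-1, -1, 0, 0}};   // mask used by rules 3 and 4

// vselect: take the lane from A where the mask lane is -1, otherwise from B.
V4 vselect(const Mask& m, const V4& a, const V4& b) {
    V4 r{};
    for (int i = 0; i < 4; ++i) r[i] = (m[i] == -1) ? a[i] : b[i];
    return r;
}

// Bitcasts between the 4 x 32-bit and 2 x 64-bit views of the same 128 bits.
V2 to_v2(const V4& v) { V2 r; std::memcpy(r.data(), v.data(), sizeof r); return r; }
V4 to_v4(const V2& v) { V4 r; std::memcpy(r.data(), v.data(), sizeof r); return r; }

// Assumed movsd semantics: low element from the second operand,
// high element from the first.
V2 movsd(const V2& a, const V2& b) { return V2{{b[0], a[1]}}; }

// True when both mask patterns agree with their movsd rewrite.
bool rules_hold(const V4& a, const V4& b) {
    bool high_from_a = vselect(kHighFromA, a, b) == to_v4(movsd(to_v2(a), to_v2(b)));
    bool low_from_a  = vselect(kLowFromA, a, b) == to_v4(movsd(to_v2(b), to_v2(a)));
    return high_from_a && low_from_a;
}
```

For instance, calling assert(rules_hold(a, b)) with any two V4 values should pass under the assumptions above.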
> Example:
>
> //////
> define <4 x i32> @test(<4 x i32> %A, <4 x i32> %B) {
>   %select = select <4 x i1> <i1 false, i1 false, i1 true, i1 true>, <4 x i32> %A, <4 x i32> %B
>   ret <4 x i32> %select
> }
> //////
>
> (with -mattr=sse2 and -march=x86-64)
> Before this change, the compiler produced the following assembly:
>         movaps  %xmm0, %xmm2
>         movaps  .LCPI1_0(%rip), %xmm0
>         blendvps        %xmm2, %xmm1
>         movaps  %xmm1, %xmm0
>         retq
>
> Now it produces:
>         movsd   %xmm1, %xmm0
>         retq
>
> Basically, with this patch we generate a single movsd instead of a
> blendvps, which requires a load from the constant pool to set the
> selection mask.
>
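The same contrast can be sketched at the source level with SSE2 intrinsics (x86 only). This is an illustration rather than part of the patch: the helper names select_low_from_b, select_with_mask and same_lanes are hypothetical, and the mask-based version uses a generic and/andnot/or select as a stand-in for the blend, not the exact sequence shown above. It does share the property that matters here, namely that it needs a constant mask. Which instructions a compiler actually emits for either function is not guaranteed.

```cpp
#include <cassert>
#include <emmintrin.h>  // SSE2 intrinsics

// select <false,false,true,true>, A, B: lanes 0-1 from B, lanes 2-3 from A.
// _mm_move_sd takes the low 64 bits from its second argument and the
// high 64 bits from its first, so no mask constant is needed.
__m128i select_low_from_b(__m128i a, __m128i b) {
    __m128d r = _mm_move_sd(_mm_castsi128_pd(a), _mm_castsi128_pd(b));
    return _mm_castpd_si128(r);
}

// The same select written as a generic mask-based blend: (mask & A) | (~mask & B).
// The mask is a constant that has to be materialized first.
__m128i select_with_mask(__m128i a, __m128i b) {
    const __m128i mask = _mm_set_epi32(-1, -1, 0, 0);  // arguments are lanes 3, 2, 1, 0
    return _mm_or_si128(_mm_and_si128(mask, a), _mm_andnot_si128(mask, b));
}

// True when all four 32-bit lanes of x and y are equal.
bool same_lanes(__m128i x, __m128i y) {
    return _mm_movemask_epi8(_mm_cmpeq_epi32(x, y)) == 0xFFFF;
}
```

A quick check such as assert(same_lanes(select_low_from_b(a, b), select_with_mask(a, b))) should hold for any inputs.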
> Please let me know if this is OK to submit.
>
> Thanks,
> Andrea Di Biagio
> SN Systems - Sony Computer Entertainment Group.