[PATCH][x86] Add more rules for combining vselect dag nodes.
Andrea Di Biagio
andrea.dibiagio at gmail.com
Tue Jan 28 08:13:33 PST 2014
Ping.
On Tue, Jan 21, 2014 at 10:38 PM, Andrea Di Biagio
<andrea.dibiagio at gmail.com> wrote:
> Hi,
>
> This patch adds extra rules for combining vselect dag nodes into movsd.
> It improves on the fix committed in r199683 by adding the following
> new target-specific combine rules:
>
> 1) fold (v4i32: vselect <0,0,-1,-1>, A, B) ->
> (v4i32 (bitcast (movsd (v2i64 (bitcast A)), (v2i64 (bitcast B))) ))
>
> 2) fold (v4f32: vselect <0,0,-1,-1>, A, B) ->
> (v4f32 (bitcast (movsd (v2f64 (bitcast A)), (v2f64 (bitcast B))) ))
>
> 3) fold (v4i32: vselect <-1,-1,0,0>, A, B) ->
> (v4i32 (bitcast (movsd (v2i64 (bitcast B)), (v2i64 (bitcast A))) ))
>
> 4) fold (v4f32: vselect <-1,-1,0,0>, A, B) ->
> (v4f32 (bitcast (movsd (v2f64 (bitcast B)), (v2f64 (bitcast A))) ))
>
> Example:
>
> //////
> define <4 x i32> @test(<4 x i32> %A, <4 x i32> %B) {
> %select = select <4 x i1> <i1 false, i1 false, i1 true, i1 true>, <4 x i32> %A, <4 x i32> %B
> ret <4 x i32> %select
> }
> //////
>
> (with -mattr=sse2 and -march=x86-64)
> Before this change, the compiler produced the following assembly:
> movaps %xmm0, %xmm2
> movaps .LCPI1_0(%rip), %xmm0
> blendvps %xmm2, %xmm1
> movaps %xmm1, %xmm0
> retq
>
> Now it produces:
> movsd %xmm1, %xmm0
> retq
>
> Basically, with this patch we generate a single movsd instead of a
> blendvps, which requires a load from the constant pool to set the
> selection mask.
>
> Please let me know if ok to submit.
>
> Thanks,
> Andrea Di Biagio
> SN Systems - Sony Computer Entertainment Group.
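
The four folds listed in the quoted message can be sanity-checked with a small lane-level model. The sketch below is plain Python, not LLVM code. The helper names (vselect, movsd, as_v2, as_v4, combine_vselect) are hypothetical and exist only for this illustration. It assumes movsd(V1, V2) takes its low 64-bit lane from V2 and its high lane from V1, and it models the bitcast between 4 x 32-bit and 2 x 64-bit vectors by pairing adjacent lanes.

```python
# Illustrative lane-level model only; none of these helpers are LLVM APIs.

def vselect(mask, a, b):
    """Per-lane select: take a[i] where mask[i] is non-zero, else b[i]."""
    return [x if m else y for m, x, y in zip(mask, a, b)]

def movsd(v1, v2):
    """Assumed movsd semantics on 2-lane vectors: low lane from v2, high lane from v1."""
    return [v2[0], v1[1]]

def as_v2(v4):
    """Model a bitcast from 4 x 32-bit to 2 x 64-bit by pairing adjacent lanes."""
    return [(v4[0], v4[1]), (v4[2], v4[3])]

def as_v4(v2):
    """Model the bitcast back from 2 x 64-bit to 4 x 32-bit."""
    return [v2[0][0], v2[0][1], v2[1][0], v2[1][1]]

def combine_vselect(mask, a, b):
    """Apply the movsd rewrite for the two masks named in the patch, else return None."""
    if mask == [0, 0, -1, -1]:
        # Rules 1 and 2: low half comes from B, high half from A.
        return as_v4(movsd(as_v2(a), as_v2(b)))
    if mask == [-1, -1, 0, 0]:
        # Rules 3 and 4: low half comes from A, high half from B.
        return as_v4(movsd(as_v2(b), as_v2(a)))
    return None

A = [10, 11, 12, 13]
B = [20, 21, 22, 23]

# The rewrite must agree with the plain per-lane select for both masks.
for mask in ([0, 0, -1, -1], [-1, -1, 0, 0]):
    assert combine_vselect(mask, A, B) == vselect(mask, A, B)
```

Any other mask, such as <-1,0,-1,0>, returns None in this model, mirroring the fact that the rules only cover selections that split cleanly along the 64-bit boundary.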