[PATCH][x86] Add more rules for combining vselect dag nodes.

Tue Jan 28 09:31:58 PST 2014

LGTM! 

On Jan 28, 2014, at 8:13 AM, Andrea Di Biagio <andrea.dibiagio at gmail.com> wrote:

> Ping.
> 
> On Tue, Jan 21, 2014 at 10:38 PM, Andrea Di Biagio
> <andrea.dibiagio at gmail.com> wrote:
>> Hi,
>> 
>> This patch adds extra rules for combining vselect dag nodes into movsd.
>> It improves on the fix committed in revision r199683 by adding the
>> following new target-specific combine rules:
>> 
>> 1)   fold (v4i32: vselect <0,0,-1,-1>, A, B) ->
>>            (v4i32 (bitcast (movsd (v2i64 (bitcast A)), (v2i64 (bitcast B))) ))
>> 
>> 2)   fold (v4f32: vselect <0,0,-1,-1>, A, B) ->
>>            (v4f32 (bitcast (movsd (v2f64 (bitcast A)), (v2f64 (bitcast B))) ))
>> 
>> 3)   fold (v4i32: vselect <-1,-1,0,0>, A, B) ->
>>            (v4i32 (bitcast (movsd (v2i64 (bitcast B)), (v2i64 (bitcast A))) ))
>> 
>> 4)   fold (v4f32: vselect <-1,-1,0,0>, A, B) ->
>>            (v4f32 (bitcast (movsd (v2f64 (bitcast B)), (v2f64 (bitcast A))) ))
>> 
>> Example:
>> 
>> //////
>> define <4 x i32> @test(<4 x i32> %A, <4 x i32> %B) {
>>  %select = select <4 x i1> <i1 false, i1 false, i1 true, i1 true>, <4 x i32> %A, <4 x i32> %B
>>  ret <4 x i32> %select
>> }
>> //////
>> 
>> (with -mattr=sse2 and -march=x86-64)
>> Before this change, the compiler produced the following assembly:
>>        movaps  %xmm0, %xmm2
>>        movaps  .LCPI1_0(%rip), %xmm0
>>        blendvps        %xmm2, %xmm1
>>        movaps  %xmm1, %xmm0
>>        retq
>> 
>> Now it produces:
>>        movsd   %xmm1, %xmm0
>>        retq
>> 
>> In short, with this patch we generate a single movsd instead of a
>> blendvps, which requires a load from the constant pool to set up the
>> selection mask.
>> 
>> Please let me know if this is ok to submit.
>> 
>> Thanks,
>> Andrea Di Biagio
>> SN Systems - Sony Computer Entertainment Group.
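
For anyone wanting to sanity-check the four fold rules quoted above, here is a
minimal lane-level sketch in Python. It is not LLVM code: the helper names
vselect and movsd are hypothetical stand-ins, and the model assumes the
little-endian x86 layout, where lanes 0-1 of a 4 x 32-bit vector make up the
low 64-bit element after the bitcast. It only shows that each vselect mask
picks the same lanes as the proposed movsd form.

```python
# Lane-level model of the folds; the helpers below are illustrative only
# and are not part of LLVM.

def vselect(mask, a, b):
    """Per-lane select: take a[i] where mask[i] is nonzero (-1), else b[i]."""
    return [x if m else y for m, x, y in zip(mask, a, b)]

def movsd(v1, v2):
    """Model of (movsd V1, V2) on a 4 x 32-bit view: the low 64 bits
    (lanes 0-1) come from V2, the high 64 bits (lanes 2-3) from V1."""
    return v2[:2] + v1[2:]

A = ["a0", "a1", "a2", "a3"]
B = ["b0", "b1", "b2", "b3"]

# Rules 1 and 2: mask <0,0,-1,-1> corresponds to movsd(A, B).
rule_1_2 = vselect([0, 0, -1, -1], A, B)

# Rules 3 and 4: mask <-1,-1,0,0> corresponds to movsd(B, A).
rule_3_4 = vselect([-1, -1, 0, 0], A, B)

print(rule_1_2 == movsd(A, B), rule_3_4 == movsd(B, A))
```

The element type (i32 vs. f32) does not matter for this model, which is why
rules 1/2 and 3/4 collapse into one check each.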