[PATCH][AVX] Lower v4i64->v4i32 ISD::TRUNCATE for minimal shuffles

Wed Mar 5 11:50:42 PST 2014

Thanks, Nadav and Andrea. Committed as r202996 with Andrea's suggestions.

On Wed, Mar 5, 2014 at 12:40 PM, Nadav Rotem <nrotem at apple.com> wrote:
> Hi Cameron,
>
> Your change looks like a win. If I remember correctly, Sandy Bridge has only a single shuffle port (port 5), so reducing the pressure on port 5 should help. You are also reducing the latency, which is great.
>
> Thanks,
> Nadav
>
> On Mar 5, 2014, at 7:44 AM, Cameron McInally <cameron.mcinally at nyu.edu> wrote:
>
>> Hey guys,
>>
>> For AVX, v4i64->v4i32 truncates are currently lowered into two
>> shuffles plus a movlh:
>>
>>> vpshufd $8, %xmm1, %xmm1        ## xmm1 = xmm1[0,2,0,0]
>>> vpshufd $8, %xmm0, %xmm0        ## xmm0 = xmm0[0,2,0,0]
>>> vmovlhps %xmm1, %xmm0, %xmm0 ## xmm0 = xmm0[0],xmm1[0]
>>
>> This could also be done using a single vshufps:
>>
>>> vshufps $-120, %xmm1, %xmm0, %xmm0 ## xmm0 = xmm0[0,2],xmm1[0,2]
>>
>> Please note that this does change the execution domain of the shuffle
>> (integer to floating point), but as far as I can tell this should be
>> okay. My understanding, from looking at Agner Fog's instruction tables,
>> is that the shuffles themselves should be a wash and avoiding the movlh
>> is a win.
>>
>> Any insights into whether this change is a good idea or not?
>>
>> TIA,
>> Cameron
>> <patch.diff>
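
For readers who want to check the lane arithmetic in the two sequences quoted above, here is a minimal Python sketch that models each instruction on a list of four 32-bit lanes. It is an illustration only, not the committed patch: the helper names (pshufd, movlhps, shufps, split_lanes, truncate_old, truncate_new) are hypothetical, and it assumes xmm0 holds the low 128 bits of the v4i64 source and xmm1 the high 128 bits. The immediate $-120 is treated as the byte 0x88, i.e. selectors 0, 2, 0, 2.

```python
def pshufd(src, imm):
    """Model of (v)pshufd: each 2-bit field of imm selects a lane of src."""
    imm &= 0xFF
    return [src[(imm >> (2 * i)) & 3] for i in range(4)]


def movlhps(src1, src2):
    """Model of vmovlhps: low 64 bits of src1, then low 64 bits of src2."""
    return src1[:2] + src2[:2]


def shufps(src1, src2, imm):
    """Model of vshufps: result lanes 0-1 come from src1, lanes 2-3 from src2."""
    imm &= 0xFF  # a negative immediate such as -120 is the byte 0x88
    sel = [(imm >> (2 * i)) & 3 for i in range(4)]
    return [src1[sel[0]], src1[sel[1]], src2[sel[2]], src2[sel[3]]]


def split_lanes(values64):
    """Split 64-bit elements into little-endian 32-bit lanes (low half first)."""
    lanes = []
    for v in values64:
        lanes += [v & 0xFFFFFFFF, (v >> 32) & 0xFFFFFFFF]
    return lanes


def truncate_old(v4i64):
    """Two vpshufd $8 plus vmovlhps, as in the original lowering."""
    xmm0 = split_lanes(v4i64[:2])  # assumed: low half of the source vector
    xmm1 = split_lanes(v4i64[2:])  # assumed: high half of the source vector
    xmm1 = pshufd(xmm1, 8)         # xmm1 = xmm1[0,2,0,0]
    xmm0 = pshufd(xmm0, 8)         # xmm0 = xmm0[0,2,0,0]
    return movlhps(xmm0, xmm1)


def truncate_new(v4i64):
    """Single vshufps $-120, as proposed in the thread."""
    xmm0 = split_lanes(v4i64[:2])
    xmm1 = split_lanes(v4i64[2:])
    return shufps(xmm0, xmm1, -120)  # xmm0 = xmm0[0,2],xmm1[0,2]
```

Calling truncate_old and truncate_new on the same four 64-bit values should give the same list of low 32-bit halves, which is the equivalence the patch relies on. The model says nothing about port pressure, latency, or domain-crossing cost; those still have to be read from the instruction tables.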