[PATCH][AVX] Lower v4i64->v4i32 ISD::TRUNCATE for minimal shuffles
nrotem at apple.com
Wed Mar 5 09:40:17 PST 2014
Your change looks like a win. If I remember correctly, Sandy Bridge has only a single shuffle port (port 5), so reducing the pressure on that port should help. You are also reducing the latency, which is great.
On Mar 5, 2014, at 7:44 AM, Cameron McInally <cameron.mcinally at nyu.edu> wrote:
> Hey guys,
> For AVX, v4i64->v4i32 truncates are currently lowered into two
> shuffles plus a movlh:
>> vpshufd $8, %xmm1, %xmm1 ## xmm1 = xmm1[0,2,0,0]
>> vpshufd $8, %xmm0, %xmm0 ## xmm0 = xmm0[0,2,0,0]
>> vmovlhps %xmm1, %xmm0, %xmm0 ## xmm0 = xmm0,xmm1
> This could also be done using a vshufps:
>> vshufps $-120, %xmm1, %xmm0, %xmm0 ## xmm0 = xmm0[0,2],xmm1[0,2]
> Please note that this does change the execution domain of the shuffle
> (vpshufd is an integer-domain instruction, vshufps a floating-point
> one), but as far as I can tell this should be okay. My understanding,
> from looking at Agner Fog's instruction tables, is that the shuffles
> themselves should be a wash and avoiding the movlh is a win.
> Any insights into whether this change is a good idea or not?
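For anyone who wants to convince themselves that the two sequences produce the same lanes, here is a rough Python sketch that models the three instructions on lists of four 32-bit lanes. It is only an illustration, not part of the patch. The helper names (split_lanes, pshufd, movlhps, shufps) are made up for this example, and it assumes that xmm0 holds the low two i64 elements and xmm1 the high two, which the quoted assembly does not show.

```python
MASK32 = 0xFFFFFFFF

def split_lanes(qwords):
    """View two 64-bit values as four 32-bit lanes (little-endian order)."""
    lanes = []
    for q in qwords:
        lanes.append(q & MASK32)          # low half of the i64
        lanes.append((q >> 32) & MASK32)  # high half of the i64
    return lanes

def select(imm, i):
    """Extract the i-th 2-bit lane selector from an 8-bit immediate."""
    return ((imm & 0xFF) >> (2 * i)) & 3

def pshufd(imm, src):
    """Model of (v)pshufd: every result lane is picked from src."""
    return [src[select(imm, i)] for i in range(4)]

def movlhps(src2, src1):
    """Model of vmovlhps src2, src1, dst (AT&T order): low half of src1, then low half of src2."""
    return src1[:2] + src2[:2]

def shufps(imm, src2, src1):
    """Model of vshufps imm, src2, src1, dst: lanes 0-1 come from src1, lanes 2-3 from src2."""
    return [src1[select(imm, 0)], src1[select(imm, 1)],
            src2[select(imm, 2)], src2[select(imm, 3)]]

def truncate_old(xmm0, xmm1):
    """Current lowering: two vpshufd plus a vmovlhps."""
    xmm1 = pshufd(8, xmm1)
    xmm0 = pshufd(8, xmm0)
    return movlhps(xmm1, xmm0)

def truncate_new(xmm0, xmm1):
    """Proposed lowering: a single vshufps ($-120 is 0x88 as a byte)."""
    return shufps(-120, xmm1, xmm0)

# Assumed register layout: xmm0 = elements 0-1, xmm1 = elements 2-3.
values = [0x1111111122222222, 0x3333333344444444,
          0x5555555566666666, 0x7777777788888888]
xmm0 = split_lanes(values[:2])
xmm1 = split_lanes(values[2:])
expected = [v & MASK32 for v in values]
```

Both truncate_old(xmm0, xmm1) and truncate_new(xmm0, xmm1) should come out equal to expected, i.e. the low 32 bits of each i64 element in order.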