[PATCH] D53784: [DAGCombiner] narrow vector binops when extraction is cheap

Sanjay Patel via Phabricator via llvm-commits llvm-commits at lists.llvm.org
Mon Oct 29 07:04:55 PDT 2018


spatel added inline comments.


================
Comment at: test/CodeGen/X86/vector-reduce-xor.ll:236
 ; AVX1-NEXT:    vpermilps {{.*#+}} xmm1 = xmm0[2,3,0,1]
 ; AVX1-NEXT:    vxorps %ymm1, %ymm0, %ymm0
 ; AVX1-NEXT:    vpermilps {{.*#+}} xmm1 = xmm0[1,1,2,3]
----------------
spatel wrote:
> RKSimon wrote:
> > There's still a lot of logic ops happening at full width - is this to do with the bitcasts?
> Yes:
>   t5: v8i32 = xor t2, t30
>                 t24: v4i64 = bitcast t5
>               t34: v2i64 = extract_subvector t24, Constant:i64<0>
> 
> ...so we should pick that up in the follow-up, or if we want to go big, I can add those diffs to this patch.
Looking a bit closer here, the bitcast enhancement is probably not enough. If there are multiple uses of a binop, we have to be careful, or we'll break a single wide op into multiple narrow ops.
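
To make the multi-use concern concrete, here is a minimal sketch in C++. It is a toy model, not LLVM's SelectionDAG API: `Node`, `Kind`, `isNarrowingProfitable`, and `binopsAfterNarrowing` are hypothetical names invented for illustration, and the model assumes each extract user reads a different subvector (so the narrow ops cannot be merged).

```cpp
#include <vector>

// Toy stand-in for a DAG node (hypothetical; this is not llvm::SDNode).
enum class Kind { Binop, ExtractSubvector, Other };

struct Node {
  Kind kind;
  std::vector<const Node *> users; // nodes that consume this node's value
};

// Hypothetical profitability check: narrow only when the binop's single user
// is an extract, so the wide op becomes dead after the transform.
bool isNarrowingProfitable(const Node &binop) {
  if (binop.kind != Kind::Binop)
    return false;
  return binop.users.size() == 1 &&
         binop.users[0]->kind == Kind::ExtractSubvector;
}

// Counts the binops that would exist if we narrowed for every extract user
// without checking uses: one narrow op per extract, plus the original wide op
// if any non-extract user still needs the full-width result.
int binopsAfterNarrowing(const Node &binop) {
  int narrowOps = 0;
  bool wideStillUsed = false;
  for (const Node *user : binop.users) {
    if (user->kind == Kind::ExtractSubvector)
      ++narrowOps;
    else
      wideStillUsed = true;
  }
  return narrowOps + (wideStillUsed ? 1 : 0);
}
```

In this model, a binop with one extract user and one full-width user ends up as two ops instead of one, which is the regression a use check is meant to avoid.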

There are still several patches to go before we get optimal codegen for reductions, so I'd rather not risk it by adding more patterns to this basic patch.


https://reviews.llvm.org/D53784




