[PATCH] D32993: DAGCombine: Extend createBuildVecShuffle for case len(in_vec) = 4*len(result_vec)

Tue May 9 08:32:15 PDT 2017

efriedma edited reviewers, added: efriedma; removed: eli.friedman.
efriedma added inline comments.

================
Comment at: test/CodeGen/ARM/vpadd.ll:376
 ; CHECK-NEXT:    vmovl.u8 q8, d16
-; CHECK-NEXT:    vpadd.i16 d16, d16, d17
+; CHECK-NEXT:    vuzp.16 q8, q9
+; CHECK-NEXT:    vadd.i16 d16, d16, d18
----------------
zvi wrote:
> This appears to be a regression for ARM codegen. Assuming it is, what are the options for fixing it? IMHO these are the options, ordered by preference:
> 1. Can we improve the ARM backend to handle this case?
> 2. Add a TLI hook for deciding when insert-extract sequences are better than a composed shuffle?
> 3. Do this only in the X86 lowering.
We have a combine in the ARM backend that specifically combines vuzp+vadd to vpadd. It looks like it isn't triggering here because we're doing the vuzp in the wrong width; that is probably easy to fix.
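
To see why the vuzp+vadd pair in the new CHECK lines can be folded back into a single vpadd, here is a minimal lane-level sketch in Python. It models only the lane semantics of the three instructions as I understand them from the ARM ISA (d16/d17 are the low/high halves of q8, d18/d19 of q9); the function names are made up for illustration, and this is not the backend combine itself.

```python
# Lane-level model of the NEON instructions in the diff above.
# Sketch only: helper names are hypothetical, not LLVM or ARM backend code.

MASK16 = 0xFFFF  # i16 lanes wrap modulo 2**16


def vpadd_i16(dn, dm):
    """vpadd.i16 dd, dn, dm: add adjacent lane pairs of dn, then of dm."""
    src = dn + dm  # list concatenation: lanes of dn followed by lanes of dm
    return [(src[i] + src[i + 1]) & MASK16 for i in range(0, len(src), 2)]


def vuzp_16(qa, qb):
    """vuzp.16 qa, qb: qa gets the even lanes of qa:qb, qb gets the odd lanes."""
    src = qa + qb
    return src[0::2], src[1::2]


def vadd_i16(dn, dm):
    """vadd.i16 dd, dn, dm: lane-wise add."""
    return [(a + b) & MASK16 for a, b in zip(dn, dm)]


def old_sequence(q8, q9):
    """vpadd.i16 d16, d16, d17 (q9 is not read)."""
    d16, d17 = q8[:4], q8[4:]
    return vpadd_i16(d16, d17)


def new_sequence(q8, q9):
    """vuzp.16 q8, q9 followed by vadd.i16 d16, d16, d18."""
    q8, q9 = vuzp_16(q8, q9)
    d16, d18 = q8[:4], q9[:4]  # low halves of the unzipped q registers
    return vadd_i16(d16, d18)
```

In this model both sequences produce the same d16 for any input, and the contents of q9 never reach the result, which is consistent with the q-width vuzp doing more work than the d-width pairwise add needs.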

https://reviews.llvm.org/D32993