[PATCH] D32993: DAGCombine: Extend createBuildVecShuffle for case len(in_vec) = 4*len(result_vec)
Eli Friedman via Phabricator via llvm-commits
llvm-commits at lists.llvm.org
Tue May 9 08:32:15 PDT 2017
efriedma edited reviewers, added: efriedma; removed: eli.friedman.
efriedma added inline comments.
================
Comment at: test/CodeGen/ARM/vpadd.ll:376
; CHECK-NEXT: vmovl.u8 q8, d16
-; CHECK-NEXT: vpadd.i16 d16, d16, d17
+; CHECK-NEXT: vuzp.16 q8, q9
+; CHECK-NEXT: vadd.i16 d16, d16, d18
----------------
zvi wrote:
> This appears to be a regression for ARM codegen. Assuming it is, what are the options for fixing it? IMHO these are the options, ordered by preference:
> 1. Can we improve the ARM backend to handle this case?
> 2. Add a TLI hook for deciding when insert-extract sequences are better than a composed shuffle?
> 3. Do this only in the X86 lowering.
We have a combine in the ARM backend which specifically combines vuzp+vadd to vpadd. It looks like the reason it isn't triggering here is that we're doing the vuzp in the wrong width; probably easy to fix.
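For reference, here is a rough lane-level model of why the new sequence still computes the same value as the old vpadd. This is only a sketch in Python, not the backend combine itself: the helper names (vuzp, vadd, vpadd) are hypothetical stand-ins for the instruction semantics, and reading "wrong width" as "vuzp on q registers instead of d registers" is my assumption based on the diff above.

```python
MASK16 = 0xFFFF  # .i16 lanes wrap modulo 2**16


def vuzp(a, b):
    """Model of VUZP: de-interleave the concatenation of a and b.

    Returns (even lanes, odd lanes), i.e. the new contents of the
    first and second operand registers.
    """
    joined = a + b
    return joined[0::2], joined[1::2]


def vadd(a, b):
    """Model of VADD.I16: lane-wise add with wraparound."""
    return [(x + y) & MASK16 for x, y in zip(a, b)]


def vpadd(a, b):
    """Model of VPADD.I16: add adjacent lane pairs of a, then of b."""
    joined = a + b
    return [(joined[i] + joined[i + 1]) & MASK16
            for i in range(0, len(joined), 2)]


# Arbitrary sample data; d16/d17 are the two halves of q8.
d16 = [1, 2, 3, 4]
d17 = [5, 6, 7, 0xFFFF]
q9 = [9] * 8  # contents do not matter for the low halves

expected = vpadd(d16, d17)

# D-width form: vuzp.16 d16, d17 ; vadd.i16 d16, d16, d17
evens, odds = vuzp(d16, d17)
d_width = vadd(evens, odds)

# Q-width form from the diff: vuzp.16 q8, q9 ; vadd.i16 d16, d16, d18
q8_out, q9_out = vuzp(d16 + d17, q9)
q_width = vadd(q8_out[:4], q9_out[:4])  # d16 = low q8, d18 = low q9

assert d_width == expected
assert q_width == expected
```

In this model both forms agree with vpadd, which is consistent with the missed combine being a pattern-matching issue rather than a correctness one.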
https://reviews.llvm.org/D32993