[PATCH] D111960: [X86][AVX] Prefer VINSERTF128 over VPERM2F128 for 128->256 subvector concatenations
Simon Pilgrim via Phabricator via llvm-commits
llvm-commits at lists.llvm.org
Sun Oct 17 05:08:41 PDT 2021
RKSimon created this revision.
RKSimon added reviewers: craig.topper, spatel, lebedev.ri, pengfei.
Herald added subscribers: steven.zhang, hiraditya.
RKSimon requested review of this revision.
Herald added a project: LLVM.
The VINSERTF128 instruction is often much quicker, and never slower, than the more general VPERM2F128 instruction, so we should try to use that in more circumstances.
This requires a fallback to a commuted VPERM2F128 for the case where we need to fold the 256-bit vector source instead of the 128-bit subvector source.
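As a rough illustration of why the two instructions are interchangeable for this pattern, here is a scalar model of their semantics. It is only a sketch: vectors are plain Python lists of eight 32-bit lanes, the helper names are made up for this example, and the immediates (0x20, 0x02) come from the ISA definition of VPERM2F128 rather than from the patch itself.

```python
def vinsertf128(base, sub, imm):
    """Model of VINSERTF128: overwrite one 128-bit half of `base` with `sub`.

    `base` has 8 lanes, `sub` has 4. Bit 0 of imm picks the half to replace
    (0 = low, 1 = high). In the real instruction only `sub` may be a memory operand.
    """
    assert len(base) == 8 and len(sub) == 4
    if imm & 1:
        return base[:4] + sub
    return sub + base[4:]


def vperm2f128(src1, src2, imm):
    """Model of VPERM2F128: build each 128-bit half from any half of two sources.

    In the real instruction only `src2` may be a memory operand.
    """
    assert len(src1) == 8 and len(src2) == 8
    halves = [src1[:4], src1[4:], src2[:4], src2[4:]]

    def pick(ctrl):
        # Bit 3 of each 4-bit control zeroes the half; bits 1:0 select a source half.
        return [0] * 4 if ctrl & 0x8 else list(halves[ctrl & 0x3])

    return pick(imm & 0xF) + pick((imm >> 4) & 0xF)


UNDEF = [None] * 4           # stand-in for an undefined upper half
lo, hi = [0, 1, 2, 3], [4, 5, 6, 7]
wide_lo = lo + UNDEF         # 128-bit value widened to 256 bits
wide_hi = hi + UNDEF

concat_insert = vinsertf128(wide_lo, hi, 1)           # preferred form
concat_perm = vperm2f128(wide_lo, wide_hi, 0x20)      # general form
concat_commuted = vperm2f128(wide_hi, wide_lo, 0x02)  # low half comes from src2
```

All three calls build the same concatenation. The commuted form matters because it places the vector supplying the low half in the src2 position, which is the operand VPERM2F128 can take from memory.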
There is one interesting side effect: DAGCombine's narrowExtractedVectorLoad combine gets called in a number of locations, and it always creates an extracted subvector load without regard to other uses of the original wider load. I'm expecting AVX CPUs to be capable of merging such aliased loads, but I do wonder whether narrowExtractedVectorLoad should check whether all uses of the wider load are partial uses (EXTRACT_VECTOR_ELT/EXTRACT_SUBVECTOR/SHUFFLE_VECTOR etc.)?
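The kind of check being floated could look roughly like the toy model below. This is not LLVM code: the function names are hypothetical, uses are represented as bare opcode strings, and the opcode set only covers the three opcodes named above (the "etc." is left open).

```python
# Opcodes that read only part of a wide vector value (assumed, not exhaustive).
PARTIAL_USE_OPCODES = {"EXTRACT_VECTOR_ELT", "EXTRACT_SUBVECTOR", "SHUFFLE_VECTOR"}


def all_uses_are_partial(use_opcodes):
    """True if every user of the wide load reads only part of it."""
    return all(op in PARTIAL_USE_OPCODES for op in use_opcodes)


def should_narrow_load(use_opcodes):
    """Hypothetical guard: narrow only when no user needs the full-width value."""
    return all_uses_are_partial(use_opcodes)
```

With a guard like this, a wide load whose users are all extracts or shuffles would still be narrowed, while one that also feeds, say, a full-width store would be left alone so the wide load is not duplicated by an aliased narrow one.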
Noticed while investigating the quality of interleaved load/store codegen.
Repository:
rG LLVM Github Monorepo
https://reviews.llvm.org/D111960
Files:
llvm/lib/Target/X86/X86ISelDAGToDAG.cpp
llvm/lib/Target/X86/X86ISelLowering.cpp
llvm/lib/Target/X86/X86InstrFragmentsSIMD.td
llvm/lib/Target/X86/X86InstrSSE.td
llvm/test/CodeGen/X86/avx-vperm2x128.ll
llvm/test/CodeGen/X86/avx512-shuffles/partial_permute.ll
llvm/test/CodeGen/X86/pr50823.ll
llvm/test/CodeGen/X86/vector-interleaved-load-i16-stride-4.ll
llvm/test/CodeGen/X86/vector-interleaved-load-i16-stride-6.ll
llvm/test/CodeGen/X86/vector-interleaved-load-i32-stride-2.ll
llvm/test/CodeGen/X86/vector-interleaved-load-i32-stride-6.ll
llvm/test/CodeGen/X86/vector-interleaved-load-i64-stride-2.ll
llvm/test/CodeGen/X86/vector-interleaved-load-i64-stride-4.ll
llvm/test/CodeGen/X86/vector-interleaved-load-i8-stride-6.ll
llvm/test/CodeGen/X86/vector-interleaved-store-i64-stride-4.ll
llvm/test/CodeGen/X86/vector-shuffle-combining-avx.ll
llvm/test/CodeGen/X86/x86-interleaved-access.ll
-------------- next part --------------
A non-text attachment was scrubbed...
Name: D111960.380238.patch
Type: text/x-patch
Size: 275194 bytes
Desc: not available
URL: <http://lists.llvm.org/pipermail/llvm-commits/attachments/20211017/1982665b/attachment-0001.bin>