[PATCH] D111960: [X86][AVX] Prefer VINSERTF128 over VPERM2F128 for 128->256 subvector concatenations

Sun Oct 17 05:08:41 PDT 2021

RKSimon created this revision.
RKSimon added reviewers: craig.topper, spatel, lebedev.ri, pengfei.
Herald added subscribers: steven.zhang, hiraditya.
RKSimon requested review of this revision.
Herald added a project: LLVM.

The VINSERTF128 instruction is often much quicker, and never slower, than the more general VPERM2F128 instruction, so we should try to use that in more circumstances.

This requires a fallback to a commuted VPERM2F128 for the case where we need to fold the 256-bit vector source instead of the 128-bit subvector source.
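To make the equivalence above concrete, here is a minimal scalar sketch of the two instructions' lane semantics. It is not code from the patch: the types and helper names (Lane128, Vec256, widen, concatViaInsert, concatViaPerm, concatViaCommutedPerm) are hypothetical, and the undefined upper half of a widened 128-bit value is modelled as zeros. It shows the same 128->256 concatenation expressed as VINSERTF128 with immediate 1, as VPERM2F128 with immediate 0x20, and as the commuted VPERM2F128 with immediate 0x02.

```cpp
#include <array>
#include <cstdint>

// Hypothetical scalar model: a 256-bit register as two 128-bit lanes.
using Lane128 = std::array<float, 4>;
using Vec256 = std::array<Lane128, 2>; // [0] = low lane, [1] = high lane

// VINSERTF128: copy src, then overwrite the lane selected by imm8 bit 0.
Vec256 vinsertf128(Vec256 src, const Lane128 &sub, std::uint8_t imm8) {
  src[imm8 & 1] = sub;
  return src;
}

// VPERM2F128: each result lane is controlled by a 4-bit field of imm8.
// Bit 3 of the field zeroes the lane; otherwise bit 1 picks the source
// (0 = a, 1 = b) and bit 0 picks that source's low or high lane.
Vec256 vperm2f128(const Vec256 &a, const Vec256 &b, std::uint8_t imm8) {
  auto pick = [&](unsigned ctrl) -> Lane128 {
    if (ctrl & 0x8)
      return Lane128{};
    const Vec256 &src = (ctrl & 0x2) ? b : a;
    return src[ctrl & 0x1];
  };
  Vec256 result;
  result[0] = pick(imm8 & 0xF);
  result[1] = pick((imm8 >> 4) & 0xF);
  return result;
}

// Widen a 128-bit value to 256 bits. The upper lane is undefined on real
// hardware; this sketch assumes zeros.
Vec256 widen(const Lane128 &x) {
  Vec256 v{};
  v[0] = x;
  return v;
}

// Concatenation as an insert of hi into the upper lane of widened lo.
Vec256 concatViaInsert(const Lane128 &lo, const Lane128 &hi) {
  return vinsertf128(widen(lo), hi, 1);
}

// The same concatenation as a general two-source lane permute.
Vec256 concatViaPerm(const Lane128 &lo, const Lane128 &hi) {
  return vperm2f128(widen(lo), widen(hi), 0x20);
}

// Commuted form: sources swapped, lane selectors adjusted to match.
Vec256 concatViaCommutedPerm(const Lane128 &lo, const Lane128 &hi) {
  return vperm2f128(widen(hi), widen(lo), 0x02);
}
```

All three helpers return the same value for any lo/hi pair. The commuted form matters because only the last source operand of either instruction can come from memory: for VINSERTF128 that is the 128-bit subvector, so folding a load of the 256-bit source needs VPERM2F128 with that source in the second position.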

There is one interesting side effect: DAGCombine's narrowExtractedVectorLoad combine gets called in a number of locations, and it always creates an extracted subvector load without regard to other uses of the original wider load. I'm expecting AVX CPUs to be capable of merging such aliased loads, but I do wonder whether narrowExtractedVectorLoad should check whether all uses of the wider load are partial uses (EXTRACT_VECTOR_ELT/EXTRACT_SUBVECTOR/SHUFFLE_VECTOR etc.)?
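The kind of check being asked about could look roughly like the toy sketch below. It deliberately does not use LLVM's SelectionDAG API: UseKind, isPartialUse and allUsesArePartial are hypothetical stand-ins, and the set of "partial" node kinds is just the list named above.

```cpp
#include <algorithm>
#include <vector>

// Hypothetical stand-in for the kinds of node that may use a wide load.
// This is not LLVM's ISD opcode enum.
enum class UseKind {
  ExtractVectorElt,
  ExtractSubvector,
  ShuffleVector,
  Store,
  Other
};

// A use counts as partial if it is one of the node kinds listed above.
bool isPartialUse(UseKind kind) {
  switch (kind) {
  case UseKind::ExtractVectorElt:
  case UseKind::ExtractSubvector:
  case UseKind::ShuffleVector:
    return true;
  default:
    return false;
  }
}

// Narrowing would only be allowed when every use of the wide load is partial.
bool allUsesArePartial(const std::vector<UseKind> &uses) {
  return std::all_of(uses.begin(), uses.end(), isPartialUse);
}
```

Under this model a wide load used by an EXTRACT_SUBVECTOR and a SHUFFLE_VECTOR would still be narrowed, while one that also feeds a store of the full vector would be left alone.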

Noticed while investigating the quality of interleaved load/store codegen.

Repository:
  rG LLVM Github Monorepo

https://reviews.llvm.org/D111960

Files:
  llvm/lib/Target/X86/X86ISelDAGToDAG.cpp
  llvm/lib/Target/X86/X86ISelLowering.cpp
  llvm/lib/Target/X86/X86InstrFragmentsSIMD.td
  llvm/lib/Target/X86/X86InstrSSE.td
  llvm/test/CodeGen/X86/avx-vperm2x128.ll
  llvm/test/CodeGen/X86/avx512-shuffles/partial_permute.ll
  llvm/test/CodeGen/X86/pr50823.ll
  llvm/test/CodeGen/X86/vector-interleaved-load-i16-stride-4.ll
  llvm/test/CodeGen/X86/vector-interleaved-load-i16-stride-6.ll
  llvm/test/CodeGen/X86/vector-interleaved-load-i32-stride-2.ll
  llvm/test/CodeGen/X86/vector-interleaved-load-i32-stride-6.ll
  llvm/test/CodeGen/X86/vector-interleaved-load-i64-stride-2.ll
  llvm/test/CodeGen/X86/vector-interleaved-load-i64-stride-4.ll
  llvm/test/CodeGen/X86/vector-interleaved-load-i8-stride-6.ll
  llvm/test/CodeGen/X86/vector-interleaved-store-i64-stride-4.ll
  llvm/test/CodeGen/X86/vector-shuffle-combining-avx.ll
  llvm/test/CodeGen/X86/x86-interleaved-access.ll

-------------- next part --------------
A non-text attachment was scrubbed...
Name: D111960.380238.patch
Type: text/x-patch
Size: 275194 bytes
Desc: not available
URL: <http://lists.llvm.org/pipermail/llvm-commits/attachments/20211017/1982665b/attachment-0001.bin>