[all-commits] [llvm/llvm-project] fd485d: [X86][AVX] Prefer VINSERTF128 over VPERM2F128 for ...

Simon Pilgrim via All-commits all-commits at lists.llvm.org
Mon Nov 1 03:51:15 PDT 2021


  Branch: refs/heads/main
  Home:   https://github.com/llvm/llvm-project
  Commit: fd485d8cda8d5d432f56d4186adcda305f46d6e2
      https://github.com/llvm/llvm-project/commit/fd485d8cda8d5d432f56d4186adcda305f46d6e2
  Author: Simon Pilgrim <llvm-dev at redking.me.uk>
  Date:   2021-11-01 (Mon, 01 Nov 2021)

  Changed paths:
    M llvm/lib/Target/X86/X86ISelDAGToDAG.cpp
    M llvm/lib/Target/X86/X86ISelLowering.cpp
    M llvm/lib/Target/X86/X86InstrFragmentsSIMD.td
    M llvm/lib/Target/X86/X86InstrSSE.td
    M llvm/test/CodeGen/X86/vector-interleaved-load-i16-stride-4.ll
    M llvm/test/CodeGen/X86/vector-interleaved-load-i32-stride-2.ll
    M llvm/test/CodeGen/X86/vector-interleaved-load-i32-stride-6.ll
    M llvm/test/CodeGen/X86/vector-interleaved-load-i64-stride-2.ll
    M llvm/test/CodeGen/X86/vector-interleaved-load-i64-stride-4.ll
    M llvm/test/CodeGen/X86/vector-interleaved-store-i64-stride-4.ll
    M llvm/test/CodeGen/X86/vector-shuffle-combining-avx.ll
    M llvm/test/CodeGen/X86/x86-interleaved-access.ll

  Log Message:
  -----------
  [X86][AVX] Prefer VINSERTF128 over VPERM2F128 for 128->256 subvector concatenations

The VINSERTF128 instruction is often much quicker, and never slower, than the more general VPERM2F128 instruction, so we should try to use that in more circumstances.

This requires a fallback to a commuted VPERM2F128 for the case where we need to fold the 256-bit vector source instead of the 128-bit subvector source.

There is one interesting side effect - DAGCombine's narrowExtractedVectorLoad combine gets called in a number of locations, this often creates an extracted subvector load without regard to other uses of the original wider load. I'm expecting AVX cpus to be capable of merging such aliased loads, but I do wonder whether narrowExtractedVectorLoad's call to X86TargetLowering::shouldReduceLoadWidth needs to be altered to check for more partial uses?

Noticed while investigating the quality of interleaved load/store codegen.

Differential Revision: https://reviews.llvm.org/D111960




More information about the All-commits mailing list