[all-commits] [llvm/llvm-project] fa038e: [x86] favor vector constant load to avoid GPR to XMM transfer, part 2

RotateRight via All-commits <all-commits at lists.llvm.org>
Mon May 25 05:07:27 PDT 2020


  Branch: refs/heads/master
  Home:   https://github.com/llvm/llvm-project
  Commit: fa038e03504c7d0dfd438b1dfdd6da7081e75617
      https://github.com/llvm/llvm-project/commit/fa038e03504c7d0dfd438b1dfdd6da7081e75617
  Author: Sanjay Patel <spatel at rotateright.com>
  Date:   2020-05-25 (Mon, 25 May 2020)

  Changed paths:
    M llvm/lib/Target/X86/X86ISelLowering.cpp
    M llvm/test/CodeGen/X86/avx-load-store.ll
    M llvm/test/CodeGen/X86/avx2-arith.ll
    M llvm/test/CodeGen/X86/combine-udiv.ll
    M llvm/test/CodeGen/X86/fcmp-constant.ll
    M llvm/test/CodeGen/X86/insert-into-constant-vector.ll
    M llvm/test/CodeGen/X86/packss.ll
    M llvm/test/CodeGen/X86/pshufb-mask-comments.ll
    M llvm/test/CodeGen/X86/ret-mmx.ll
    M llvm/test/CodeGen/X86/sad.ll
    M llvm/test/CodeGen/X86/srem-seteq-vec-nonsplat.ll
    M llvm/test/CodeGen/X86/vec_set-A.ll
    M llvm/test/CodeGen/X86/vec_shift2.ll
    M llvm/test/CodeGen/X86/vector-lzcnt-128.ll
    M llvm/test/CodeGen/X86/vector-shuffle-256-v16.ll
    M llvm/test/CodeGen/X86/vector-shuffle-256-v32.ll
    M llvm/test/CodeGen/X86/vector-shuffle-256-v8.ll
    M llvm/test/CodeGen/X86/vector-shuffle-512-v32.ll
    M llvm/test/CodeGen/X86/vector-shuffle-512-v64.ll
    M llvm/test/CodeGen/X86/vector-shuffle-512-v8.ll
    M llvm/test/CodeGen/X86/vector-shuffle-combining-avx512f.ll
    M llvm/test/CodeGen/X86/vector-shuffle-combining-xop.ll
    M llvm/test/CodeGen/X86/vector-shuffle-v1.ll
    M llvm/test/CodeGen/X86/vector-tzcnt-128.ll

  Log Message:
  -----------
  [x86] favor vector constant load to avoid GPR to XMM transfer, part 2

This replaces the build_vector lowering code that was just added in D80013
and instead matches the pattern later, from the x86-specific "vzext_movl"
node. That seems to result in the same or better improvements and gets rid
of the 'TODO' items from that patch.
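
For example, roughly what the vec_set-A.ll diff in this commit shows (asm
shown as comments; exact instructions vary by subtarget):

  define <2 x i64> @test1() nounwind {
    ret <2 x i64> <i64 1, i64 0>
  }

  ; before: materialize the scalar in a GPR, then transfer it to an XMM register
  ;   movl $1, %eax
  ;   movq %rax, %xmm0
  ; after: load the whole vector constant from the constant pool
  ;   movaps {{.*}}(%rip), %xmm0  # xmm0 = [1,0]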

AFAICT, we always shrink wider constant vectors to 128-bit on these
patterns, so we still get the implicit zero-extension to ymm/zmm
without wasting space on larger vector constants. There's a trade-off,
though: shrinking the constant means we give up potential load-folding
of the full-width vector.
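
To illustrate the shrinking (hypothetical IR, not taken verbatim from the
tests): a 256-bit constant whose upper lanes are all zero needs only a
128-bit constant-pool entry, because a VEX-encoded 128-bit load implicitly
zeroes bits 255:128 of the ymm register:

  define <4 x i64> @widened_const() nounwind {
    ret <4 x i64> <i64 1, i64 0, i64 0, i64 0>
  }

  ; 16-byte constant and a 128-bit load; the upper ymm bits are zeroed for free
  ;   vmovaps {{.*}}(%rip), %xmm0  # xmm0 = [1,0]
  ; trade-off: a consumer can no longer fold this as a full 256-bit memory operand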

Similarly, we could load scalar constants here with implicit
zero-extension even to 128-bit. That saves constant space, but it
means we forgo load-folding, and so it increases register pressure.
The current choice seems like a good middle ground between those two options.
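
For comparison, the scalar alternative would look like this (illustrative
asm only; not what this patch emits):

  ;   movsd {{.*}}(%rip), %xmm0   # 8-byte load, implicitly zero-extended: xmm0 = [1,0]
  ; whereas the full 128-bit constant chosen here can be folded directly into
  ; a consumer, e.g.:
  ;   pand {{.*}}(%rip), %xmm0    # 16-byte constant used as a memory operand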

Differential Revision: https://reviews.llvm.org/D80131
