[all-commits] [llvm/llvm-project] 36e3c6: [X86][AVX] Truncate vectors with PACKSS/PACKUS on ...

Thu Mar 25 03:35:16 PDT 2021

  Branch: refs/heads/main
  Home:   https://github.com/llvm/llvm-project
  Commit: 36e3c6c841eb7afa417fea4f3357c48cd1bf0583
      https://github.com/llvm/llvm-project/commit/36e3c6c841eb7afa417fea4f3357c48cd1bf0583
  Author: Simon Pilgrim <llvm-dev at redking.me.uk>
  Date:   2021-03-25 (Thu, 25 Mar 2021)

  Changed paths:
    M llvm/lib/Target/X86/X86ISelLowering.cpp
    M llvm/test/CodeGen/X86/masked_store_trunc.ll
    M llvm/test/CodeGen/X86/psubus.ll
    M llvm/test/CodeGen/X86/vector-reduce-and-bool.ll
    M llvm/test/CodeGen/X86/vector-reduce-or-bool.ll
    M llvm/test/CodeGen/X86/vector-reduce-xor-bool.ll
    M llvm/test/CodeGen/X86/vector-trunc-math.ll
    M llvm/test/CodeGen/X86/vector-trunc.ll

  Log Message:
  -----------
  [X86][AVX] Truncate vectors with PACKSS/PACKUS on AVX2 targets

Until AVX512 we don't have any vector truncation instructions, and always lower using shuffles instead.

combineVectorTruncation performs this earlier than lowering as it makes it easier to use any sign/zero-extended bits in the truncated bits with PACKSS/PACKUS to perform the shuffle.

We currently don't attempt to use combineVectorTruncation on AVX2 targets as in the past 256-bit PACKSS/PACKUS tended to cause 128-bit lane shuffle regressions - but these should now be all resolved with combineHorizOpWithShuffle and in all cases we now reduce the amount of cross-lane shuffling and variable shuffle mask usage.

Differential Revision: https://reviews.llvm.org/D96609