[all-commits] [llvm/llvm-project] 10439f: [X86][AVX] Add X86ISD::VALIGN target shuffle decod...

Sun Mar 29 09:00:42 PDT 2020

  Branch: refs/heads/master
  Home:   https://github.com/llvm/llvm-project
  Commit: 10439f9e32edf0efd34e19f19c0d0e7555cd5492
      https://github.com/llvm/llvm-project/commit/10439f9e32edf0efd34e19f19c0d0e7555cd5492
  Author: Simon Pilgrim <llvm-dev at redking.me.uk>
  Date:   2020-03-29 (Sun, 29 Mar 2020)

  Changed paths:
    M llvm/lib/Target/X86/X86ISelLowering.cpp
    M llvm/test/CodeGen/X86/vector-shuffle-128-v4.ll
    M llvm/test/CodeGen/X86/vector-shuffle-256-v8.ll
    M llvm/test/CodeGen/X86/vector-shuffle-v1.ll

  Log Message:
  -----------
  [X86][AVX] Add X86ISD::VALIGN target shuffle decode support

This allows us to combine VALIGN instructions with other shuffles, although the combiner doesn't create VALIGN yet.
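
As a rough illustration of what "decoding" a VALIGN means here, the sketch below turns the instruction's immediate into a generic shuffle mask. It is not the code from this commit: the function name decodeVAlignMask and the use of std::vector are assumptions made for the example. It relies on the documented VALIGND/VALIGNQ behaviour, where the two sources are concatenated, shifted right by the immediate (in elements), and the low half is kept.

```cpp
#include <cassert>
#include <vector>

// Hypothetical stand-in for a VALIGN mask decoder; not the LLVM source.
// The returned indices address the concatenation of two NumElts-wide
// inputs: [0, NumElts) is the input that supplies the low elements,
// [NumElts, 2 * NumElts) is the input that supplies the high elements.
std::vector<int> decodeVAlignMask(unsigned NumElts, unsigned Imm) {
  assert(NumElts != 0 && (NumElts & (NumElts - 1)) == 0 &&
         "element count is assumed to be a power of 2");
  // Only the low log2(NumElts) bits of the immediate are significant.
  Imm &= NumElts - 1;
  std::vector<int> Mask;
  for (unsigned i = 0; i != NumElts; ++i)
    Mask.push_back(static_cast<int>(i + Imm));
  return Mask;
}
```

With four elements and an immediate of 1, this yields the mask 1, 2, 3, 4: three elements from the low input followed by the first element of the high input. Once an instruction is expressed as a mask like this, it can be merged with neighbouring shuffle masks.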

  Commit: da4c7db793aa71a1e59c31b346e975593c090232
      https://github.com/llvm/llvm-project/commit/da4c7db793aa71a1e59c31b346e975593c090232
  Author: Simon Pilgrim <llvm-dev at redking.me.uk>
  Date:   2020-03-29 (Sun, 29 Mar 2020)

  Changed paths:
    M llvm/lib/Target/X86/X86ISelLowering.cpp

  Log Message:
  -----------
  [X86] Rename matchShuffleAsByteRotate to matchShuffleAsElementRotate. NFC.

This was an inner helper function for the real matchShuffleAsByteRotate function, but it is more generic and is used directly for VALIGN lowering, which doesn't work at the byte level.
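
To show why an element-level matcher is the more generic building block, here is a simplified sketch of recognising a shuffle mask as an element rotation. It is an assumption-laden illustration, not the LLVM helper: the RotateMatch struct, the field names, and the use of -1 for undefined mask entries are all chosen for this example.

```cpp
#include <vector>

// Hypothetical result type for the sketch below.
struct RotateMatch {
  int Rotation;   // rotation amount in elements, or -1 if nothing matched
  int FirstInput; // input (0 or 1) feeding result elements before the wrap
  int WrapInput;  // input (0 or 1) feeding result elements after the wrap
};

// Mask entries index the concatenation of two inputs of Mask.size()
// elements each; a negative entry means "undefined, matches anything".
RotateMatch matchElementRotate(const std::vector<int> &Mask) {
  const int N = static_cast<int>(Mask.size());
  const RotateMatch NoMatch = {-1, -1, -1};
  RotateMatch R = {-1, -1, -1};
  for (int i = 0; i != N; ++i) {
    int M = Mask[i];
    if (M < 0)
      continue; // undefined element places no constraint
    if (M >= 2 * N)
      return NoMatch; // index outside both inputs
    int Input = M / N;
    int Offset = M % N - i; // how far the element moved toward index 0
    if (Offset == 0)
      return NoMatch; // element stayed in place, so this is not a rotation
    int Candidate = Offset > 0 ? Offset : Offset + N;
    if (R.Rotation >= 0 && R.Rotation != Candidate)
      return NoMatch; // elements disagree on the rotation amount
    R.Rotation = Candidate;
    // Elements before the wrap must share one input, as must those after.
    int &Slot = Offset > 0 ? R.FirstInput : R.WrapInput;
    if (Slot >= 0 && Slot != Input)
      return NoMatch;
    Slot = Input;
  }
  return R;
}
```

The amount is counted in elements, so a byte-based lowering would scale it by the element size, while an element-based one such as VALIGN could use it directly.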

  Commit: 7734e4b3a36f233df493e6101086a9c95d309a40
      https://github.com/llvm/llvm-project/commit/7734e4b3a36f233df493e6101086a9c95d309a40
  Author: Simon Pilgrim <llvm-dev at redking.me.uk>
  Date:   2020-03-29 (Sun, 29 Mar 2020)

  Changed paths:
    M llvm/lib/Target/X86/X86ISelLowering.cpp
    M llvm/test/CodeGen/X86/avx-vperm2x128.ll
    M llvm/test/CodeGen/X86/vector-reduce-mul.ll
    M llvm/test/CodeGen/X86/vector-shuffle-combining-avx2.ll

  Log Message:
  -----------
  [X86][AVX] Combine 128-bit lane shuffles with a zeroable upper half to EXTRACT_SUBVECTOR (PR40720)

As explained on PR40720, EXTRACTF128 is always as good as or better than VPERM2F128, and we can use the implicit zeroing of the upper half.

I've added some extra tests to vector-shuffle-combining-avx2.ll to make sure we don't lose coverage.
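
The condition described above can be sketched at the level of 128-bit lanes. This is only an illustration for a single 256-bit input, not the code in X86ISelLowering.cpp: the function name laneToExtract and the two-integer mask representation are assumptions, and the sentinel values mirror the usual convention of -1 for undef and -2 for zero.

```cpp
// Hypothetical lane-mask sentinels for this sketch.
const int SentinelUndef = -1;
const int SentinelZero = -2;

// LowerLane and UpperLane describe which 128-bit lane of a single 256-bit
// input (0 = low half, 1 = high half) lands in each half of the result,
// or hold a sentinel. Returns the lane to extract, or -1 when the
// "zeroable upper half" pattern does not apply.
int laneToExtract(int LowerLane, int UpperLane) {
  bool UpperZeroable =
      UpperLane == SentinelUndef || UpperLane == SentinelZero;
  if (!UpperZeroable)
    return -1; // the upper half carries real data, so keep the lane shuffle
  if (LowerLane != 0 && LowerLane != 1)
    return -1; // the lower half must be a real lane of the input
  // A 128-bit extract of this lane leaves the upper half of the
  // destination zeroed, which is exactly what the mask asks for.
  return LowerLane;
}
```

For example, a lane mask of (1, zero) maps to extracting the high 128 bits of the input, whereas (1, 0) still needs a full lane shuffle.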

Compare: https://github.com/llvm/llvm-project/compare/b632bd88a633...7734e4b3a36f