[all-commits] [llvm/llvm-project] 35dc91: [X86][SSE] lowerShuffleAsDecomposedShuffleBlend - ...

Sat Sep 12 05:41:06 PDT 2020

  Branch: refs/heads/master
  Home:   https://github.com/llvm/llvm-project
  Commit: 35dc91aee2013ce1a57dfee965fa5fdee1987ee0
      https://github.com/llvm/llvm-project/commit/35dc91aee2013ce1a57dfee965fa5fdee1987ee0
  Author: Simon Pilgrim <llvm-dev at redking.me.uk>
  Date:   2020-09-12 (Sat, 12 Sep 2020)

  Changed paths:
    M llvm/lib/Target/X86/X86ISelLowering.cpp
    M llvm/test/CodeGen/X86/vector-shuffle-128-v16.ll
    M llvm/test/CodeGen/X86/vector-shuffle-256-v16.ll
    M llvm/test/CodeGen/X86/vector-shuffle-256-v32.ll
    M llvm/test/CodeGen/X86/vector-shuffle-512-v32.ll

  Log Message:
  -----------
  [X86][SSE] lowerShuffleAsDecomposedShuffleBlend - support decomposed unpacks for some vXi8/vXi16 cases

Follow up to D86429 to handle the remaining regressions.

This patch generalizes lowerShuffleAsDecomposedShuffleBlend to lowerShuffleAsDecomposedShuffleMerge, and attempts to use an UNPCKL shuffle mask instead of a blend for the cases where the inputs are coming from alternating vXi8/vXi16 sources. Technically they don't have to be alternating (just as long as they can fit into a lower lane half for the unpack) but I didn't find as many general cases and it needed a lot more of the function to be altered.

For vXi32/vXi64 cases this could still be beneficial but in most cases the existing permute+blend approach was better.

Differential Revision: https://reviews.llvm.org/D87405