[all-commits] [llvm/llvm-project] 0fa258: [X86] Implement certain 16-bit vector shifts via 3...

Thu Sep 19 11:02:21 PDT 2024

  Branch: refs/heads/main
  Home:   https://github.com/llvm/llvm-project
  Commit: 0fa258c8d93c2f8de66518868a8e2a645b90afbe
      https://github.com/llvm/llvm-project/commit/0fa258c8d93c2f8de66518868a8e2a645b90afbe
  Author: David Majnemer <david.majnemer at gmail.com>
  Date:   2024-09-19 (Thu, 19 Sep 2024)

  Changed paths:
    M llvm/lib/Target/X86/X86ISelLowering.cpp
    M llvm/test/CodeGen/X86/vector-shift-ashr-128.ll
    M llvm/test/CodeGen/X86/vector-shift-ashr-256.ll
    M llvm/test/CodeGen/X86/vector-shift-ashr-512.ll
    M llvm/test/CodeGen/X86/vector-shift-lshr-128.ll
    M llvm/test/CodeGen/X86/vector-shift-lshr-256.ll
    M llvm/test/CodeGen/X86/vector-shift-lshr-512.ll
    M llvm/test/CodeGen/X86/vector-shift-shl-128.ll
    M llvm/test/CodeGen/X86/vector-shift-shl-256.ll
    M llvm/test/CodeGen/X86/vector-shift-shl-512.ll

  Log Message:
  -----------
  [X86] Implement certain 16-bit vector shifts via 32-bit shifts

x86 vector ISAs are non-orthogonal in a number of ways. For example,
AVX2 has vpsravd but it does not have vpsravw. However, we can simulate
it via vpsrlvd and some SWAR-style masking.

Another example is 8-bit shifts: we can use vpsllvd to simulate the
missing "vpsllvb" if shift amounts can be shared for a single lane.

Existing code generation would use a variety of techniques including
vpmulhuw which is higher latency and often has more rigid port
requirements than simple bitwise operations.

To unsubscribe from these emails, change your notification settings at https://github.com/llvm/llvm-project/settings/notifications