[all-commits] [llvm/llvm-project] 1211d9: [X86] Use SWAR techniques for some vector i8 shifts

Wed Sep 11 22:03:31 PDT 2024

  Branch: refs/heads/main
  Home:   https://github.com/llvm/llvm-project
  Commit: 1211d97922d62470ac8bc658f7bfe57e8b46a107
      https://github.com/llvm/llvm-project/commit/1211d97922d62470ac8bc658f7bfe57e8b46a107
  Author: David Majnemer <david.majnemer at gmail.com>
  Date:   2024-09-12 (Thu, 12 Sep 2024)

  Changed paths:
    M llvm/lib/Target/X86/X86ISelLowering.cpp
    M llvm/test/CodeGen/X86/vector-shift-lshr-128.ll
    M llvm/test/CodeGen/X86/vector-shift-lshr-256.ll
    M llvm/test/CodeGen/X86/vector-shift-lshr-512.ll

  Log Message:
  -----------
  [X86] Use SWAR techniques for some vector i8 shifts

SSE & AVX do not include instructions for shifting i8 vectors. Instead,
they must be synthesized via other instructions.

If adjacent pairs of i8 elements share a shift amount, we can use SWAR
techniques to substantially reduce the amount of code generated.

Say we were going to execute this shift right:
  x >> {0, 0, 0, 0, 4, 4, 4, 4, 0, 0, 0, 0, ...}

LLVM would previously generate:
        vpxor   %xmm1, %xmm1, %xmm1
        vpunpckhbw      %ymm0, %ymm1, %ymm2
        vpunpckhbw      %ymm1, %ymm0, %ymm3
        vpsllw  $4, %ymm3, %ymm3
        vpblendd        $204, %ymm3, %ymm2, %ymm2
        vpsrlw  $8, %ymm2, %ymm2
        vpunpcklbw      %ymm0, %ymm1, %ymm3
        vpunpcklbw      %ymm1, %ymm0, %ymm0
        vpsllw  $4, %ymm0, %ymm0
        vpblendd        $204, %ymm0, %ymm3, %ymm0
        vpsrlw  $8, %ymm0, %ymm0
        vpackuswb       %ymm2, %ymm0, %ymm0

Instead, we can reinterpret a pair of i8 elements as an i16 and shift
it using the same shift amount. The only thing we need to do is mask out
any bits which crossed the boundary from the top i8 to the bottom i8.
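
As a rough scalar illustration of the paragraph above (not code from
the patch), the sketch below models one pair of i8 elements packed into
a uint16_t. All function names are hypothetical. It compares a per-byte
logical shift right against a single 16-bit shift followed by a mask.

```c
#include <assert.h>
#include <stdint.h>

/* Reference: shift each i8 half of the packed pair independently. */
uint16_t lshr_pair_reference(uint16_t pair, unsigned amt) {
    uint8_t lo = (uint8_t)(pair & 0xFF);
    uint8_t hi = (uint8_t)(pair >> 8);
    lo = (uint8_t)(lo >> amt);
    hi = (uint8_t)(hi >> amt);
    return (uint16_t)((hi << 8) | lo);
}

/* SWAR: one 16-bit shift, then clear the bits that leaked from the
 * top byte into the bottom byte. */
uint16_t lshr_pair_swar(uint16_t pair, unsigned amt) {
    assert(amt < 8); /* i8 shift amounts of 8 or more are out of scope here */
    /* Replicate the per-byte mask (0xFF >> amt) into both bytes. */
    uint16_t lane_mask = (uint16_t)((0xFFu >> amt) * 0x0101u);
    return (uint16_t)((pair >> amt) & lane_mask);
}

/* Exhaustively compare both versions; returns the number of mismatches. */
unsigned lshr_pair_mismatches(void) {
    unsigned bad = 0;
    for (unsigned amt = 0; amt < 8; ++amt) {
        for (uint32_t v = 0; v <= 0xFFFFu; ++v) {
            if (lshr_pair_swar((uint16_t)v, amt) !=
                lshr_pair_reference((uint16_t)v, amt)) {
                ++bad;
            }
        }
    }
    return bad;
}
```

The exhaustive loop is cheap (8 * 65536 cases) and is only there to make
the equivalence easy to check by hand or in a test.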

This SWAR-style technique achieves:
        vpsrlw  $4, %ymm0, %ymm1
        vpblendd        $170, %ymm1, %ymm0, %ymm0
        vpand   .LCPI0_0(%rip), %ymm0, %ymm0

This is implemented for both left and right logical shift operations.
Arithmetic shifts are less well behaved here because a single i16 shift
cannot also perform the sign extension for the lower 8 bits of each pair.
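
To make the last paragraph concrete, here is a small scalar sketch with
hypothetical names, again treating a uint16_t as a packed pair of i8
elements. It shows the left-shift mirror of the mask trick, and a
counterexample where a 16-bit arithmetic shift leaves the low byte
without its sign bits. The arithmetic shifts are written out by hand so
the example does not rely on how the compiler shifts negative values.

```c
#include <assert.h>
#include <stdint.h>

/* Left shift: bits leak from the bottom byte into the top byte, so the
 * per-byte mask keeps the high bits of each byte instead. */
uint16_t shl_pair_swar(uint16_t pair, unsigned amt) {
    assert(amt < 8);
    uint16_t lane_mask = (uint16_t)(((0xFFu << amt) & 0xFFu) * 0x0101u);
    return (uint16_t)(((uint32_t)pair << amt) & lane_mask);
}

/* Portable 8-bit arithmetic shift right on a raw bit pattern. */
uint8_t ashr8(uint8_t x, unsigned amt) {
    assert(amt < 8);
    uint8_t logical = (uint8_t)(x >> amt);
    uint8_t sign_fill = (x & 0x80u) ? (uint8_t)~(0xFFu >> amt) : 0;
    return (uint8_t)(logical | sign_fill);
}

/* Portable 16-bit arithmetic shift right on a raw bit pattern. */
uint16_t ashr16(uint16_t x, unsigned amt) {
    assert(amt < 16);
    uint16_t logical = (uint16_t)(x >> amt);
    uint16_t sign_fill = (x & 0x8000u) ? (uint16_t)~(0xFFFFu >> amt) : 0;
    return (uint16_t)(logical | sign_fill);
}

/* Reference: arithmetic shift of each i8 half independently. */
uint16_t ashr_pair_reference(uint16_t pair, unsigned amt) {
    uint8_t lo = ashr8((uint8_t)(pair & 0xFF), amt);
    uint8_t hi = ashr8((uint8_t)(pair >> 8), amt);
    return (uint16_t)((hi << 8) | lo);
}
```

For the pair 0x0080 (top byte 0x00, bottom byte 0x80) and a shift of 4,
ashr_pair_reference yields 0x00F8 while ashr16 yields 0x0008. The
missing bits are ones that must be derived from bit 7 of the bottom
byte, so an AND mask alone cannot recover them.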
