[all-commits] [llvm/llvm-project] a1f5fe: [NVPTX] Optimize v2x16 BUILD_VECTORs to PRMT (#116...

Tue Dec 17 02:22:41 PST 2024

  Branch: refs/heads/main
  Home:   https://github.com/llvm/llvm-project
  Commit: a1f5fe8c851ba6a0070e4cab9e7436e962677ac6
      https://github.com/llvm/llvm-project/commit/a1f5fe8c851ba6a0070e4cab9e7436e962677ac6
  Author: Fraser Cormack <fraser at codeplay.com>
  Date:   2024-12-17 (Tue, 17 Dec 2024)

  Changed paths:
    M llvm/lib/Target/NVPTX/NVPTXISelLowering.cpp
    M llvm/test/CodeGen/NVPTX/bf16-instructions.ll
    M llvm/test/CodeGen/NVPTX/fma-relu-contract.ll
    M llvm/test/CodeGen/NVPTX/fma-relu-fma-intrinsic.ll
    M llvm/test/CodeGen/NVPTX/fma-relu-instruction-flag.ll
    M llvm/test/CodeGen/NVPTX/i16x2-instructions.ll

  Log Message:
  -----------
  [NVPTX] Optimize v2x16 BUILD_VECTORs to PRMT (#116675)

When two 16-bit values are combined into a v2x16 vector, and those
values are truncated from 32-bit values, a PRMT instruction can
save registers by selecting bytes directly from the original 32-bit
values. We do this during a post-legalize DAG combine, as these
opportunities are typically only exposed after the BUILD_VECTOR's
operands have been legalized.
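As a rough illustration of the byte selection, the C++ sketch below models PTX prmt.b32 in its default mode, following the PTX ISA description: source bytes 0-3 come from the first operand and bytes 4-7 from the second, and each of the four low selector nibbles picks one result byte. The names prmt, kLowHalves and packLowHalves are hypothetical and are not taken from NVPTXISelLowering.cpp. The selector 0x5410 is simply what a (trunc a, trunc b) pack works out to under that byte numbering.

```cpp
#include <cassert>
#include <cstdint>

// Software model of PTX `prmt.b32 d, a, b, c;` in the default mode.
// The eight source bytes are {b, a}: bytes 0-3 come from a, bytes 4-7 from b.
// Each of the four low nibbles of c picks one result byte: bits [2:0] give
// the source byte index and bit 3 requests sign replication of that byte.
constexpr uint32_t prmt(uint32_t a, uint32_t b, uint32_t c) {
  uint64_t src = (static_cast<uint64_t>(b) << 32) | a;
  uint32_t d = 0;
  for (int i = 0; i < 4; ++i) {
    uint32_t nib = (c >> (4 * i)) & 0xF;
    uint32_t sel = static_cast<uint32_t>((src >> (8 * (nib & 7))) & 0xFF);
    if (nib & 8)  // sign-replicate mode: broadcast the byte's top bit
      sel = (sel & 0x80) ? 0xFF : 0x00;
    d |= sel << (8 * i);
  }
  return d;
}

// BUILD_VECTOR (trunc a), (trunc b) as v2x16: the low half is bytes 0,1 of a
// and the high half is bytes 0,1 of b (source bytes 4,5), so the selector
// nibbles, from result byte 0 upward, are 0, 1, 4, 5.
constexpr uint32_t kLowHalves = 0x5410;

// Reference: what the truncate-then-pack sequence computes.
constexpr uint32_t packLowHalves(uint32_t a, uint32_t b) {
  return (a & 0xFFFF) | ((b & 0xFFFF) << 16);
}

static_assert(prmt(0xAAAA1111u, 0xBBBB2222u, kLowHalves) == 0x22221111u,
              "one PRMT replaces two truncates and a pack");
```

One PRMT reads the two original 32-bit registers directly, so no separate 16-bit registers are needed for the truncated values.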

Additionally, if the 32-bit values are right-shifted, we can fold in the
shift by selecting higher bytes with PRMT. Only logical right-shifts by
16 are supported (for now) since those are the only situations seen in
practice. Right shifts by 16 often come up during the legalization of
EXTRACT_VECTOR_ELT.
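Folding a logical right shift by 16 amounts to selecting bytes 2-3 (or 6-7) instead of 0-1 (or 4-5). The hypothetical helper v2x16Selector below sketches that selector arithmetic under the PTX byte numbering. It is an illustration, not the combine's actual code, and it repeats a compact prmt model so the snippet stands alone.

```cpp
#include <cassert>
#include <cstdint>

// Compact model of PTX prmt.b32 (default mode). Sign-replicate mode (bit 3
// of a selector nibble) is omitted because the selectors below never set it.
constexpr uint32_t prmt(uint32_t a, uint32_t b, uint32_t c) {
  uint64_t src = (static_cast<uint64_t>(b) << 32) | a;  // bytes 0-3: a, 4-7: b
  uint32_t d = 0;
  for (int i = 0; i < 4; ++i) {
    uint32_t idx = (c >> (4 * i)) & 0x7;
    d |= static_cast<uint32_t>((src >> (8 * idx)) & 0xFF) << (8 * i);
  }
  return d;
}

// Hypothetical helper: selector for BUILD_VECTOR (trunc x), (trunc y), where
// x is a or (a >> 16) and y is b or (b >> 16), both logical shifts.
// A folded shift by 16 just starts that half two bytes higher.
constexpr uint32_t v2x16Selector(bool loShifted, bool hiShifted) {
  uint32_t lo = loShifted ? 2 : 0;  // first byte of the low half (from a)
  uint32_t hi = hiShifted ? 6 : 4;  // first byte of the high half (from b)
  return lo | ((lo + 1) << 4) | (hi << 8) | ((hi + 1) << 12);
}

// Reference: the shift/truncate/pack sequence that PRMT replaces.
constexpr uint32_t packHalves(uint32_t a, uint32_t b, bool loShifted,
                              bool hiShifted) {
  uint32_t x = loShifted ? a >> 16 : a;
  uint32_t y = hiShifted ? b >> 16 : b;
  return (x & 0xFFFF) | ((y & 0xFFFF) << 16);
}

static_assert(v2x16Selector(false, false) == 0x5410, "plain pack");
static_assert(v2x16Selector(true, false) == 0x5432, "low half shifted");
```

For example, element 1 of a v2x16 held in a 32-bit register is the high half, which legalization can express as a truncate of a right shift by 16; with the shift folded, the PRMT picks bytes 2-3 of that register directly.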

This idea was brought up in a PR comment by @Artem-B.
