[all-commits] [llvm/llvm-project] 9d7e1d: [AMDGPU][True16] added Pre-RA hint to improve copy...

Wed Mar 12 13:13:19 PDT 2025

  Branch: refs/heads/main
  Home:   https://github.com/llvm/llvm-project
  Commit: 9d7e1d92dbef5aa4d11deed360685a86935953d2
      https://github.com/llvm/llvm-project/commit/9d7e1d92dbef5aa4d11deed360685a86935953d2
  Author: Brox Chen <guochen2 at amd.com>
  Date:   2025-03-12 (Wed, 12 Mar 2025)

  Changed paths:
    M llvm/lib/Target/AMDGPU/GCNPreRAOptimizations.cpp
    M llvm/lib/Target/AMDGPU/SIRegisterInfo.cpp
    M llvm/lib/Target/AMDGPU/SIRegisterInfo.h
    M llvm/test/CodeGen/AMDGPU/bf16.ll
    M llvm/test/CodeGen/AMDGPU/chain-hi-to-lo.ll
    M llvm/test/CodeGen/AMDGPU/fadd.f16.ll
    M llvm/test/CodeGen/AMDGPU/fcmp.f16.ll
    M llvm/test/CodeGen/AMDGPU/fma.f16.ll
    M llvm/test/CodeGen/AMDGPU/fmul.f16.ll
    M llvm/test/CodeGen/AMDGPU/fpext.f16.ll
    M llvm/test/CodeGen/AMDGPU/fptosi.f16.ll
    M llvm/test/CodeGen/AMDGPU/fptoui.f16.ll
    M llvm/test/CodeGen/AMDGPU/fshr.ll
    M llvm/test/CodeGen/AMDGPU/i1-to-bf16.ll
    M llvm/test/CodeGen/AMDGPU/llvm.ldexp.ll
    M llvm/test/CodeGen/AMDGPU/llvm.maxnum.f16.ll
    M llvm/test/CodeGen/AMDGPU/mad-mix.ll
    M llvm/test/CodeGen/AMDGPU/mad.u16.ll
    M llvm/test/CodeGen/AMDGPU/minimummaximum.ll
    M llvm/test/CodeGen/AMDGPU/minmax.ll
    M llvm/test/CodeGen/AMDGPU/mul.i16.ll
    M llvm/test/CodeGen/AMDGPU/preserve-hi16.ll
    M llvm/test/CodeGen/AMDGPU/saddsat.ll
    M llvm/test/CodeGen/AMDGPU/ssubsat.ll
    M llvm/test/CodeGen/AMDGPU/uaddsat.ll
    M llvm/test/CodeGen/AMDGPU/usubsat.ll

  Log Message:
  -----------
  [AMDGPU][True16] added Pre-RA hint to improve copy elimination (#103366)

The allocation order of 16 bit registers is vgpr0lo16, vgpr0hi16,
vgpr1lo16, vgpr1hi16, vgpr2lo16.... We prefer (essentially require) that
allocation order, because it uses the minimum number of registers. But
when you have 16 bit data passing between 16 and 32 bit instructions you
get lots of COPY.

This patch teach the compiler that a COPY of a 16-bit value from a 32
bit register to a lo-half 16 bit register is free, to a hi-half 16 bit
register is not.

This might get improved to coalescing with additional cases, and perhaps
as an alternative to the RA hints. For now upstreaming this solution
first.

To unsubscribe from these emails, change your notification settings at https://github.com/llvm/llvm-project/settings/notifications