[all-commits] [llvm/llvm-project] 866271: [AMDGPU] Mitigate GFX12 VALU read SGPR hazard (#10...

Tue Sep 3 20:15:41 PDT 2024

  Branch: refs/heads/main
  Home:   https://github.com/llvm/llvm-project
  Commit: 86627149f6fd5148311b7b0aa1c7195a05a5d6a8
      https://github.com/llvm/llvm-project/commit/86627149f6fd5148311b7b0aa1c7195a05a5d6a8
  Author: Carl Ritson <carl.ritson at amd.com>
  Date:   2024-09-04 (Wed, 04 Sep 2024)

  Changed paths:
    M llvm/lib/Target/AMDGPU/GCNHazardRecognizer.cpp
    M llvm/lib/Target/AMDGPU/GCNHazardRecognizer.h
    M llvm/lib/Target/AMDGPU/GCNSubtarget.h
    M llvm/test/CodeGen/AMDGPU/GlobalISel/atomicrmw_fmax.ll
    M llvm/test/CodeGen/AMDGPU/GlobalISel/atomicrmw_fmin.ll
    M llvm/test/CodeGen/AMDGPU/GlobalISel/flat-scratch.ll
    M llvm/test/CodeGen/AMDGPU/atomic_optimizations_buffer.ll
    M llvm/test/CodeGen/AMDGPU/atomic_optimizations_global_pointer.ll
    M llvm/test/CodeGen/AMDGPU/atomic_optimizations_raw_buffer.ll
    M llvm/test/CodeGen/AMDGPU/atomic_optimizations_struct_buffer.ll
    M llvm/test/CodeGen/AMDGPU/buffer-fat-pointer-atomicrmw-fadd.ll
    M llvm/test/CodeGen/AMDGPU/buffer-fat-pointer-atomicrmw-fmax.ll
    M llvm/test/CodeGen/AMDGPU/buffer-fat-pointer-atomicrmw-fmin.ll
    M llvm/test/CodeGen/AMDGPU/flat-atomicrmw-fadd.ll
    M llvm/test/CodeGen/AMDGPU/flat-atomicrmw-fmax.ll
    M llvm/test/CodeGen/AMDGPU/flat-atomicrmw-fmin.ll
    M llvm/test/CodeGen/AMDGPU/flat-atomicrmw-fsub.ll
    M llvm/test/CodeGen/AMDGPU/flat-scratch.ll
    M llvm/test/CodeGen/AMDGPU/fmaximum.ll
    M llvm/test/CodeGen/AMDGPU/fmaximum3.ll
    M llvm/test/CodeGen/AMDGPU/fminimum.ll
    M llvm/test/CodeGen/AMDGPU/fminimum3.ll
    M llvm/test/CodeGen/AMDGPU/global-atomicrmw-fadd.ll
    M llvm/test/CodeGen/AMDGPU/global-atomicrmw-fmax.ll
    M llvm/test/CodeGen/AMDGPU/global-atomicrmw-fmin.ll
    M llvm/test/CodeGen/AMDGPU/global-atomicrmw-fsub.ll
    M llvm/test/CodeGen/AMDGPU/global_atomics_i64.ll
    A llvm/test/CodeGen/AMDGPU/hazard-recognizer-src-shared-base.ll
    M llvm/test/CodeGen/AMDGPU/indirect-call-known-callees.ll
    M llvm/test/CodeGen/AMDGPU/insert_waitcnt_for_precise_memory.ll
    M llvm/test/CodeGen/AMDGPU/llvm.amdgcn.atomic.cond.sub.ll
    M llvm/test/CodeGen/AMDGPU/llvm.amdgcn.permlane.ll
    M llvm/test/CodeGen/AMDGPU/llvm.amdgcn.raw.ptr.buffer.atomic.fadd.v2bf16.ll
    M llvm/test/CodeGen/AMDGPU/llvm.amdgcn.s.barrier.wait.ll
    M llvm/test/CodeGen/AMDGPU/llvm.amdgcn.struct.ptr.buffer.atomic.fadd.v2bf16.ll
    M llvm/test/CodeGen/AMDGPU/llvm.amdgcn.struct.ptr.buffer.atomic.fadd_nortn.ll
    M llvm/test/CodeGen/AMDGPU/llvm.amdgcn.struct.ptr.buffer.atomic.fadd_rtn.ll
    M llvm/test/CodeGen/AMDGPU/llvm.amdgcn.struct.ptr.buffer.atomic.fmax.f32.ll
    M llvm/test/CodeGen/AMDGPU/llvm.amdgcn.struct.ptr.buffer.atomic.fmin.f32.ll
    M llvm/test/CodeGen/AMDGPU/llvm.amdgcn.wave.id.ll
    M llvm/test/CodeGen/AMDGPU/llvm.maximum.f16.ll
    M llvm/test/CodeGen/AMDGPU/llvm.maximum.f32.ll
    M llvm/test/CodeGen/AMDGPU/llvm.minimum.f16.ll
    M llvm/test/CodeGen/AMDGPU/llvm.minimum.f32.ll
    M llvm/test/CodeGen/AMDGPU/load-constant-i1.ll
    M llvm/test/CodeGen/AMDGPU/load-constant-i16.ll
    M llvm/test/CodeGen/AMDGPU/load-constant-i32.ll
    M llvm/test/CodeGen/AMDGPU/load-constant-i8.ll
    M llvm/test/CodeGen/AMDGPU/local-atomicrmw-fadd.ll
    M llvm/test/CodeGen/AMDGPU/local-atomicrmw-fmax.ll
    M llvm/test/CodeGen/AMDGPU/local-atomicrmw-fmin.ll
    M llvm/test/CodeGen/AMDGPU/local-atomicrmw-fsub.ll
    M llvm/test/CodeGen/AMDGPU/loop-prefetch-data.ll
    M llvm/test/CodeGen/AMDGPU/lower-work-group-id-intrinsics-hsa.ll
    M llvm/test/CodeGen/AMDGPU/lower-work-group-id-intrinsics-pal.ll
    M llvm/test/CodeGen/AMDGPU/materialize-frame-index-sgpr.gfx10.ll
    M llvm/test/CodeGen/AMDGPU/materialize-frame-index-sgpr.ll
    M llvm/test/CodeGen/AMDGPU/memory-legalizer-flat-agent.ll
    M llvm/test/CodeGen/AMDGPU/memory-legalizer-flat-lastuse.ll
    M llvm/test/CodeGen/AMDGPU/memory-legalizer-flat-nontemporal.ll
    M llvm/test/CodeGen/AMDGPU/memory-legalizer-flat-singlethread.ll
    M llvm/test/CodeGen/AMDGPU/memory-legalizer-flat-system.ll
    M llvm/test/CodeGen/AMDGPU/memory-legalizer-flat-volatile.ll
    M llvm/test/CodeGen/AMDGPU/memory-legalizer-flat-wavefront.ll
    M llvm/test/CodeGen/AMDGPU/memory-legalizer-flat-workgroup.ll
    M llvm/test/CodeGen/AMDGPU/memory-legalizer-global-agent.ll
    M llvm/test/CodeGen/AMDGPU/memory-legalizer-global-lastuse.ll
    M llvm/test/CodeGen/AMDGPU/memory-legalizer-global-nontemporal.ll
    M llvm/test/CodeGen/AMDGPU/memory-legalizer-global-singlethread.ll
    M llvm/test/CodeGen/AMDGPU/memory-legalizer-global-system.ll
    M llvm/test/CodeGen/AMDGPU/memory-legalizer-global-volatile.ll
    M llvm/test/CodeGen/AMDGPU/memory-legalizer-global-wavefront.ll
    M llvm/test/CodeGen/AMDGPU/memory-legalizer-global-workgroup.ll
    M llvm/test/CodeGen/AMDGPU/memory-legalizer-invalid-syncscope.ll
    M llvm/test/CodeGen/AMDGPU/memory-legalizer-local-agent.ll
    M llvm/test/CodeGen/AMDGPU/memory-legalizer-local-nontemporal.ll
    M llvm/test/CodeGen/AMDGPU/memory-legalizer-local-singlethread.ll
    M llvm/test/CodeGen/AMDGPU/memory-legalizer-local-system.ll
    M llvm/test/CodeGen/AMDGPU/memory-legalizer-local-volatile.ll
    M llvm/test/CodeGen/AMDGPU/memory-legalizer-local-wavefront.ll
    M llvm/test/CodeGen/AMDGPU/memory-legalizer-local-workgroup.ll
    M llvm/test/CodeGen/AMDGPU/memory-legalizer-private-lastuse.ll
    M llvm/test/CodeGen/AMDGPU/memory-legalizer-private-nontemporal.ll
    M llvm/test/CodeGen/AMDGPU/memory-legalizer-private-volatile.ll
    M llvm/test/CodeGen/AMDGPU/pseudo-scalar-transcendental.ll
    M llvm/test/CodeGen/AMDGPU/s-getpc-b64-remat.ll
    M llvm/test/CodeGen/AMDGPU/v_swap_b16.ll
    M llvm/test/CodeGen/AMDGPU/valu-mask-write-hazard.mir
    A llvm/test/CodeGen/AMDGPU/valu-read-sgpr-hazard.mir
    M llvm/test/CodeGen/AMDGPU/vcmpx-permlane-hazard.mir

  Log Message:
  -----------
  [AMDGPU] Mitigate GFX12 VALU read SGPR hazard (#100067)

Any SGPR read by a VALU can potentially obscure SALU writes to the same
register.
Insert s_wait_alu instructions to mitigate the hazard on affected paths.

Compute a global cache of SGPRs with any VALU reads and use this to
avoid inserting mitigation for SGPRs never accessed by VALUs.

To avoid excessive search when compile time is priority implement
secondary mode where all SALU writes are mitigated.

Co-authored-by: Shilei Tian <shilei.tian at amd.com>

To unsubscribe from these emails, change your notification settings at https://github.com/llvm/llvm-project/settings/notifications