[all-commits] [llvm/llvm-project] c21153: [AMDGPU][SDAG] Only fold flat offsets if they are ...

Tue Mar 18 10:24:37 PDT 2025

  Branch: refs/heads/users/ritter-x2a/03-17-_amdgpu_sdag_only_fold_flat_offsets_if_they_are_inbounds
  Home:   https://github.com/llvm/llvm-project
  Commit: c21153d0c9ee9e0f23f1e1f2a4b7c5add7f09ce5
      https://github.com/llvm/llvm-project/commit/c21153d0c9ee9e0f23f1e1f2a4b7c5add7f09ce5
  Author: Fabian Ritter <fabian.ritter at amd.com>
  Date:   2025-03-18 (Tue, 18 Mar 2025)

  Changed paths:
    M llvm/include/llvm/CodeGen/SelectionDAG.h
    M llvm/lib/CodeGen/SelectionDAG/DAGCombiner.cpp
    M llvm/lib/CodeGen/SelectionDAG/SelectionDAG.cpp
    M llvm/lib/Target/AMDGPU/AMDGPUISelDAGToDAG.cpp
    M llvm/test/CodeGen/AMDGPU/atomics_cond_sub.ll
    M llvm/test/CodeGen/AMDGPU/cgp-addressing-modes-flat.ll
    M llvm/test/CodeGen/AMDGPU/flat-atomicrmw-fadd.ll
    M llvm/test/CodeGen/AMDGPU/flat-atomicrmw-fmax.ll
    M llvm/test/CodeGen/AMDGPU/flat-atomicrmw-fmin.ll
    M llvm/test/CodeGen/AMDGPU/flat-atomicrmw-fsub.ll
    M llvm/test/CodeGen/AMDGPU/flat_atomics.ll
    M llvm/test/CodeGen/AMDGPU/flat_atomics_i32_system.ll
    M llvm/test/CodeGen/AMDGPU/flat_atomics_i64.ll
    M llvm/test/CodeGen/AMDGPU/flat_atomics_i64_noprivate.ll
    M llvm/test/CodeGen/AMDGPU/flat_atomics_i64_system_noprivate.ll
    M llvm/test/CodeGen/AMDGPU/fold-gep-offset.ll
    M llvm/test/CodeGen/AMDGPU/llvm.amdgcn.atomic.dec.ll
    M llvm/test/CodeGen/AMDGPU/llvm.amdgcn.atomic.inc.ll
    M llvm/test/CodeGen/AMDGPU/loop-prefetch-data.ll
    M llvm/test/CodeGen/AMDGPU/memintrinsic-unroll.ll
    M llvm/test/CodeGen/AMDGPU/memory-legalizer-flat-agent.ll
    M llvm/test/CodeGen/AMDGPU/memory-legalizer-flat-singlethread.ll
    M llvm/test/CodeGen/AMDGPU/memory-legalizer-flat-system.ll
    M llvm/test/CodeGen/AMDGPU/memory-legalizer-flat-wavefront.ll
    M llvm/test/CodeGen/AMDGPU/memory-legalizer-flat-workgroup.ll
    M llvm/test/CodeGen/AMDGPU/offset-split-flat.ll
    M llvm/test/CodeGen/AMDGPU/promote-constOffset-to-imm.ll
    M llvm/test/Transforms/InferAddressSpaces/AMDGPU/flat_atomic.ll

  Log Message:
  -----------
  [AMDGPU][SDAG] Only fold flat offsets if they are inbounds

For flat memory instructions where the address is supplied as a base
address register with an immediate offset, the memory aperture test
ignores the immediate offset. Currently, ISel does not respect that,
which leads to miscompilations where valid input programs crash when the
address computation relies on the immediate offset to get the base
address in the proper memory aperture. Global or scratch instructions
are not affected.

This patch only selects flat instructions with immediate offsets from
address computations with the inbounds flag: If the address computation
does not leave the bounds of the allocated object, it cannot leave the
bounds of the memory aperture and is therefore safe to handle with an
immediate offset.

It also adds the inbounds flag to DAG nodes resulting from
transformations:
- Address computations resulting from getObjectPtrOffset. As far as I
  can tell, this function is only used to compute addresses within
  accessed memory ranges, e.g., for loads and stores that are split
  during legalization.
- Reassociated inbounds adds. If both involved operations are inbounds,
  then so are the operations after the transformation.
- Address computations in the SelectionDAG lowering of the
  memcpy/move/set intrinsics. Base and result of the address arithmetic
  there are accessed, so the operation must be inbounds.

It might make sense to separate these changes into their own PR, but I
don't see a way to test them without adding a use of the inbounds SDAG
flag.

Affected tests:
- CodeGen/AMDGPU/fold-gep-offset.ll: Offsets are no longer wrongly
  folded; added new positive tests where we still do fold them.
- Transforms/InferAddressSpaces/AMDGPU/flat_atomic.ll: Offset folding
  doesn't seem integral to this test, so the test input is not changed.
- CodeGen/AMDGPU/loop-prefetch-data.ll: loop-reduce prefers to base
  addresses on the potentially OOB addresses used for prefetching for
  memory accesses, that might be a separate issue to look into.
- Added memset tests to CodeGen/AMDGPU/memintrinsic-unroll.ll to make
  sure that offsets in the memset DAG lowering are still folded
  properly.
- All others: Added inbounds flags to GEPs in the input so that the
  output stays the same.

A similar patch for GlobalISel will follow.

Fixes SWDEV-516125.

To unsubscribe from these emails, change your notification settings at https://github.com/llvm/llvm-project/settings/notifications