[all-commits] [llvm/llvm-project] 9294d4: [AMDGPU][SDAG] Only fold flat offsets if they are ...

Thu Mar 27 02:03:17 PDT 2025

  Branch: refs/heads/users/ritter-x2a/03-21-_amdgpu_sdag_only_fold_flat_offsets_if_they_are_inbounds
  Home:   https://github.com/llvm/llvm-project
  Commit: 9294d4f1094afd1b955447a0e76480d83fadb3d4
      https://github.com/llvm/llvm-project/commit/9294d4f1094afd1b955447a0e76480d83fadb3d4
  Author: Fabian Ritter <fabian.ritter at amd.com>
  Date:   2025-03-27 (Thu, 27 Mar 2025)

  Changed paths:
    M llvm/include/llvm/CodeGen/SelectionDAG.h
    M llvm/lib/CodeGen/SelectionDAG/DAGCombiner.cpp
    M llvm/lib/CodeGen/SelectionDAG/SelectionDAG.cpp
    M llvm/lib/Target/AMDGPU/AMDGPUISelDAGToDAG.cpp
    M llvm/test/CodeGen/AMDGPU/fold-gep-offset.ll
    M llvm/test/CodeGen/AMDGPU/loop-prefetch-data.ll
    M llvm/test/CodeGen/AMDGPU/memintrinsic-unroll.ll
    M llvm/test/Transforms/InferAddressSpaces/AMDGPU/flat_atomic.ll

  Log Message:
  -----------
  [AMDGPU][SDAG] Only fold flat offsets if they are inbounds

For flat memory instructions where the address is supplied as a base address
register with an immediate offset, the memory aperture test ignores the
immediate offset. Currently, ISel does not respect that, which leads to
miscompilations where valid input programs crash when the address computation
relies on the immediate offset to get the base address in the proper memory
aperture. Global or scratch instructions are not affected.
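
The failure mode can be illustrated with a toy model. This is only a sketch
under assumptions, not a description of the hardware: the aperture bounds, the
Aperture enum, and all function names below are hypothetical, chosen to show
what happens when the aperture test sees the base register but not the
immediate offset.

```cpp
#include <cassert>
#include <cstdint>

// Toy model: every name and constant here is hypothetical.
enum class Aperture { Global, Private };

// Hypothetical private aperture covering [0x1000, 0x2000).
constexpr uint64_t PrivateLo = 0x1000;
constexpr uint64_t PrivateHi = 0x2000;

constexpr Aperture classify(uint64_t Addr) {
  return (Addr >= PrivateLo && Addr < PrivateHi) ? Aperture::Private
                                                 : Aperture::Global;
}

// Offset kept as a separate add: the aperture test sees the final address.
constexpr Aperture apertureUnfolded(uint64_t Base, uint64_t Offset) {
  return classify(Base + Offset);
}

// Offset folded into the instruction: the aperture test sees only the base.
constexpr Aperture apertureFolded(uint64_t Base, uint64_t /*Offset*/) {
  return classify(Base);
}

// Small demonstration of the two cases.
inline void demoApertureMismatch() {
  // The base lies below the aperture; only base + offset lands inside it.
  assert(apertureUnfolded(0x0FF0, 0x20) == Aperture::Private);
  assert(apertureFolded(0x0FF0, 0x20) == Aperture::Global); // wrong aperture
  // Base and base + offset both lie inside: folding is harmless.
  assert(apertureFolded(0x1100, 0x20) == apertureUnfolded(0x1100, 0x20));
}
```

In the first case the folded form is classified by a base that is outside the
aperture the program actually targets, which corresponds to the miscompilation
described above.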

This patch only selects flat instructions with immediate offsets from address
computations with the inbounds flag: If the address computation does not leave
the bounds of the allocated object, it cannot leave the bounds of the memory
aperture and is therefore safe to handle with an immediate offset.
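
The selection rule can be sketched as a small decision function. This is a
simplified stand-in, not the code in AMDGPUISelDAGToDAG.cpp: ToyPtrAdd,
ToyFlatAddr, and selectToyFlatAddr are hypothetical names, and any other
legality checks a real selector would need are left out.

```cpp
#include <cassert>
#include <cstdint>

// Toy stand-in for a pointer add in the DAG; not an LLVM type.
struct ToyPtrAdd {
  int64_t Offset; // constant added to the base pointer
  bool InBounds;  // whether the add carries the inbounds flag
};

// Toy result of selection: how the flat instruction's address is formed.
struct ToyFlatAddr {
  bool UsesAddResult; // true: the address register holds base + offset
  int64_t ImmOffset;  // immediate offset encoded in the instruction
};

// Fold the offset into the immediate field only for inbounds adds.
inline ToyFlatAddr selectToyFlatAddr(const ToyPtrAdd &Add) {
  if (Add.InBounds)
    return {false, Add.Offset}; // base register + immediate offset
  return {true, 0};             // keep the add; no immediate offset
}

inline void demoSelectToyFlatAddr() {
  ToyFlatAddr Folded = selectToyFlatAddr(ToyPtrAdd{16, true});
  assert(!Folded.UsesAddResult && Folded.ImmOffset == 16);

  ToyFlatAddr Kept = selectToyFlatAddr(ToyPtrAdd{16, false});
  assert(Kept.UsesAddResult && Kept.ImmOffset == 0);
}
```

Without the inbounds flag the sketch falls back to the materialized add, so
the aperture test sees the full address.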

It also adds the inbounds flag to DAG nodes resulting from transformations:
- Address computations resulting from getObjectPtrOffset. As far as I can tell,
  this function is only used to compute addresses within accessed memory ranges,
  e.g., for loads and stores that are split during legalization.
- Reassociated inbounds adds. If both involved operations are inbounds, then so
  are operations after the transformation.
- Address computations in the SelectionDAG lowering of the memcpy/move/set
  intrinsics. Base and result of the address arithmetic there are accessed, so
  the operation must be inbounds.
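
The flag rule for reassociated adds can be sketched as follows. This is a toy
model with hypothetical names (ToyOffsetAdd, reassociateToyAdds), not the
DAGCombiner code; it only shows that the combined add is marked inbounds
exactly when both original adds were.

```cpp
#include <cassert>
#include <cstdint>

// Toy stand-in for a pointer add with a constant offset; not an LLVM type.
struct ToyOffsetAdd {
  int64_t Offset;
  bool InBounds;
};

// (Base + Inner) + Outer  ==>  Base + (Inner + Outer)
// The result keeps the inbounds flag only if both inputs had it.
inline ToyOffsetAdd reassociateToyAdds(const ToyOffsetAdd &Inner,
                                       const ToyOffsetAdd &Outer) {
  return {Inner.Offset + Outer.Offset, Inner.InBounds && Outer.InBounds};
}

inline void demoReassociateToyAdds() {
  ToyOffsetAdd Both = reassociateToyAdds(ToyOffsetAdd{8, true},
                                         ToyOffsetAdd{16, true});
  assert(Both.Offset == 24 && Both.InBounds);

  ToyOffsetAdd Mixed = reassociateToyAdds(ToyOffsetAdd{8, true},
                                          ToyOffsetAdd{16, false});
  assert(Mixed.Offset == 24 && !Mixed.InBounds);
}
```

Dropping the flag in the mixed case is the conservative choice: the combined
add is then not eligible for flat offset folding.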

It might make sense to separate these changes into their own PR, but I don't
see a way to test them without adding a use of the inbounds SDAG flag.

Affected tests:
- CodeGen/AMDGPU/fold-gep-offset.ll: Offsets are no longer wrongly folded;
  added new positive tests where we still fold them.
- Transforms/InferAddressSpaces/AMDGPU/flat_atomic.ll: Offset folding doesn't
  seem integral to this test, so the test is not modified to keep the offsets
  folded.
- CodeGen/AMDGPU/loop-prefetch-data.ll: loop-reduce prefers to base the
  addresses for memory accesses on the potentially OOB addresses used for
  prefetching; that might be a separate issue to look into.
- Added memset tests to CodeGen/AMDGPU/memintrinsic-unroll.ll to make sure that
  offsets in the memset DAG lowering are still folded properly.

A similar patch for GlobalISel will follow.

Fixes SWDEV-516125.
