[all-commits] [llvm/llvm-project] 88db0e: [AMDGPU][SDAG] Only fold flat offsets if they are ...

Tue Oct 28 09:08:15 PDT 2025

  Branch: refs/heads/users/ritter-x2a/10-28-_amdgpu_sdag_only_fold_flat_offsets_if_they_are_inbounds_ptradds
  Home:   https://github.com/llvm/llvm-project
  Commit: 88db0ea7d9ea9f89c8204b059d97d7a79f213a62
      https://github.com/llvm/llvm-project/commit/88db0ea7d9ea9f89c8204b059d97d7a79f213a62
  Author: Fabian Ritter <fabian.ritter at amd.com>
  Date:   2025-10-28 (Tue, 28 Oct 2025)

  Changed paths:
    M llvm/lib/Target/AMDGPU/AMDGPUISelDAGToDAG.cpp
    M llvm/test/CodeGen/AMDGPU/fold-gep-offset.ll
    M llvm/test/CodeGen/AMDGPU/infer-addrspace-flat-atomic.ll
    M llvm/test/CodeGen/AMDGPU/loop-prefetch-data.ll
    M llvm/test/CodeGen/AMDGPU/memintrinsic-unroll.ll
    M llvm/test/CodeGen/AMDGPU/neg_ashr64_reduce.ll
    M llvm/test/CodeGen/AMDGPU/no-folding-imm-to-inst-with-fi.ll
    M llvm/test/CodeGen/AMDGPU/preserve-wwm-copy-dst-reg.ll

  Log Message:
  -----------
  [AMDGPU][SDAG] Only fold flat offsets if they are inbounds PTRADDs

For flat memory instructions where the address is supplied as a base address
register with an immediate offset, the memory aperture test ignores the
immediate offset. Currently, SDISel does not respect that, which leads to
miscompilations where valid input programs crash when the address computation
relies on the immediate offset to get the base address in the proper memory
aperture. Global or scratch instructions are not affected.

This patch only selects flat instructions with immediate offsets from PTRADD
address computations with the inbounds flag: If the PTRADD does not leave the
bounds of the allocated object, it cannot leave the bounds of the memory
aperture and is therefore safe to handle with an immediate offset.

Affected tests:

- CodeGen/AMDGPU/fold-gep-offset.ll: Offsets are no longer wrongly folded, added
  new positive tests where we still do fold them.
- CodeGen/AMDGPU/infer-addrspace-flat-atomic.ll: Offset folding doesn't seem
  integral to this test, so the test is not changed to make offset folding still
  happen.
- CodeGen/AMDGPU/loop-prefetch-data.ll: loop-reduce transforms inbounds
  addresses for accesses to be based on potentially OOB addresses used for
  prefetching.
- I think the remaining ones suffer from the limited preservation of the
  inbounds flag in PTRADD DAGCombines due to the provenance problems pointed out
  in PR #165424 and the fact that
  `AMDGPUTargetLowering::SplitVector{Load|Store}` legalizes too-wide accesses by
  repeatedly splitting them in half.  Legalizing a V32S32 memory accesses
  therefore leads to inbounds ptradd chains like (ptradd inbounds (ptradd
  inbounds (ptradd inbounds P, 64), 32), 16). The DAGCombines fold them into a
  single ptradd, but the involved transformations generally cannot preserve the
  inbounds flag (even though it would be valid in this case).

Similar previous PR that relied on `ISD::ADD inbounds` instead of `ISD::PTRADD inbounds` (closed): #132353
Analogous PR for GISel (merged): #153001

Fixes SWDEV-516125.

To unsubscribe from these emails, change your notification settings at https://github.com/llvm/llvm-project/settings/notifications