[all-commits] [llvm/llvm-project] 88db0e: [AMDGPU][SDAG] Only fold flat offsets if they are ...
Fabian Ritter via All-commits
all-commits at lists.llvm.org
Tue Oct 28 09:08:15 PDT 2025
Branch: refs/heads/users/ritter-x2a/10-28-_amdgpu_sdag_only_fold_flat_offsets_if_they_are_inbounds_ptradds
Home: https://github.com/llvm/llvm-project
Commit: 88db0ea7d9ea9f89c8204b059d97d7a79f213a62
https://github.com/llvm/llvm-project/commit/88db0ea7d9ea9f89c8204b059d97d7a79f213a62
Author: Fabian Ritter <fabian.ritter at amd.com>
Date: 2025-10-28 (Tue, 28 Oct 2025)
Changed paths:
M llvm/lib/Target/AMDGPU/AMDGPUISelDAGToDAG.cpp
M llvm/test/CodeGen/AMDGPU/fold-gep-offset.ll
M llvm/test/CodeGen/AMDGPU/infer-addrspace-flat-atomic.ll
M llvm/test/CodeGen/AMDGPU/loop-prefetch-data.ll
M llvm/test/CodeGen/AMDGPU/memintrinsic-unroll.ll
M llvm/test/CodeGen/AMDGPU/neg_ashr64_reduce.ll
M llvm/test/CodeGen/AMDGPU/no-folding-imm-to-inst-with-fi.ll
M llvm/test/CodeGen/AMDGPU/preserve-wwm-copy-dst-reg.ll
Log Message:
-----------
[AMDGPU][SDAG] Only fold flat offsets if they are inbounds PTRADDs
For flat memory instructions where the address is supplied as a base address
register with an immediate offset, the memory aperture test ignores the
immediate offset. Currently, SDISel does not respect that, which leads to
miscompilations where valid input programs crash when the address computation
relies on the immediate offset to get the base address in the proper memory
aperture. Global or scratch instructions are not affected.
This patch only selects flat instructions with immediate offsets from PTRADD
address computations with the inbounds flag: If the PTRADD does not leave the
bounds of the allocated object, it cannot leave the bounds of the memory
aperture and is therefore safe to handle with an immediate offset.
Affected tests:
- CodeGen/AMDGPU/fold-gep-offset.ll: Offsets are no longer wrongly folded, added
new positive tests where we still do fold them.
- CodeGen/AMDGPU/infer-addrspace-flat-atomic.ll: Offset folding doesn't seem
integral to this test, so the test is not changed to make offset folding still
happen.
- CodeGen/AMDGPU/loop-prefetch-data.ll: loop-reduce transforms inbounds
addresses for accesses to be based on potentially OOB addresses used for
prefetching.
- I think the remaining ones suffer from the limited preservation of the
inbounds flag in PTRADD DAGCombines due to the provenance problems pointed out
in PR #165424 and the fact that
`AMDGPUTargetLowering::SplitVector{Load|Store}` legalizes too-wide accesses by
repeatedly splitting them in half. Legalizing a V32S32 memory accesses
therefore leads to inbounds ptradd chains like (ptradd inbounds (ptradd
inbounds (ptradd inbounds P, 64), 32), 16). The DAGCombines fold them into a
single ptradd, but the involved transformations generally cannot preserve the
inbounds flag (even though it would be valid in this case).
Similar previous PR that relied on `ISD::ADD inbounds` instead of `ISD::PTRADD inbounds` (closed): #132353
Analogous PR for GISel (merged): #153001
Fixes SWDEV-516125.
To unsubscribe from these emails, change your notification settings at https://github.com/llvm/llvm-project/settings/notifications
More information about the All-commits
mailing list