[all-commits] [llvm/llvm-project] 9294d4: [AMDGPU][SDAG] Only fold flat offsets if they are ...
Fabian Ritter via All-commits
all-commits at lists.llvm.org
Thu Mar 27 02:03:17 PDT 2025
Branch: refs/heads/users/ritter-x2a/03-21-_amdgpu_sdag_only_fold_flat_offsets_if_they_are_inbounds
Home: https://github.com/llvm/llvm-project
Commit: 9294d4f1094afd1b955447a0e76480d83fadb3d4
https://github.com/llvm/llvm-project/commit/9294d4f1094afd1b955447a0e76480d83fadb3d4
Author: Fabian Ritter <fabian.ritter at amd.com>
Date: 2025-03-27 (Thu, 27 Mar 2025)
Changed paths:
M llvm/include/llvm/CodeGen/SelectionDAG.h
M llvm/lib/CodeGen/SelectionDAG/DAGCombiner.cpp
M llvm/lib/CodeGen/SelectionDAG/SelectionDAG.cpp
M llvm/lib/Target/AMDGPU/AMDGPUISelDAGToDAG.cpp
M llvm/test/CodeGen/AMDGPU/fold-gep-offset.ll
M llvm/test/CodeGen/AMDGPU/loop-prefetch-data.ll
M llvm/test/CodeGen/AMDGPU/memintrinsic-unroll.ll
M llvm/test/Transforms/InferAddressSpaces/AMDGPU/flat_atomic.ll
Log Message:
-----------
[AMDGPU][SDAG] Only fold flat offsets if they are inbounds
For flat memory instructions where the address is supplied as a base address
register with an immediate offset, the memory aperture test ignores the
immediate offset. Currently, ISel does not account for this, which leads to
miscompilations: valid input programs crash when the address computation
relies on the immediate offset to bring the address into the proper memory
aperture. Global and scratch instructions are not affected.
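The following minimal C++ model makes the failure mode concrete. It is not taken from the patch. It assumes that a flat address is classified by comparing its high 32 bits against aperture bases, and the constants kSharedHi and kPrivateHi and all function names are hypothetical. The model compares the aperture of base + offset with the aperture of the base register alone. According to the description above, the base register alone is what the hardware actually checks.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical model: an address is in the shared (LDS) or private (scratch)
// aperture when its high 32 bits match that aperture's base; anything else
// is treated as global. The constants are made up for illustration.
enum class Aperture { Global, Shared, Private };

constexpr uint32_t kSharedHi = 0x10000;   // hypothetical shared aperture high bits
constexpr uint32_t kPrivateHi = 0x20000;  // hypothetical private aperture high bits

Aperture classify(uint64_t addr) {
  uint32_t hi = static_cast<uint32_t>(addr >> 32);
  if (hi == kSharedHi) return Aperture::Shared;
  if (hi == kPrivateHi) return Aperture::Private;
  return Aperture::Global;
}

// What the program means: the aperture of the full address base + offset.
Aperture intendedAperture(uint64_t base, int64_t offset) {
  return classify(base + static_cast<uint64_t>(offset));
}

// What the hardware checks (per the commit message): the base register only;
// the immediate offset is ignored by the aperture test.
Aperture hardwareAperture(uint64_t base, int64_t /*offset*/) {
  return classify(base);
}
```

For example, with obj = uint64_t(kPrivateHi) << 32, hardwareAperture(obj - 16, 16) reports Global while intendedAperture(obj - 16, 16) reports Private. A folded immediate offset turns exactly this kind of mismatch into a crash.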
This patch only selects flat instructions with immediate offsets from address
computations with the inbounds flag: If the address computation does not leave
the bounds of the allocated object, it cannot leave the bounds of the memory
aperture and is therefore safe to handle with an immediate offset.
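The selection rule can be sketched as follows. This is a simplified model under stated assumptions, not the code in AMDGPUISelDAGToDAG.cpp. AddrNode, FlatAddr, selectFlatAddr and the offset range in isLegalImmOffset are all hypothetical stand-ins. The real legal range for the immediate depends on the subtarget.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical, simplified address computation: base + offset, plus an
// "inbounds" flag meaning the result stays inside the allocated object.
struct AddrNode {
  uint64_t base;
  int64_t offset;
  bool inbounds;
};

// Operands of the selected flat instruction.
struct FlatAddr {
  uint64_t baseReg;   // value placed in the address register
  int64_t immOffset;  // value encoded in the instruction's offset field
};

// Hypothetical legal range for the immediate field (made up for illustration).
bool isLegalImmOffset(int64_t off) { return off >= -4096 && off < 4096; }

// Only split off an immediate offset when the computation is inbounds;
// otherwise materialize the full address in the register with offset 0,
// so the aperture test sees the address that is actually accessed.
FlatAddr selectFlatAddr(const AddrNode &n) {
  if (n.inbounds && isLegalImmOffset(n.offset))
    return {n.base, n.offset};
  return {n.base + static_cast<uint64_t>(n.offset), 0};
}
```

With this model, selectFlatAddr({0x1000, 16, true}) keeps 16 in the immediate field. The same computation without the inbounds flag yields base register 0x1010 and offset 0, which costs an extra add but is always correct.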
It also adds the inbounds flag to DAG nodes resulting from transformations:
- Address computations resulting from getObjectPtrOffset. As far as I can tell,
this function is only used to compute addresses within accessed memory ranges,
e.g., for loads and stores that are split during legalization.
- Reassociated inbounds adds. If both operations involved are inbounds, then so
  are the operations after the transformation.
- Address computations in the SelectionDAG lowering of the memcpy/move/set
intrinsics. Base and result of the address arithmetic there are accessed, so
the operation must be inbounds.
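The following hypothetical sketch illustrates the flag rules for two of the cases above. It uses a toy Flags struct rather than LLVM's SDNodeFlags, and all names are made up. Under the first rule, reassociation keeps inbounds only when both original adds had it. The second rule is shown with a simplified memset lowering that emits wide stores followed by byte stores. This differs from the real lowering, which may choose other store patterns. The simplified lowering only produces offsets whose accessed bytes lie inside the destination, so each address can be marked inbounds.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Toy stand-in for node flags (hypothetical, not SDNodeFlags).
struct Flags {
  bool inbounds;
};

// Reassociation rule from the commit message: the rewritten adds are
// inbounds only if both original adds were inbounds.
Flags reassociateFlags(Flags inner, Flags outer) {
  return Flags{inner.inbounds && outer.inbounds};
}

struct Store {
  uint64_t offset;  // offset from the memset destination
  uint64_t width;   // bytes written by this store
  Flags addrFlags;  // flags on the address computation dst + offset
};

// Hypothetical memset(dst, v, size) lowering: full-width stores, then byte
// stores for the tail. Every store writes bytes memset itself writes, so
// dst + offset stays inside the destination object and is marked inbounds.
std::vector<Store> lowerMemsetOffsets(uint64_t size, uint64_t width) {
  assert(width > 0 && "store width must be non-zero");
  std::vector<Store> stores;
  uint64_t off = 0;
  for (; off + width <= size; off += width)
    stores.push_back({off, width, Flags{true}});
  for (; off < size; ++off)
    stores.push_back({off, 1, Flags{true}});
  return stores;
}
```

For a 19-byte memset with 8-byte stores, this produces stores at offsets 0 and 8, followed by byte stores at 16, 17 and 18. None of them reaches past byte 18.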
It might make sense to separate these changes into their own PR, but I don't
see a way to test them without adding a use of the inbounds SDAG flag.
Affected tests:
- CodeGen/AMDGPU/fold-gep-offset.ll: Offsets are no longer wrongly folded,
added new positive tests where we still do fold them.
- Transforms/InferAddressSpaces/AMDGPU/flat_atomic.ll: Offset folding doesn't
  seem integral to this test, so the test was not adjusted to keep offsets
  folded.
- CodeGen/AMDGPU/loop-prefetch-data.ll: loop-reduce prefers to base the
  addresses of memory accesses on the potentially out-of-bounds (OOB) addresses
  used for prefetching; that might be a separate issue to look into.
- Added memset tests to CodeGen/AMDGPU/memintrinsic-unroll.ll to make sure that
offsets in the memset DAG lowering are still folded properly.
A similar patch for GlobalISel will follow.
Fixes SWDEV-516125.