[PATCH] D121157: [AMDGPU] always use underlying object in the pointsToConstantMemory
Stanislav Mekhanoshin via Phabricator via llvm-commits
llvm-commits at lists.llvm.org
Mon Mar 7 14:54:44 PST 2022
rampitec added a comment.
In D121157#3365626 <https://reviews.llvm.org/D121157#3365626>, @arsenm wrote:
> In D121157#3365618 <https://reviews.llvm.org/D121157#3365618>, @rampitec wrote:
>
>> In D121157#3365603 <https://reviews.llvm.org/D121157#3365603>, @arsenm wrote:
>>
>>> In D121157#3365577 <https://reviews.llvm.org/D121157#3365577>, @rampitec wrote:
>>>
>>>> I see 2 options to fix the bug:
>>>>
>>>> 1. Go with this patch potentially checking what getUnderlyingObject brought;
>>>> 2. Change AMDGPUPromoteKernelArguments to attach noclobber metadata instead of a cast. That promise will hold unlike an invariant.
>>>
>>> I think both are necessary, and checking for constant address space at all may be wrong
>>
>> Ugh. 1) does not really work because of the inttoptr our BE started to produce recently to get to the arguments.
>
> Where is inttoptr introduced? That should never happen
Agree. It comes from the load-store vectorizer (load-store-vectorizer):
*** IR Dump After GPU Load and Store Vectorizer (load-store-vectorizer) ***
define amdgpu_kernel void @const_arg_does_not_alias_global(i32 addrspace(1)* %arg, i32 addrspace(4)* %arg.const) #0 {
entry:
  %const_arg_does_not_alias_global.kernarg.segment = call nonnull align 16 dereferenceable(52) i8 addrspace(4)* @llvm.amdgcn.kernarg.segment.ptr()
  %arg.kernarg.offset = getelementptr inbounds i8, i8 addrspace(4)* %const_arg_does_not_alias_global.kernarg.segment, i64 36
  %arg.kernarg.offset.cast = bitcast i8 addrspace(4)* %arg.kernarg.offset to i32 addrspace(1)* addrspace(4)*
  %0 = bitcast i32 addrspace(1)* addrspace(4)* %arg.kernarg.offset.cast to <2 x i64> addrspace(4)*
  %1 = load <2 x i64>, <2 x i64> addrspace(4)* %0, align 4, !invariant.load !0
  %arg.load2 = extractelement <2 x i64> %1, i32 0
  %2 = inttoptr i64 %arg.load2 to i32 addrspace(1)*
  %arg.const.load3 = extractelement <2 x i64> %1, i32 1
  %3 = inttoptr i64 %arg.const.load3 to i32 addrspace(4)*
  %arg.const.kernarg.offset = getelementptr inbounds i8, i8 addrspace(4)* %const_arg_does_not_alias_global.kernarg.segment, i64 44
  %arg.const.kernarg.offset.cast = bitcast i8 addrspace(4)* %arg.const.kernarg.offset to i32 addrspace(4)* addrspace(4)*
  %id = tail call i32 @llvm.amdgcn.workitem.id.x(), !range !1
  %idxprom = sext i32 %id to i64
  %ptr = getelementptr inbounds i32, i32 addrspace(1)* %2, i64 %idxprom
  %ptr.const = getelementptr inbounds i32, i32 addrspace(4)* %3, i64 %idxprom
  store i32 42, i32 addrspace(1)* %ptr, align 4
  %v = load i32, i32 addrspace(4)* %ptr.const, align 4
  store i32 %v, i32* undef, align 4
  ret void
}
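
To make the problem with the dump above concrete, here is a rough toy model of an underlying-object walk. It is only a sketch: the Value struct, the Op enum and both helper functions are hypothetical stand-ins written for illustration, not LLVM classes or the code in the patch. The one assumption taken from LLVM is that such a walk looks through GEPs and bitcasts but not through inttoptr.

```cpp
#include <string>

// Toy model only: hypothetical types for illustration, not LLVM's IR classes.
enum class Op { Call, Load, GEP, BitCast, ExtractElement, IntToPtr };

struct Value {
  std::string name;
  Op op;
  const Value *operand; // the single operand we follow, or nullptr
};

// Strip GEPs and bitcasts, in the spirit of getUnderlyingObject.
// Anything else, including inttoptr, ends the walk.
const Value *underlyingObject(const Value *v) {
  while (v->operand && (v->op == Op::GEP || v->op == Op::BitCast))
    v = v->operand;
  return v;
}

// Chain for %ptr.const in the dump: gep -> inttoptr -> extractelement -> load.
std::string baseOfPtrConst() {
  Value load{"%1", Op::Load, nullptr};
  Value extract{"%arg.const.load3", Op::ExtractElement, &load};
  Value cast{"%3", Op::IntToPtr, &extract};
  Value gep{"%ptr.const", Op::GEP, &cast};
  return underlyingObject(&gep)->name;
}

// Chain for %0 in the dump: bitcast -> bitcast -> gep -> kernarg segment call.
std::string baseOfVectorLoadPointer() {
  Value seg{"%kernarg.segment", Op::Call, nullptr}; // name shortened here
  Value gep{"%arg.kernarg.offset", Op::GEP, &seg};
  Value cast1{"%arg.kernarg.offset.cast", Op::BitCast, &gep};
  Value cast2{"%0", Op::BitCast, &cast1};
  return underlyingObject(&cast2)->name;
}
```

In this model the walk from %0 reaches the kernarg segment call, while the walk from %ptr.const stops at %3, the inttoptr, and never sees the vector load behind it.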
https://reviews.llvm.org/D121157