[PATCH] D121157: [AMDGPU] always use underlying object in the pointsToConstantMemory

Stanislav Mekhanoshin via Phabricator via llvm-commits llvm-commits at lists.llvm.org
Mon Mar 7 14:54:44 PST 2022


rampitec added a comment.

In D121157#3365626 <https://reviews.llvm.org/D121157#3365626>, @arsenm wrote:

> In D121157#3365618 <https://reviews.llvm.org/D121157#3365618>, @rampitec wrote:
>
>> In D121157#3365603 <https://reviews.llvm.org/D121157#3365603>, @arsenm wrote:
>>
>>> In D121157#3365577 <https://reviews.llvm.org/D121157#3365577>, @rampitec wrote:
>>>
>>>> I see 2 options to fix the bug:
>>>>
>>>> 1. Go with this patch potentially checking what getUnderlyingObject brought;
>>>> 2. Change AMDGPUPromoteKernelArguments to attach noclobber metadata instead of a cast. That promise will hold unlike an invariant.
>>>
>>> I think both are necessary, and checking for constant address space at all may be wrong
>>
>> Ugh. 1) does not really work because of the inttoptr our BE started to produce recently to get to the arguments.
>
> Where is inttoptr introduced? That should never happen

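Agreed. It comes from the load/store vectorizer (the generic load-store-vectorizer pass, as the dump header shows, not SILoadStoreOptimizer). It merges the two kernarg loads into a single <2 x i64> load and then uses inttoptr to recover the pointers: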

  *** IR Dump After GPU Load and Store Vectorizer (load-store-vectorizer) ***
  define amdgpu_kernel void @const_arg_does_not_alias_global(i32 addrspace(1)* %arg, i32 addrspace(4)* %arg.const) #0 {
  entry:
    %const_arg_does_not_alias_global.kernarg.segment = call nonnull align 16 dereferenceable(52) i8 addrspace(4)* @llvm.amdgcn.kernarg.segment.ptr()
    %arg.kernarg.offset = getelementptr inbounds i8, i8 addrspace(4)* %const_arg_does_not_alias_global.kernarg.segment, i64 36
    %arg.kernarg.offset.cast = bitcast i8 addrspace(4)* %arg.kernarg.offset to i32 addrspace(1)* addrspace(4)*
    %0 = bitcast i32 addrspace(1)* addrspace(4)* %arg.kernarg.offset.cast to <2 x i64> addrspace(4)*
    %1 = load <2 x i64>, <2 x i64> addrspace(4)* %0, align 4, !invariant.load !0
    %arg.load2 = extractelement <2 x i64> %1, i32 0
    %2 = inttoptr i64 %arg.load2 to i32 addrspace(1)*
    %arg.const.load3 = extractelement <2 x i64> %1, i32 1
    %3 = inttoptr i64 %arg.const.load3 to i32 addrspace(4)*
    %arg.const.kernarg.offset = getelementptr inbounds i8, i8 addrspace(4)* %const_arg_does_not_alias_global.kernarg.segment, i64 44
    %arg.const.kernarg.offset.cast = bitcast i8 addrspace(4)* %arg.const.kernarg.offset to i32 addrspace(4)* addrspace(4)*
    %id = tail call i32 @llvm.amdgcn.workitem.id.x(), !range !1
    %idxprom = sext i32 %id to i64
    %ptr = getelementptr inbounds i32, i32 addrspace(1)* %2, i64 %idxprom
    %ptr.const = getelementptr inbounds i32, i32 addrspace(4)* %3, i64 %idxprom
    store i32 42, i32 addrspace(1)* %ptr, align 4
    %v = load i32, i32 addrspace(4)* %ptr.const, align 4
    store i32 %v, i32* undef, align 4
    ret void
  }
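
To make the problem with option 1 concrete, here is a toy model of an underlying-object walk. It is only a sketch and does not use LLVM's API: the `Value` struct, `Kind` enum, and both helper functions are hypothetical names invented for illustration. It assumes the walk looks through GEPs and bitcasts but stops at inttoptr, which is how the %ptr.const chain in the dump above ends up with %3 as its base instead of the kernel argument.

```cpp
#include <cassert>
#include <string>

// Toy stand-in for an IR value. Hypothetical, not an LLVM class.
enum class Kind { Argument, Load, ExtractElement, GEP, BitCast, IntToPtr };

struct Value {
  std::string name;
  Kind kind;
  const Value *operand; // the single pointer-ish operand we track, or nullptr
};

// Walk through address computations that preserve the base pointer.
// Anything else, including inttoptr, ends the walk.
const Value *underlyingObject(const Value *v) {
  while (v->operand && (v->kind == Kind::GEP || v->kind == Kind::BitCast))
    v = v->operand;
  return v;
}

// Assumed shape if the GEP were taken directly on the kernel argument:
//   %ptr.const = getelementptr ..., %arg.const, %idxprom
Kind baseKindOnArgument() {
  Value arg{"%arg.const", Kind::Argument, nullptr};
  Value gep{"%ptr.const", Kind::GEP, &arg};
  return underlyingObject(&gep)->kind;
}

// Shape in the dump above:
//   %1 = load <2 x i64> ...
//   %arg.const.load3 = extractelement %1, 1
//   %3 = inttoptr %arg.const.load3
//   %ptr.const = getelementptr ..., %3, %idxprom
Kind baseKindInDump() {
  Value load{"%1", Kind::Load, nullptr};
  Value elt{"%arg.const.load3", Kind::ExtractElement, &load};
  Value cast{"%3", Kind::IntToPtr, &elt};
  Value gep{"%ptr.const", Kind::GEP, &cast};
  return underlyingObject(&gep)->kind;
}
```

In this model `baseKindOnArgument()` yields `Kind::Argument`, while `baseKindInDump()` yields `Kind::IntToPtr`, so any check of the form "is the underlying object an argument or a global" has nothing useful to inspect once the vectorized load is in place.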


CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D121157/new/

https://reviews.llvm.org/D121157


