[PATCH] D69826: [hip] Enable pointer argument lowering through coercing type.
Yaxun Liu via Phabricator via cfe-commits
cfe-commits at lists.llvm.org
Tue Nov 5 10:56:54 PST 2019
yaxunl added a comment.
In D69826#1734324 <https://reviews.llvm.org/D69826#1734324>, @hliao wrote:
> In D69826#1734296 <https://reviews.llvm.org/D69826#1734296>, @yaxunl wrote:
>
> > I am a little bit concerned that user may have such code:
> >
> > struct A { int *p; }
> > __global__ kernel(A a) {
> > int x;
> > a.p = &x;
> > f(a);
> > }
> >
> >
> > @arsenm what happens if a private pointer is mis-used as a global pointer?
> >
> > I am wondering if we should coerce byval struct kernel arg to global only if they are const, e.g.
> >
> > __global__ kernel(const A a);
> >
> >
> > I understand this may lose performance. Or should we introduce an option to let user disable coerce of non-const struct kernel arg to global.
>
>
> This should not be a concern. The coercing is only applied to the parameter itself. Within the function body, we still use the original `struct A`. The preparation in function prolog will copy that coerced argument into the original one (alloca-ed.) The modification of that parameter later will be applied to the original one due to the by-val nature.
>
> A modified version of your code is compiled into the following code at O0:
>
> define protected amdgpu_kernel void @_Z3foo1A(%struct.A.coerce %a.coerce) #0 {
> entry:
> %a = alloca %struct.A, align 8, addrspace(5)
> %a1 = addrspacecast %struct.A addrspace(5)* %a to %struct.A*
> %x = alloca i32, align 4, addrspace(5)
> %x.ascast = addrspacecast i32 addrspace(5)* %x to i32*
> %agg.tmp = alloca %struct.A, align 8, addrspace(5)
> %agg.tmp.ascast = addrspacecast %struct.A addrspace(5)* %agg.tmp to %struct.A*
> %0 = bitcast %struct.A* %a1 to %struct.A.coerce*
> %1 = getelementptr inbounds %struct.A.coerce, %struct.A.coerce* %0, i32 0, i32 0
> %2 = extractvalue %struct.A.coerce %a.coerce, 0
> store i32 addrspace(1)* %2, i32 addrspace(1)** %1, align 8
> %3 = getelementptr inbounds %struct.A.coerce, %struct.A.coerce* %0, i32 0, i32 1
> %4 = extractvalue %struct.A.coerce %a.coerce, 1
> store i32 addrspace(1)* %4, i32 addrspace(1)** %3, align 8
> %p = getelementptr inbounds %struct.A, %struct.A* %a1, i32 0, i32 0
> store i32* %x.ascast, i32** %p, align 8
> %5 = bitcast %struct.A* %agg.tmp.ascast to i8*
> %6 = bitcast %struct.A* %a1 to i8*
> call void @llvm.memcpy.p0i8.p0i8.i64(i8* align 8 %5, i8* align 8 %6, i64 16, i1 false)
> %7 = getelementptr inbounds %struct.A, %struct.A* %agg.tmp.ascast, i32 0, i32 0
> %8 = load i32*, i32** %7, align 8
> %9 = getelementptr inbounds %struct.A, %struct.A* %agg.tmp.ascast, i32 0, i32 1
> %10 = load i32*, i32** %9, align 8
> call void @_Z1f1A(i32* %8, i32* %10) #3
> ret void
> }
>
>
> The modification of parameter `a` is applied the alloca-ed one.
OK. Thanks for clarification.
Repository:
rG LLVM Github Monorepo
CHANGES SINCE LAST ACTION
https://reviews.llvm.org/D69826/new/
https://reviews.llvm.org/D69826
More information about the cfe-commits
mailing list