[PATCH] D69826: [hip] Enable pointer argument lowering through coercing type.

Yaxun Liu via Phabricator via cfe-commits cfe-commits at lists.llvm.org
Tue Nov 5 10:56:54 PST 2019


yaxunl added a comment.

In D69826#1734324 <https://reviews.llvm.org/D69826#1734324>, @hliao wrote:

> In D69826#1734296 <https://reviews.llvm.org/D69826#1734296>, @yaxunl wrote:
>
> > I am a little bit concerned that user may have such code:
> >
> >   struct A { int *p; }
> >   __global__ kernel(A a) {
> >     int x;
> >     a.p = &x;
> >     f(a);
> >   }
> >
> >
> > @arsenm what happens if a private pointer is mis-used as a global pointer?
> >
> > I am wondering if we should coerce byval struct kernel arg to global only if they are const, e.g.
> >
> >   __global__ kernel(const A a);
> >
> >
> > I understand this may lose performance. Or should we introduce an option to let user disable coerce of non-const struct kernel arg to global.
>
>
> This should not be a concern. The coercing is only applied to the parameter itself. Within the function body, we still use the original `struct A`. The preparation in function prolog will copy that coerced argument into the original one (alloca-ed.) The modification of that parameter later will be applied to the original one due to the by-val nature.
>
> A modified version of your code is compiled into the following code at O0:
>
>   define protected amdgpu_kernel void @_Z3foo1A(%struct.A.coerce %a.coerce) #0 {
>   entry:
>     %a = alloca %struct.A, align 8, addrspace(5)
>     %a1 = addrspacecast %struct.A addrspace(5)* %a to %struct.A*
>     %x = alloca i32, align 4, addrspace(5)
>     %x.ascast = addrspacecast i32 addrspace(5)* %x to i32*
>     %agg.tmp = alloca %struct.A, align 8, addrspace(5)
>     %agg.tmp.ascast = addrspacecast %struct.A addrspace(5)* %agg.tmp to %struct.A*
>     %0 = bitcast %struct.A* %a1 to %struct.A.coerce*
>     %1 = getelementptr inbounds %struct.A.coerce, %struct.A.coerce* %0, i32 0, i32 0
>     %2 = extractvalue %struct.A.coerce %a.coerce, 0
>     store i32 addrspace(1)* %2, i32 addrspace(1)** %1, align 8
>     %3 = getelementptr inbounds %struct.A.coerce, %struct.A.coerce* %0, i32 0, i32 1
>     %4 = extractvalue %struct.A.coerce %a.coerce, 1
>     store i32 addrspace(1)* %4, i32 addrspace(1)** %3, align 8
>     %p = getelementptr inbounds %struct.A, %struct.A* %a1, i32 0, i32 0
>     store i32* %x.ascast, i32** %p, align 8
>     %5 = bitcast %struct.A* %agg.tmp.ascast to i8*
>     %6 = bitcast %struct.A* %a1 to i8*
>     call void @llvm.memcpy.p0i8.p0i8.i64(i8* align 8 %5, i8* align 8 %6, i64 16, i1 false)
>     %7 = getelementptr inbounds %struct.A, %struct.A* %agg.tmp.ascast, i32 0, i32 0
>     %8 = load i32*, i32** %7, align 8
>     %9 = getelementptr inbounds %struct.A, %struct.A* %agg.tmp.ascast, i32 0, i32 1
>     %10 = load i32*, i32** %9, align 8
>     call void @_Z1f1A(i32* %8, i32* %10) #3
>     ret void
>   }
>
>
> The modification of parameter `a` is applied the alloca-ed one.


OK. Thanks for clarification.


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D69826/new/

https://reviews.llvm.org/D69826





More information about the cfe-commits mailing list