[PATCH] D69826: [hip] Enable pointer argument lowering through coercing type.
Michael Liao via Phabricator via cfe-commits
cfe-commits at lists.llvm.org
Tue Nov 5 10:38:20 PST 2019
hliao added a comment.
In D69826#1734296 <https://reviews.llvm.org/D69826#1734296>, @yaxunl wrote:
> I am a little bit concerned that user may have such code:
>
> struct A { int *p; }
> __global__ kernel(A a) {
> int x;
> a.p = &x;
> f(a);
> }
>
>
> @arsenm what happens if a private pointer is mis-used as a global pointer?
>
> I am wondering if we should coerce byval struct kernel arg to global only if they are const, e.g.
>
> __global__ kernel(const A a);
>
>
> I understand this may lose performance. Or should we introduce an option to let user disable coerce of non-const struct kernel arg to global.
This should not be a concern. The coercing is only applied to the parameter itself. Within the function body, we still use the original `struct A`. The preparation in function prolog will copy that coerced argument into the original one (alloca-ed.) The modification of that parameter later will be applied to the original one due to the by-val nature.
A modified version of your code is compiled into the following code at O0:
define protected amdgpu_kernel void @_Z3foo1A(%struct.A.coerce %a.coerce) #0 {
entry:
%a = alloca %struct.A, align 8, addrspace(5)
%a1 = addrspacecast %struct.A addrspace(5)* %a to %struct.A*
%x = alloca i32, align 4, addrspace(5)
%x.ascast = addrspacecast i32 addrspace(5)* %x to i32*
%agg.tmp = alloca %struct.A, align 8, addrspace(5)
%agg.tmp.ascast = addrspacecast %struct.A addrspace(5)* %agg.tmp to %struct.A*
%0 = bitcast %struct.A* %a1 to %struct.A.coerce*
%1 = getelementptr inbounds %struct.A.coerce, %struct.A.coerce* %0, i32 0, i32 0
%2 = extractvalue %struct.A.coerce %a.coerce, 0
store i32 addrspace(1)* %2, i32 addrspace(1)** %1, align 8
%3 = getelementptr inbounds %struct.A.coerce, %struct.A.coerce* %0, i32 0, i32 1
%4 = extractvalue %struct.A.coerce %a.coerce, 1
store i32 addrspace(1)* %4, i32 addrspace(1)** %3, align 8
%p = getelementptr inbounds %struct.A, %struct.A* %a1, i32 0, i32 0
store i32* %x.ascast, i32** %p, align 8
%5 = bitcast %struct.A* %agg.tmp.ascast to i8*
%6 = bitcast %struct.A* %a1 to i8*
call void @llvm.memcpy.p0i8.p0i8.i64(i8* align 8 %5, i8* align 8 %6, i64 16, i1 false)
%7 = getelementptr inbounds %struct.A, %struct.A* %agg.tmp.ascast, i32 0, i32 0
%8 = load i32*, i32** %7, align 8
%9 = getelementptr inbounds %struct.A, %struct.A* %agg.tmp.ascast, i32 0, i32 1
%10 = load i32*, i32** %9, align 8
call void @_Z1f1A(i32* %8, i32* %10) #3
ret void
}
The modification of parameter `a` is applied the alloca-ed one.
Repository:
rG LLVM Github Monorepo
CHANGES SINCE LAST ACTION
https://reviews.llvm.org/D69826/new/
https://reviews.llvm.org/D69826
More information about the cfe-commits
mailing list