[llvm-bugs] [Bug 51445] New: SCCP folds addrspace casts
via llvm-bugs
llvm-bugs at lists.llvm.org
Wed Aug 11 16:12:10 PDT 2021
https://bugs.llvm.org/show_bug.cgi?id=51445
Bug ID: 51445
Summary: SCCP folds addrspace casts
Product: libraries
Version: trunk
Hardware: PC
OS: Linux
Status: NEW
Severity: enhancement
Priority: P
Component: Scalar Optimizations
Assignee: unassignedbugs at nondot.org
Reporter: jonathanchesterfield at gmail.com
CC: jdoerfert at anl.gov, llvm-bugs at lists.llvm.org,
Matthew.Arsenault at amd.com
Example from amdgpu where a math header is being miscompiled. Tagging this as
SCCP, though it is possible that SCCP is doing the right thing and the bug is in
the library code it is transforming.
```
target datalayout = "e-p:64:64-p1:64:64-p2:32:32-p3:32:32-p4:64:64-p5:32:32-p6:32:32-i64:64-v16:16-v24:32-v32:32-v48:64-v96:128-v192:256-v256:256-v512:512-v1024:1024-v2048:2048-n32:64-S32-A5-G1-ni:7"
target triple = "amdgcn-amd-amdhsa"

@__tmp.i = internal addrspace(3) global [8 x i8] undef, align 32

declare double @__ocml_sincos_f64(double noundef, double addrspace(5)* noundef writeonly align 8)

define double @func() {
  %__tmp_on_stack.i = bitcast i8* addrspacecast (i8 addrspace(3)* getelementptr inbounds ([8 x i8], [8 x i8] addrspace(3)* @__tmp.i, i32 0, i32 0) to i8*) to double*
  %__tmp_on_stack.ascast.i = addrspacecast double* %__tmp_on_stack.i to double addrspace(5)*
  %call.i = call double @__ocml_sincos_f64(double noundef 0.000000e+00, double addrspace(5)* noundef writeonly align 8 %__tmp_on_stack.ascast.i)
  ret double %call.i
}
```
Passing this through opt -sccp simplifies the function to:
```
define double @func() {
  %call.i = call double @__ocml_sincos_f64(double noundef 0.000000e+00, double addrspace(5)* noundef writeonly align 8 addrspacecast (double addrspace(3)* bitcast ([8 x i8] addrspace(3)* @__tmp.i to double addrspace(3)*) to double addrspace(5)*))
  ret double %call.i
}
```
The addrspacecast from 3 to 5 reaches the backend, where it is 'lowered' by
emitting a warning and returning undef.
SCCP treats the pointer as a constant because it is derived from a global, so it
folds the two casts together. Folding addrspacecasts together probably requires
a target legality test, which does not seem to happen in the SCCP lattice
operations.
If SCCP is right to fold the two addrspacecasts together, then the call site in
__clang_hip_math.h probably needs to change to explicitly put the temporary
value in addrspace(5) instead of casting the address. That is fixable for
openmp but possibly not for hip. The function in question is below, followed by
a sketch of the IR such a fix might produce.
```
__DEVICE__
void sincos(double __x, double *__sinptr, double *__cosptr) {
  double __tmp;
  *__sinptr = __ocml_sincos_f64(
      __x, (__attribute__((address_space(5))) double *)&__tmp);
  *__cosptr = __tmp;
}
```
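A minimal IR sketch of that direction (my assumption, not a confirmed fix; it
reuses the @__ocml_sincos_f64 declaration from the first listing): if the
temporary is a genuine addrspace(5) alloca, the argument already has the right
address space and there is no 3-to-5 cast for SCCP to create.
```
; Sketch only: the temporary lives in a private (addrspace(5)) alloca, so it is
; passed directly and no addrspacecast is needed at the call site.
define double @func() {
  %__tmp = alloca double, align 8, addrspace(5)
  %call.i = call double @__ocml_sincos_f64(double noundef 0.000000e+00, double addrspace(5)* noundef writeonly align 8 %__tmp)
  ret double %call.i
}
```
With A5 in the datalayout above, this is roughly what a plain stack temporary
would lower to anyway; the problem case arises because __tmp ends up as the
addrspace(3) global @__tmp.i instead.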
Equally possible is that address_space(5) on a function argument is ill-formed
for amdgpu, and that this needs a backwards-incompatible fix to the rocm device
library.
(I haven't yet worked out exactly where openmp codegen decides to allocate
__tmp with __kmpc_alloc_shared instead of as an alloca, but this can be hit
without openmp if the temporary variable passed is marked shared.)
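For reference, a reduced sketch of the pattern with the math header stripped
out (the names @shared_tmp, @use and @repro are invented for illustration;
datalayout and triple as in the first listing). It should show the same fold
under opt -sccp: a shared (addrspace(3)) temporary cast to generic and then to
addrspace(5).
```
@shared_tmp = internal addrspace(3) global double undef, align 8

declare void @use(double addrspace(5)* noundef)

define void @repro() {
  ; addrspace(3) temporary, cast to generic and then to addrspace(5), mirroring
  ; what the header does with &__tmp when __tmp is placed in shared memory
  %generic = addrspacecast double addrspace(3)* @shared_tmp to double*
  %private = addrspacecast double* %generic to double addrspace(5)*
  call void @use(double addrspace(5)* %private)
  ret void
}
```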