[llvm-bugs] [Bug 51445] New: SCCP folds addrspace casts

via llvm-bugs llvm-bugs at lists.llvm.org
Wed Aug 11 16:12:10 PDT 2021


https://bugs.llvm.org/show_bug.cgi?id=51445

            Bug ID: 51445
           Summary: SCCP folds addrspace casts
           Product: libraries
           Version: trunk
          Hardware: PC
                OS: Linux
            Status: NEW
          Severity: enhancement
          Priority: P
         Component: Scalar Optimizations
          Assignee: unassignedbugs at nondot.org
          Reporter: jonathanchesterfield at gmail.com
                CC: jdoerfert at anl.gov, llvm-bugs at lists.llvm.org,
                    Matthew.Arsenault at amd.com

Example from AMDGPU where a math header is being miscompiled. Tagging as SCCP,
though it is possible SCCP is doing the right thing and the bug is in the
library code it is transforming.

```
target datalayout = "e-p:64:64-p1:64:64-p2:32:32-p3:32:32-p4:64:64-p5:32:32-p6:32:32-i64:64-v16:16-v24:32-v32:32-v48:64-v96:128-v192:256-v256:256-v512:512-v1024:1024-v2048:2048-n32:64-S32-A5-G1-ni:7"
target triple = "amdgcn-amd-amdhsa"

@__tmp.i = internal addrspace(3) global [8 x i8] undef, align 32

declare double @__ocml_sincos_f64(double noundef, double addrspace(5)* noundef writeonly align 8)

define double @func() {
  %__tmp_on_stack.i = bitcast i8* addrspacecast (i8 addrspace(3)* getelementptr inbounds ([8 x i8], [8 x i8] addrspace(3)* @__tmp.i, i32 0, i32 0) to i8*) to double*
  %__tmp_on_stack.ascast.i = addrspacecast double* %__tmp_on_stack.i to double addrspace(5)*
  %call.i = call double @__ocml_sincos_f64(double noundef 0.000000e+00, double addrspace(5)* noundef writeonly align 8 %__tmp_on_stack.ascast.i) #24
  ret double %call.i
}
```

Running this through opt -sccp simplifies the function to:
```
define double @func() {
  %call.i = call double @__ocml_sincos_f64(double noundef 0.000000e+00, double addrspace(5)* noundef writeonly align 8 addrspacecast (double addrspace(3)* bitcast ([8 x i8] addrspace(3)* @__tmp.i to double addrspace(3)*) to double addrspace(5)*))
  ret double %call.i
}
```

The addrspacecast from 3 to 5 reaches the backend, where it is 'lowered' by
emitting a warning and returning undef.

SCCP treats the pointer as a constant because it is derived from a global, and
therefore folds the two casts (3 to generic, generic to 5) into a single cast
from 3 to 5. Folding addrspacecasts together probably requires a target
legality test, which doesn't seem to happen in the SCCP lattice operations.
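
As a rough illustration of the kind of legality test meant here, the sketch
below is a toy C++ model and not LLVM's API. The enum, both function names, and
the rule that only casts to or from the flat address space can be lowered are
assumptions made for the example, chosen to match the behaviour seen above.

```cpp
// Toy model only: nothing here is an LLVM interface.
// Address-space numbers follow the IR in this report:
// 0 = flat/generic, 3 = the global @__tmp.i, 5 = the sincos out-pointer.
enum AddrSpace : unsigned { Flat = 0, Local = 3, Private = 5 };

// Hypothetical target hook: can a direct cast from `from` to `to` be lowered?
// Assumption: only identity casts and casts involving Flat are supported.
constexpr bool isLegalDirectCast(AddrSpace from, AddrSpace to) {
  return from == to || from == Flat || to == Flat;
}

// Decide whether cast(cast(p, src -> mid), mid -> dst) may be replaced by a
// single src -> dst cast. Both original steps must be legal, and so must the
// direct cast that would replace them.
constexpr bool canFoldCastChain(AddrSpace src, AddrSpace mid, AddrSpace dst) {
  return isLegalDirectCast(src, mid) && isLegalDirectCast(mid, dst) &&
         isLegalDirectCast(src, dst);
}

// The chain from this report: 3 -> flat -> 5 must not become 3 -> 5.
static_assert(!canFoldCastChain(Local, Flat, Private),
              "folding would create an unsupported 3 -> 5 cast");
// A chain that ends in the flat address space is still foldable.
static_assert(canFoldCastChain(Local, Flat, Flat),
              "3 -> flat -> flat can become 3 -> flat");
```

With a check of this shape, the constant folder would leave the two casts in
place for the example above rather than producing the 3 to 5 cast.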

If SCCP is right to fold the two addrspacecasts together, then the call site in
__clang_hip_math.h probably needs to change to explicitly put the temporary
value in addrspace(5) instead of casting the address. That is fixable for
OpenMP but possibly not for HIP. The function in question is:

```
__DEVICE__
void sincos(double __x, double *__sinptr, double *__cosptr) {
  double __tmp;
  *__sinptr = __ocml_sincos_f64(
      __x, (__attribute__((address_space(5))) double *)&__tmp);
  *__cosptr = __tmp;
}
```

Equally possible is that address_space(5) on a function argument is ill-formed
for AMDGPU, in which case a backwards-incompatible fix is needed in the ROCm
device library.

(I haven't yet been able to work out exactly where OpenMP codegen decides to
allocate __tmp using __kmpc_alloc_shared instead of as an alloca, but this can
be hit without OpenMP if the temporary variable passed is marked shared.)
