[llvm-dev] [AMDGPU] Strange results with different address spaces

Matt Arsenault via llvm-dev llvm-dev at lists.llvm.org
Wed Dec 6 10:45:15 PST 2017



> On Dec 6, 2017, at 02:28, Haidl, Michael <michael.haidl at uni-muenster.de> wrote:
> 
> The IR goes through a backend-agnostic preparation phase that brings it into SSA form and changes the AS from 0 to 1.

This sounds potentially problematic to me. The IR should be created with the correct address space to begin with; changing it in the middle of the pipeline is suspect.

> After this phase, the IR goes through another pass manager that performs O3 passes and the AMDGPU target passes for object file generation. I looked into the AMDGPU backend and the only place where this metadata is added is in AMDGPUAnnotateUniformValues.cpp. The pass queries divergence analysis for the load and checks if it is reported as uniform. Afterwards the metadata is added to the GEP.
>  
> Removing the O3 passes before code generation solves the problem, as does separating the O3 passes and the backend passes into separate pass managers. I assume divergence analysis does not run in the second pass manager because no metadata is generated at all.
>  
> Could this be a bug in DA falsely reporting the load as uniform by not taking the intrinsics into account?
>  
> Cheers,
> Michael
>  

The intrinsics certainly are correctly treated as divergent. Nothing would work otherwise. If I run the annotate pass or the analysis on the examples, it does the right thing and sees the load as divergent.

$ opt -S -analyze -divergence -o - as1.ll
Printing analysis 'Divergence Analysis' for function '_ZN5pacxx2v213genericKernelIZL12test_barrieriPPcE3$_0EEvT_':
DIVERGENT:  %6 = tail call i32 @llvm.amdgcn.workitem.id.x() #0, !range !11
DIVERGENT:  %add.i.i.i.i.i = add nsw i32 %mul.i.i.i.i.i, %6
DIVERGENT:  %idxprom.i.i.i = sext i32 %add.i.i.i.i.i to i64
DIVERGENT:  %8 = getelementptr i32, i32 addrspace(1)* %callable.coerce0, i64 %idxprom.i.i.i
DIVERGENT:  %9 = load i32, i32 addrspace(1)* %8, align 4
DIVERGENT:  %10 = getelementptr [16 x i32], [16 x i32] addrspace(3)* @"_ZN5pacxx2v213genericKernelIZL12test_barrieriPPcE3$_0EEvT__sm0", i32 0, i32 %6
DIVERGENT:  store i32 %9, i32 addrspace(3)* %10, align 4
DIVERGENT:  %11 = load i32, i32 addrspace(3)* %10, align 4
DIVERGENT:  %12 = getelementptr i32, i32 addrspace(1)* %callable.coerce1, i64 %idxprom.i.i.i
DIVERGENT:  store i32 %11, i32 addrspace(1)* %12, align 4
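
As a rough illustration of why the load ends up divergent, here is a toy Python model of data-dependence propagation. It is only a sketch and not LLVM's actual DivergenceAnalysis: it ignores control-flow (sync) dependences, the value names are shortened from the dump above, and treating %mul as uniform with no operands is an assumption made for the example.

```python
# Toy model: a value is divergent if it is a divergence source or if any
# of its operands is divergent. Names are shortened from the dump above.
USES = {
    "%6": [],                                   # call to llvm.amdgcn.workitem.id.x
    "%mul": [],                                 # assumed uniform (hypothetical)
    "%add": ["%mul", "%6"],
    "%idxprom": ["%add"],
    "%8": ["%callable.coerce0", "%idxprom"],    # GEP into addrspace(1)
    "%9": ["%8"],                               # the load in question
}

# The workitem id intrinsic is the only divergence source in this sketch.
SOURCES = {"%6"}


def divergent_values(uses, sources):
    """Propagate divergence along def-use edges until a fixed point."""
    divergent = set(sources)
    changed = True
    while changed:
        changed = False
        for value, operands in uses.items():
            if value in divergent:
                continue
            if any(op in divergent for op in operands):
                divergent.add(value)
                changed = True
    return divergent


result = divergent_values(USES, SOURCES)
print(sorted(result))
```

In this model the GEP %8 and the load %9 are both divergent because the index traces back to the workitem id, which matches what the analysis prints for as1.ll.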

I’m also questioning how/where you obtained this dump. You have the declarations for the control flow intrinsics in there, which should only ever appear when the backend inserts them as part of codegen. There’s something suspicious about your pass setup. What does the IR look like immediately before AMDGPUAnnotateUniformValues, and immediately out of the frontend?

-Matt