[PATCH] D50991: [AMDGPU] Consider loads from flat addrspace to be potentially divergent
Scott Linder via Phabricator via llvm-commits
llvm-commits at lists.llvm.org
Mon Aug 20 12:54:35 PDT 2018
scott.linder created this revision.
scott.linder added a reviewer: arsenm.
Herald added subscribers: llvm-commits, t-tye, tpr, dstuttard, yaxunl, nhaehnle, wdng, jvesely, kzhuravl.
Loads through the flat address space that yield non-uniform values were being marked uniform. A flat pointer may refer to private (per-thread) memory, so in general we cannot assume that flat loads are uniform; cases where uniformity can be proven should be handled through infer-address-spaces.
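As a rough illustration of the rule this patch introduces, the load check can be modelled as a predicate over the pointer's address space. This is only a standalone sketch, not the code in AMDGPUTargetTransformInfo.cpp: the AddrSpace enum, its numeric values, and the function name isLoadSourceOfDivergence are hypothetical stand-ins chosen for the example, and the real change queries ST->getAMDGPUAS() as shown in the diff below.

```cpp
#include <cstdint>

// Hypothetical stand-in for the target's address-space numbering. Private = 5
// matches the addrspace(5) alloca in the test; the other values are
// assumptions made for illustration only.
enum class AddrSpace : std::uint8_t {
  Flat = 0,
  Global = 1,
  Local = 3,
  Constant = 4,
  Private = 5
};

// Toy model of the load rule: a load is treated as a potential source of
// divergence if it reads through a private pointer or through a flat pointer,
// since a flat pointer may alias per-thread private memory.
constexpr bool isLoadSourceOfDivergence(AddrSpace AS) {
  return AS == AddrSpace::Private || AS == AddrSpace::Flat;
}

// Compile-time checks of the intended behaviour.
static_assert(isLoadSourceOfDivergence(AddrSpace::Private),
              "private loads are divergent");
static_assert(isLoadSourceOfDivergence(AddrSpace::Flat),
              "flat loads are conservatively divergent");
static_assert(!isLoadSourceOfDivergence(AddrSpace::Global),
              "global loads stay uniform for uniform addresses");
```

The check is deliberately conservative: it does not try to prove that a flat pointer avoids private memory, leaving that refinement to address-space inference.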
Repository:
rL LLVM
https://reviews.llvm.org/D50991
Files:
lib/Target/AMDGPU/AMDGPUTargetTransformInfo.cpp
test/CodeGen/AMDGPU/divergent-flat.ll
Index: test/CodeGen/AMDGPU/divergent-flat.ll
===================================================================
--- /dev/null
+++ test/CodeGen/AMDGPU/divergent-flat.ll
@@ -0,0 +1,30 @@
+; RUN: llc -march=amdgcn -mcpu=gfx900 -o - %s | FileCheck %s --check-prefix=ASM
+; RUN: llc -march=amdgcn -mcpu=gfx900 -stop-after=structurizecfg -o - %s | FileCheck %s --check-prefix=STRUCTURIZECFG
+
+; Test that we do not consider loads from flat addrspace to be uniform.
+
+define amdgpu_kernel void @spam(float* %a) #0 {
+ %priv = alloca i32, align 4, addrspace(5)
+ %flat = addrspacecast i32 addrspace(5)* %priv to i32*
+ %idx = call i32 @llvm.amdgcn.workitem.id.x()
+
+ store i32 %idx, i32* %flat, align 4
+ %b = load i32, i32* %flat, align 4
+
+ %cmp = icmp slt i32 %b, 1
+; ASM: s_mov_b64 exec, s[{{[0-9]+}}:{{[0-9]+}}]
+; ASM-NOT: s_cbranch_vccnz
+; STRUCTURIZECFG-NOT: structurizecfg.uniform
+ br i1 %cmp, label %body, label %end
+
+body:
+ store float 1.000000e+00, float* %a, align 4
+ br label %end
+
+end:
+ ret void
+}
+
+declare i32 @llvm.amdgcn.workitem.id.x()
+
+attributes #0 = { noinline optnone }
Index: lib/Target/AMDGPU/AMDGPUTargetTransformInfo.cpp
===================================================================
--- lib/Target/AMDGPU/AMDGPUTargetTransformInfo.cpp
+++ lib/Target/AMDGPU/AMDGPUTargetTransformInfo.cpp
@@ -545,14 +545,15 @@
if (const Argument *A = dyn_cast<Argument>(V))
return !isArgPassedInSGPR(A);
- // Loads from the private address space are divergent, because threads
- // can execute the load instruction with the same inputs and get different
- // results.
+ // Loads from the private and flat address spaces are divergent, because
+ // threads can execute the load instruction with the same inputs and get
+ // different results.
//
// All other loads are not divergent, because if threads issue loads with the
// same arguments, they will always get the same result.
if (const LoadInst *Load = dyn_cast<LoadInst>(V))
- return Load->getPointerAddressSpace() == ST->getAMDGPUAS().PRIVATE_ADDRESS;
+ return Load->getPointerAddressSpace() == ST->getAMDGPUAS().PRIVATE_ADDRESS
+ || Load->getPointerAddressSpace() == ST->getAMDGPUAS().FLAT_ADDRESS;
// Atomics are divergent because they are executed sequentially: when an
// atomic operation refers to the same address in each thread, then each