[PATCH] Divergence analysis for GPU programs
Jingyue Wu
jingyue at google.com
Thu Apr 2 22:48:49 PDT 2015
================
Comment at: lib/Analysis/DivergenceAnalysis.cpp:176
@@ +175,3 @@
+ }
+ if (Cond == nullptr)
+ return;
----------------
bjarke.roune wrote:
> jingyue wrote:
> > bjarke.roune wrote:
> > > Can Cond ever be null here, given that we only get to this point if *TI has been marked as a potentially divergent terminator instruction?
> > Since we run reachability on the data flow graph, the Worklist may contain TerminatorInsts (e.g., InvokeInst) that belong to none of the above three types.
> Would a check for >= 2 successors do the trick? If not, is it necessary to pull out the conditions, or would it suffice to have a boolean HasCond? I wonder if BI->getCondition(), IBI->getAdress() and SI->getCondition() can ever be null.
InvokeInst also has two successors. BI->getCondition(), IBI->getAddress() and SI->getCondition() are all non-null.
================
Comment at: lib/Analysis/DivergenceAnalysis.cpp:185
@@ +184,3 @@
+ return;
+ for (auto I = IPostDom->begin(); IPostDom->getFirstNonPHI() != I; ++I) {
+ if (Visited.insert(I).second)
----------------
jingyue wrote:
> bjarke.roune wrote:
> > It's better to only make one call to getFirstNonPHI(), since it runs in linear time, so this loop is otherwise quadratic in the number of phi nodes:
> >
> > http://llvm.org/docs/doxygen/html/BasicBlock_8cpp_source.html#l00161
> Nice catch! Will fix.
Done.
================
Comment at: lib/Target/NVPTX/NVPTXTargetTransformInfo.cpp:51
@@ +50,3 @@
+
+ if (const Instruction *I = dyn_cast<Instruction>(V)) {
+ // Without pointer analysis, we conservatively assume values loaded are
----------------
bjarke.roune wrote:
> The return value of function calls can also be divergent. I think this function doesn't account for that?
You're right. Done.
================
Comment at: lib/Target/NVPTX/NVPTXTargetTransformInfo.cpp:55
@@ +54,3 @@
+ if (isa<LoadInst>(I))
+ return true;
+ // Atomic instructions may cause divergence. Atomic instructions are
----------------
bjarke.roune wrote:
> jingyue wrote:
> > bjarke.roune wrote:
> > > If all the threads in a warp load the same address at the same time, I think that they should all get the same value. If that's right, then the analysis would remain conservative by letting loads of non-divergent pointers yield non-divergent values, regardless of aliasing.
> > That's a very good point! We can be optimistic for at least shared and const loads.
> >
> > The local address space is private to each thread. Consequently, generic loads need to be treated conservatively, because their addresses may be converted from local addresses.
> I hadn't considered local memory. If we marked local addresses as sources of divergence, the only way I can think of where that might go wrong right now is if a local pointer value is stored into global memory and then read back again, then the thus laundered local pointer is used to store a divergent value and then that divergent value is read back again without being marked as divergent. I wonder if that's a common use case.
Bjarke, I am not sure I understand what you mean by "might go wrong".
Marking values loaded from local address space as divergent is sound, i.e., if our analysis says a branch is non-divergent, it is indeed non-divergent.
================
Comment at: lib/Target/NVPTX/NVPTXTargetTransformInfo.cpp:58
@@ +57,3 @@
+ // executed sequentially across all threads in a warp. Therefore, an earlier
+ // executed thread may see different memory inputs than an later executed
+ // thread. For example, suppose *a = 0 initially.
----------------
bjarke.roune wrote:
> an -> a
Done. Thanks!
================
Comment at: lib/Target/NVPTX/NVPTXTargetTransformInfo.cpp:68
@@ +67,3 @@
+ if (const IntrinsicInst *II = dyn_cast<IntrinsicInst>(I)) {
+ // Instructions that read threadIdx are abviously divergent.
+ if (readsThreadIndex(II))
----------------
bjarke.roune wrote:
> abviously -> obviously
Done.
================
Comment at: lib/Target/NVPTX/NVPTXTargetTransformInfo.cpp:71
@@ +70,3 @@
+ return true;
+ // Handle the NVPTX atomic instrinsics which cannot be represented as an
+ // atomic IR instruction.
----------------
bjarke.roune wrote:
> which -> that
Done.
http://reviews.llvm.org/D8576
EMAIL PREFERENCES
http://reviews.llvm.org/settings/panel/emailpreferences/
More information about the llvm-commits
mailing list