[PATCH] Divergence analysis for GPU programs

Mon Mar 30 22:43:21 PDT 2015

All other comments are ack'ed and will be addressed.

================
Comment at: lib/Analysis/DivergenceAnalysis.cpp:176
@@ +175,3 @@
+  }
+  if (Cond == nullptr)
+    return;
----------------
bjarke.roune wrote:
> Can Cond ever be null here, given that we only get to this point if *TI has been marked as a potentially divergent terminator instruction?
Since we run reachability on the data flow graph, the Worklist may contain TerminatorInsts (e.g., InvokeInst) that belong to none of the above three types. 

================
Comment at: lib/Analysis/DivergenceAnalysis.cpp:182
@@ +181,3 @@
+  // divergent.
+  BasicBlock *IPostDom = PDT.getNode(TI->getParent())->getIDom()->getBlock();
+  if (IPostDom == nullptr)
----------------
bjarke.roune wrote:
> More phi nodes than these might need to be marked as diverging if diverging warps can be recognized by the hardware to converge at a point prior to the immediate post-dominator based on the (dynamic) path taken by each diverging subset of threads.
This is related to the bug I noticed in dealing with values defined in loops. I need to think more about it. 

================
Comment at: lib/Analysis/DivergenceAnalysis.cpp:185
@@ +184,3 @@
+    return;
+  for (auto I = IPostDom->begin(); IPostDom->getFirstNonPHI() != I; ++I) {
+    if (Visited.insert(I).second)
----------------
bjarke.roune wrote:
> It's better to only make one call to getFirstNonPHI(), since it runs in linear time, so this loop is otherwise quadratic in the number of phi nodes:
> 
> http://llvm.org/docs/doxygen/html/BasicBlock_8cpp_source.html#l00161
Nice catch! Will fix. 

================
Comment at: lib/Analysis/DivergenceAnalysis.cpp:212
@@ +211,3 @@
+    }
+    exploreDataDependency(V);
+  }
----------------
bjarke.roune wrote:
> Does any terminator instruction have a value? If not, I think this could be in an else branch.
InvokeInst. Even if not, I'd still prefer doing sync and data dependency in separate stages, just to avoid some mental effort reasoning about "else" :)

================
Comment at: lib/Target/NVPTX/NVPTXTargetTransformInfo.cpp:55
@@ +54,3 @@
+    if (isa<LoadInst>(I))
+      return true;
+    // Atomic instructions may cause divergence. Atomic instructions are
----------------
bjarke.roune wrote:
> If all the threads in a warp load the same address at the same time, I think that they should all get the same value. If that's right, then the analysis would remain conservative by letting loads of non-divergent pointers yield non-divergent values, regardless of aliasing.
That's a very good point! We can be optimistic for at least shared and const loads. 

The local address space is private to each thread. Consequently, generic loads need to be treated conservatively, because their addresses may be converted from local addresses.

http://reviews.llvm.org/D8576

EMAIL PREFERENCES
  http://reviews.llvm.org/settings/panel/emailpreferences/