[PATCH] Divergence analysis for GPU programs

Mon Mar 30 19:40:46 PDT 2015

================
Comment at: lib/Analysis/DivergenceAnalysis.cpp:176
@@ +175,3 @@
+  }
+  if (Cond == nullptr)
+    return;
----------------
Can Cond ever be null here? We only get to this point if *TI has been marked as a potentially divergent terminator instruction.

================
Comment at: lib/Analysis/DivergenceAnalysis.cpp:182
@@ +181,3 @@
+  // divergent.
+  BasicBlock *IPostDom = PDT.getNode(TI->getParent())->getIDom()->getBlock();
+  if (IPostDom == nullptr)
----------------
More phi nodes than these might need to be marked as divergent if the hardware can reconverge a diverged warp at a point before the immediate post-dominator, depending on the (dynamic) path taken by each diverging subset of threads.

================
Comment at: lib/Analysis/DivergenceAnalysis.cpp:185
@@ +184,3 @@
+    return;
+  for (auto I = IPostDom->begin(); IPostDom->getFirstNonPHI() != I; ++I) {
+    if (Visited.insert(I).second)
----------------
It's better to call getFirstNonPHI() only once. It runs in time linear in the number of phi nodes, so re-evaluating it in the loop condition makes this loop quadratic in the number of phi nodes:

http://llvm.org/docs/doxygen/html/BasicBlock_8cpp_source.html#l00161
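
To illustrate the cost difference, here is a minimal standalone sketch. It does not use the LLVM classes: `Inst`, `firstNonPhi`, `ScanSteps` and the two `visitPhis...` functions are hypothetical names made up for this example, and a basic block is modelled as a plain vector. The only point is how often the linear scan runs.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical stand-in for an instruction; not the LLVM Instruction class.
struct Inst {
  bool IsPhi;
};

// Total number of phi nodes skipped by all calls to firstNonPhi().
std::size_t ScanSteps = 0;

// Linear scan from the start of the block, in the spirit of
// BasicBlock::getFirstNonPHI(). Returns the index of the first non-phi.
std::size_t firstNonPhi(const std::vector<Inst> &Block) {
  std::size_t I = 0;
  while (I < Block.size() && Block[I].IsPhi) {
    ++ScanSteps;
    ++I;
  }
  return I;
}

// Re-evaluates the bound on every iteration: quadratic in the phi count.
std::size_t visitPhisRescanning(const std::vector<Inst> &Block) {
  std::size_t Visited = 0;
  for (std::size_t I = 0; I != firstNonPhi(Block); ++I)
    ++Visited;
  return Visited;
}

// Computes the bound once before the loop: linear in the phi count.
std::size_t visitPhisHoisted(const std::vector<Inst> &Block) {
  std::size_t Visited = 0;
  for (std::size_t I = 0, E = firstNonPhi(Block); I != E; ++I)
    ++Visited;
  return Visited;
}
```

In the patch itself, the equivalent change would presumably be to store the result of IPostDom->getFirstNonPHI() in a local before the loop and compare against that.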

================
Comment at: lib/Analysis/DivergenceAnalysis.cpp:212
@@ +211,3 @@
+    }
+    exploreDataDependency(V);
+  }
----------------
Does any terminator instruction produce a value? If not, I think this call could move into an else branch.

================
Comment at: lib/Target/NVPTX/NVPTXTargetTransformInfo.cpp:55
@@ +54,3 @@
+    if (isa<LoadInst>(I))
+      return true;
+    // Atomic instructions may cause divergence. Atomic instructions are
----------------
If all the threads in a warp load from the same address at the same time, I think that they should all get the same value. If that's right, then the analysis would remain conservative by letting loads of non-divergent pointers yield non-divergent values, regardless of aliasing.
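
A rough standalone sketch of the rule I am suggesting, under the assumption that a non-divergent address names the same memory location for every thread in the warp (that would not hold for per-thread storage). `ToyInst`, `Kind`, `isSourceOfDivergence` and `isDivergent` below are hypothetical and are not the functions in the patch. The sketch only follows data dependencies, ignores sync dependency, and assumes the operand graph is acyclic.

```cpp
#include <vector>

// Hypothetical, simplified instruction kinds; not the LLVM class hierarchy.
enum class Kind { Load, Atomic, ReadThreadIdx, Other };

struct ToyInst {
  Kind K;
  std::vector<const ToyInst *> Operands;
};

// Loads are deliberately absent here: a load is not divergent by itself.
bool isSourceOfDivergence(const ToyInst &I) {
  return I.K == Kind::ReadThreadIdx || I.K == Kind::Atomic;
}

// A value is divergent if it is a source of divergence or if any operand is
// divergent. A load therefore becomes divergent only through its pointer.
// Assumes an acyclic operand graph (no phi cycles) to keep the sketch short.
bool isDivergent(const ToyInst &I) {
  if (isSourceOfDivergence(I))
    return true;
  for (const ToyInst *Op : I.Operands)
    if (isDivergent(*Op))
      return true;
  return false;
}
```

With this rule, a load through a pointer derived from threadIdx is still reported as divergent by data dependency, while a load through a uniform pointer is not.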

================
Comment at: lib/Target/NVPTX/NVPTXTargetTransformInfo.cpp:58
@@ +57,3 @@
+    // executed sequentially across all threads in a warp. Therefore, an earlier
+    // executed thread may see different memory inputs than an later executed
+    // thread. For example, suppose *a = 0 initially.
----------------
an -> a

================
Comment at: lib/Target/NVPTX/NVPTXTargetTransformInfo.cpp:68
@@ +67,3 @@
+    if (const IntrinsicInst *II = dyn_cast<IntrinsicInst>(I)) {
+      // Instructions that read threadIdx are abviously divergent.
+      if (readsThreadIndex(II))
----------------
abviously -> obviously

================
Comment at: lib/Target/NVPTX/NVPTXTargetTransformInfo.cpp:71
@@ +70,3 @@
+        return true;
+      // Handle the NVPTX atomic instrinsics which cannot be represented as an
+      // atomic IR instruction.
----------------
which -> that

http://reviews.llvm.org/D8576
