[PATCH] Add a speculative execution pass

Tue May 5 22:13:49 PDT 2015

================
Comment at: include/llvm/Transforms/Scalar.h:426
@@ -425,1 +425,3 @@
 //
+// SpeculativeExecution - Speculatively execute instructions.
+//
----------------
hfinkel wrote:
> This pass does not speculatively execute anything. Please reword.
I'm very happy to correct any mistaken terminology. I'm unclear on whether you object to "speculation" or "execution" or both and what you would prefer instead? I note that SimplifyCFG has a function SpeculativelyExecuteBB that also moves instructions out of branches and I did see some sources with a similar terminology, e.g.

"Software Speculative Execution [...] 
When the compiler generates speculative code, it moves instructions in front of a branch that previously had protected them from causing exceptions."
http://techpubs.sgi.com/library/dynaweb_docs/0650/SGI_Developer/books/OrOn2_PfTune/sgi_html/ch05.html

================
Comment at: lib/Transforms/IPO/PassManagerBuilder.cpp:236
@@ +235,3 @@
+  if (RunSpeculation)
+    MPM.add(createSpeculativeExecutionPass());
+
----------------
hfinkel wrote:
> This is pretty early in the pipeline, why? Do you want LICM to run first?
I did start with a late placement, right before the nvptx backend runs. I also tried right before LICM and a bit after LICM and other places. The current placement is what worked best out of those things that I tried.

My intuition of why this early placement works well is that we speculate, optimize and then soon after those instructions that were speculated but did not enable further optimization get put back by InstCombine.

================
Comment at: lib/Transforms/Scalar/SpeculativeExecution.cpp:60
@@ +59,3 @@
+
+static cl::opt<unsigned> SpecExecMaxNotHoisted(
+    "spec-exec-max-not-hoisted", cl::init(5), cl::Hidden,
----------------
jingyue wrote:
> jingyue wrote:
> > Comment on why this threshold is necessary. My understanding is the NVIDIA driver only speculates a limited number of instructions. If too many instructions are left behind, the conditional basic block won't be executed in the SASS level. 
> I meant "won't be speculatively executed". 
Done. The base reason for this limit is that speculating just a few instructions from a larger block tends not to be profitable. That could in part be due to an interaction with the NVIDIA driver, even though this pass is run quite early in the compiler, putting it later makes our benchmarks worse, and speculated instructions that are not further optimized tend to be sunk back by InstCombine to where they were before speculation. In the best case that I've seen for this pass, speculation allowed further follow-on optimizations, collapsing a lot of control flow and bit-fiddling instructions into fewer instructions; that sort of thing is less likely to happen when speculating just a few instructions from a larger block, so that's another reason for the limit. The comment I added fits both reasons.

================
Comment at: lib/Transforms/Scalar/SpeculativeExecution.cpp:64
@@ +63,3 @@
+             "the number of instructions that cannot be speculatively executed "
+             "do not exceed this limit."));
+
----------------
meheff wrote:
> The double negative at the end of this description confused me as written.  Maybe:  "Speculative execution is not be applied to basic blocks where the number of ... exceeds this limit."
> 
> Could change the other option description to match.
Done.

================
Comment at: lib/Transforms/Scalar/SpeculativeExecution.cpp:69
@@ +68,3 @@
+  static char ID;
+  SpeculativeExecution(): FunctionPass(ID) {}
+
----------------
jingyue wrote:
> Add a constructor that takes `spec-exec-max-speculation-cost` and `spec-exec-max-not-hoisted` as parameters. This allows embedded uses of LLVM (e.g., users can programmatically create these passes with their desired thresholds). 
This doesn't seem to be common for the other passes that I looked at in Scalar/ that have flags and it seems like it would be hard to maintain when options are added or removed, so I'd prefer to postpone adding that until someone uses it. I could add it now if you'll be using it right away?

================
Comment at: lib/Transforms/Scalar/SpeculativeExecution.cpp:113
@@ +112,3 @@
+  if (Succ1.getSinglePredecessor() != nullptr &&
+      Succ1.getTerminator()->getNumSuccessors() == 1 &&
+      Succ1.getTerminator()->getSuccessor(0) == &Succ0) {
----------------
jingyue wrote:
> Can we make a new interface `getSingleSuccessor()`? 
Nice idea. Done.

================
Comment at: lib/Transforms/Scalar/SpeculativeExecution.cpp:141
@@ +140,3 @@
+  // benchmarking. There is likely room for improvement.
+  switch (Operator::getOpcode(I)) {
+    case Instruction::GetElementPtr:
----------------
jingyue wrote:
> Should we consider `ConstantExpr` free? 
Do you mean should we return 0 if all operands to an instruction are ConstantExpr? I'd think most such cases would be folded already, though maybe not for GEP, since I believe that it's required for changing types, so that does seem reasonable to catch.

================
Comment at: lib/Transforms/Scalar/SpeculativeExecution.cpp:146
@@ +145,3 @@
+        return 1;
+      return 2; // Changed by Jingyue.
+
----------------
meheff wrote:
> Stale comment?
Done.

================
Comment at: lib/Transforms/Scalar/SpeculativeExecution.cpp:147
@@ +146,3 @@
+      return 2; // Changed by Jingyue.
+
+    case Instruction::Add:
----------------
hfinkel wrote:
> And should this depend on the number of non-constant indices?
I'm especially keen to speculate GEPs, as that was part of the original motivation for the pass, but you're right that this might not be good. I'll try having the cost be the number of non-constant indices + 1 and get back to you.

================
Comment at: lib/Transforms/Scalar/SpeculativeExecution.cpp:159
@@ +158,3 @@
+    case Instruction::ZExt:
+    case Instruction::SExt:
+      return 1; // These are all cheap.
----------------
meheff wrote:
> Instruction::And?  Can you use TTI instruction costs here?
Since the speculated instructions can be sunk back or optimized later, the optimal score isn't necessarily the time it would take to execute the instructions, it's more propensity to cause good things and not cause bad things when speculated. Though TTI could still be a better approximation to that than what I've got here, so I'll try it both ways and get back to you.

================
Comment at: lib/Transforms/Scalar/SpeculativeExecution.cpp:180
@@ +179,3 @@
+
+  unsigned TotalCost = 0;
+  for (auto I = FromBlock.begin(), End = FromBlock.end(); I != End; ++I) {
----------------
jingyue wrote:
> TotalSpeculationCost
Done.

================
Comment at: lib/Transforms/Scalar/SpeculativeExecution.cpp:181
@@ +180,3 @@
+  unsigned TotalCost = 0;
+  for (auto I = FromBlock.begin(), End = FromBlock.end(); I != End; ++I) {
+    const unsigned Cost = ComputeSpeculationCost(I);
----------------
jingyue wrote:
> ```
> for (auto &I : FromBlock)
> ```
Done.

http://reviews.llvm.org/D9360

EMAIL PREFERENCES
  http://reviews.llvm.org/settings/panel/emailpreferences/