[PATCH] D37467: Add a new pass to speculate around PHI nodes with constant (integer) operands when profitable.

Thu Oct 5 18:08:33 PDT 2017

chandlerc added a comment.

(quick responses, still working on code updates from the review)

================
Comment at: llvm/lib/Transforms/Scalar/SpeculateAroundPHIs.cpp:67
+    // instructions and allow handling stores in easily proven safe cases.
+    if (mayBeMemoryDependent(*UI)) {
+      DEBUG(dbgs() << "  Unsafe: can't speculate use: " << *UI << "\n");
----------------
davidxl wrote:
> This is too limiting.  The scenario is commonly created by the store sinking optimization -- which involves memory operation.
There is a FIXME discussing this already?

However, this pass is already able to handle the motivating case, so it seems reasonable to extend this cautiously in subsequent commits?

================
Comment at: llvm/lib/Transforms/Scalar/SpeculateAroundPHIs.cpp:107
+        // we're done.
+        if (PotentialSpecSet.count(OpI))
+          continue;
----------------
davidxl wrote:
> why not checking this first?
Sorry, this order came from when we didn't update the set as aggressively. Yeah, we should check this first.

================
Comment at: llvm/lib/Transforms/Scalar/SpeculateAroundPHIs.cpp:112
+        // If when we directly test whether this is safe it fails, bail.
+        if (UnsafeSet.count(OpI) || mayBeMemoryDependent(*OpI)) {
+          DEBUG(dbgs() << "  Unsafe: can't speculate transitive use: " << *OpI
----------------
davidxl wrote:
> What is the second check about? OpI is not a code motion candidate.
> 
> The first check seems wrong to me too.  At this point, we determined that OpI is not in a block that dominates the the target blocks of the code motion (hoisting), so it should return false immediately.  
> 
> I don't see the need to do a DFS walk to check the chains of operands either. 
> 
> The caching here seems also wrong and confusing.
If `OpI` isn't in a block that dominates the target blocks, we will need to hoist `OpI` as well as the original `UI`. That's the point of this walk and the discussion w/ Danny: we may need to hoist more than one instruction in order to hoist the user of the PHI.

This actually was necessary to get basic tests to work at all because in many cases the operand is a trivial/free instruction such as a trunc or zext between i64 and i32. We want to hoist those in addition to the PHI use. There is a test case that demonstrates this as well I think.

Because we're doing that, we need to actually recurse in some way. DFS vs. BFS is just a tradeoff in terms of how you visit things and my previous emails discussed why I think DFS is probably right.

Hopefully this also makes the caching more clear, but please let me know what is unclear/confusing if note.

================
Comment at: llvm/lib/Transforms/Scalar/SpeculateAroundPHIs.cpp:185
+
+    MatCost += TTI.getIntImmCost(IncomingC->getValue(), IncomingC->getType());
+    ++IncomingConstants[IncomingC];
----------------
davidxl wrote:
> This cost should be weighted by the block frequency of the constant materializing block.
I'm not sure.

Currently, the heuristic is to only do this when it is a strict win in terms of cost (including size). Given that, I don't see much to do with BFI.

We could potentially go with something more aggressive and that would definitely need BFI to do accurately. But I'm not sure what the right heuristics would be there. The only places I've seen this really matter were trivially profitable. So I'm inclined to start with the simple model. We can always add more smarts to the cost model when test cases motivating it arrive.

================
Comment at: llvm/lib/Transforms/Scalar/SpeculateAroundPHIs.cpp:186
+    MatCost += TTI.getIntImmCost(IncomingC->getValue(), IncomingC->getType());
+    ++IncomingConstants[IncomingC];
+  }
----------------
davidxl wrote:
> This should remember the total frequency for incomingC
(I assume this is w.r.t. using BFI, so will defer unless we add that logic...)

================
Comment at: llvm/lib/Transforms/Scalar/SpeculateAroundPHIs.cpp:256
+
+  DEBUG(dbgs() << "    Cost savings " << (MatCost - FoldedImmCost) << ": " << PN
+               << "\n");
----------------
davidxl wrote:
> The cost analysis here assumes non-speculative case where the uses of phi  are fully anticipated. Consider the real speculative case:
> 
> 
> ```
> 
> /// entry:
> ///     br i1 %flag, label %a, label %b
> ///
> ///   a:
> ///     br label %merge
> ///
> ///   b:
> ///     br label %merge
> ///
> ///   merge:
> ///     %p = phi i32 [ 7, %a ], [ 11, %b ]
> ///     br i1 %flag2, label %use, label %exit
> ///
> ///   use:
> ///     %sum = add i32 %arg, %p
> ///     br label %exit
> ///
> ///   exit:
> ///    %s = phi ([0, ...], [%sum, ..])
> ///     ret i32 %s
> 
> ```
> Hoisting the add into %a and %b is speculative. Additional speculation cost is involved depending on the cost of add and relative frequency of %a+%b vs %use.
> 
> 
> consider another case with triagular cfg:
> 
> ```
> 
> /// entry:
> ///     br i1 %flag, label %a, label %merge
> ///   a:
> ///     br label %merge
> ///
> ///   merge:
> ///     %p = phi i32 [ 7, %a ], [ 11, %entry ]
> ///     %sum = add i32 %arg, %p
> ///    ret i32 %sum
> 
> ```
> 
> hoisting 'add' into entry and '%a' introduces runtime overhead.
Ah, this explains perhaps the comment above regarding BFI.

The second case was definitely unintended. I can add something to prevent this. I'm not sure we *ever* want to do that case.

For the first case, I'm not so sure. If the 'add' is the same cost as materializing the constant, and we'll have to materialize the constant anyways, is this really a problem? We can suppress this too if so.

I'm somewhat interested in simpler cost modeling and just eliminating the hard cases here if possible (IE, until we see places where the fancier approach is needed).

================
Comment at: llvm/lib/Transforms/Scalar/SpeculateAroundPHIs.cpp:684
+                            [&](PHINode *PN) {
+                              return !isSafeAndProfitableToSpeculateAroundPHI(
+                                  *PN, CostSavingsMap, PotentialSpecSet,
----------------
davidxl wrote:
> Looks like the PotentialSpecSet, UnsafeSet is shared across different PN candidates -- this does not look correct.
Only for PNs in the same basic block, which should make all of the things the same and correct to cache?

I can add an assert w.r.t. the PNs being in the same block though.

https://reviews.llvm.org/D37467