[PATCH] D37467: Add a new pass to speculate around PHI nodes with constant (integer) operands when profitable.

David Li via Phabricator via llvm-commits llvm-commits at lists.llvm.org
Fri Oct 6 10:38:20 PDT 2017


davidxl added inline comments.


================
Comment at: llvm/lib/Transforms/Scalar/SpeculateAroundPHIs.cpp:67
+    // instructions and allow handling stores in easily proven safe cases.
+    if (mayBeMemoryDependent(*UI)) {
+      DEBUG(dbgs() << "  Unsafe: can't speculate use: " << *UI << "\n");
----------------
chandlerc wrote:
> davidxl wrote:
> > This is too limiting.  The scenario is commonly created by the store sinking optimization -- which involves memory operations.
> There is a FIXME discussing this already?
> 
> However, this pass is already able to handle the motivating case, so it seems reasonable to extend this cautiously in subsequent commits?
This might affect the design decision: use MemorySSA or AS?

It is probably good to handle the easiest cases (which cover 90% of the cases), where the memory instruction is just hoisted into its direct predecessor blocks.
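As a sketch of that easiest case, the check could be: a memory instruction is hoistable into its direct predecessors only if nothing that may write memory precedes it in its own block. This is a toy model in Python, not the LLVM API; all class and function names are hypothetical.

```python
# Toy model: an instruction can be hoisted from its block into the block's
# direct predecessors only when no may-write instruction sits between the
# block entry and the instruction itself.

class Inst:
    def __init__(self, name, may_write=False):
        self.name = name
        self.may_write = may_write

def safe_to_hoist_to_preds(block_insts, candidate):
    """Safe iff no may-write instruction precedes `candidate` in its block."""
    for inst in block_insts:
        if inst is candidate:
            return True
        if inst.may_write:
            return False
    return False  # candidate not found in the block

load = Inst("load", may_write=False)
store = Inst("store", may_write=True)
print(safe_to_hoist_to_preds([load, store], load))  # True: load comes first
print(safe_to_hoist_to_preds([store, load], load))  # False: store precedes it
```

A real implementation would of course consult alias analysis or MemorySSA rather than a single may-write bit, but the shape of the check is the same.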


================
Comment at: llvm/lib/Transforms/Scalar/SpeculateAroundPHIs.cpp:112
+        // If when we directly test whether this is safe it fails, bail.
+        if (UnsafeSet.count(OpI) || mayBeMemoryDependent(*OpI)) {
+          DEBUG(dbgs() << "  Unsafe: can't speculate transitive use: " << *OpI
----------------
chandlerc wrote:
> davidxl wrote:
> > What is the second check about? OpI is not a code motion candidate.
> > 
> > The first check seems wrong to me too.  At this point, we determined that OpI is not in a block that dominates the target blocks of the code motion (hoisting), so it should return false immediately.  
> > 
> > I don't see the need to do a DFS walk to check the chains of operands either. 
> > 
> > The caching here seems also wrong and confusing.
> If `OpI` isn't in a block that dominates the target blocks, we will need to hoist `OpI` as well as the original `UI`. That's the point of this walk and the discussion w/ Danny: we may need to hoist more than one instruction in order to hoist the user of the PHI.
> 
> This actually was necessary to get basic tests to work at all because in many cases the operand is a trivial/free instruction such as a trunc or zext between i64 and i32. We want to hoist those in addition to the PHI use. There is a test case that demonstrates this as well I think.
> 
> Because we're doing that, we need to actually recurse in some way. DFS vs. BFS is just a tradeoff in terms of how you visit things and my previous emails discussed why I think DFS is probably right.
> 
> Hopefully this also makes the caching more clear, but please let me know what is unclear/confusing if not.

So dependent OpI instructions are also hoisting candidates. However, I don't see the cost model actually considering them.

Besides, speculative and non-speculative hoisting candidates are all handled here. For the truly speculative ones, there are not enough checks for safe speculation -- e.g. instructions that may trap or raise exceptions.
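To illustrate the kind of speculation-safety filter being asked for: integer division can trap (divide by zero) and memory accesses can fault, while plain arithmetic cannot. The opcode names below mirror LLVM IR, but this is a hedged toy sketch, not the actual `isSafeToSpeculativelyExecute` logic.

```python
# Toy speculation-safety filter: only instructions with no traps, faults,
# or side effects may be executed speculatively.

TRAPPING_OPCODES = {"sdiv", "udiv", "srem", "urem"}  # may trap on a zero divisor
FAULTING_OPCODES = {"load", "store"}                 # may fault on a bad pointer

def safe_to_speculate(opcode):
    """True only for opcodes that cannot trap or fault in this toy model."""
    return opcode not in TRAPPING_OPCODES and opcode not in FAULTING_OPCODES

print(safe_to_speculate("add"))   # True
print(safe_to_speculate("sdiv"))  # False
print(safe_to_speculate("load"))  # False
```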


================
Comment at: llvm/lib/Transforms/Scalar/SpeculateAroundPHIs.cpp:117
+          // so we prune subsequent searches.
+          UnsafeSet.insert(OpI);
+          for (auto &StackPair : DFSStack) {
----------------
What is the point of tracking UnsafeSet?

The code will be much more readable if the following is done:

1) collect the set of all operands that UI (transitively) depends on and that are not defined in a block dominating the hoisting target blocks;
2) check whether it is safe to hoist those dependent operands; if any one of them is not safe, bail.
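The two-step structure suggested above could look like the following sketch, over a toy dependence graph (`deps` maps an instruction to its operands, `dominates` holds instructions already defined in a dominating block, `unsafe` holds non-speculatable ones). All names are hypothetical.

```python
# Step 1: transitively collect the operands of `ui` that are not defined in
# a dominating block of the hoisting targets.
def collect_hoist_set(ui, deps, dominates):
    hoist_set, stack = set(), [ui]
    while stack:
        inst = stack.pop()
        if inst in hoist_set:
            continue
        hoist_set.add(inst)
        for op in deps.get(inst, []):
            if op not in dominates:
                stack.append(op)
    return hoist_set

# Step 2: bail if any member of the hoist set is unsafe to hoist.
def can_hoist(ui, deps, dominates, unsafe):
    return not (collect_hoist_set(ui, deps, dominates) & unsafe)

# Example mirroring the trunc/zext case: %sum uses a zext of the PHI %p.
deps = {"sum": ["zext"], "zext": ["p"]}
dominates = {"p"}  # the PHI anchors the walk
print(can_hoist("sum", deps, dominates, unsafe=set()))     # True
print(can_hoist("sum", deps, dominates, unsafe={"zext"}))  # False
```

Separating collection from the safety check keeps each phase simple and makes the "bail on any unsafe member" rule explicit, at the cost of one extra pass over the set.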


================
Comment at: llvm/lib/Transforms/Scalar/SpeculateAroundPHIs.cpp:119
+          for (auto &StackPair : DFSStack) {
+            Instruction *I = StackPair.first;
+            UnsafeSet.insert(I);
----------------
Why not pop the DFSStack?


================
Comment at: llvm/lib/Transforms/Scalar/SpeculateAroundPHIs.cpp:256
+
+  DEBUG(dbgs() << "    Cost savings " << (MatCost - FoldedImmCost) << ": " << PN
+               << "\n");
----------------
chandlerc wrote:
> davidxl wrote:
> > The cost analysis here assumes the non-speculative case, where the uses of the PHI are fully anticipated. Consider the real speculative case:
> > 
> > 
> > ```
> > 
> > /// entry:
> > ///     br i1 %flag, label %a, label %b
> > ///
> > ///   a:
> > ///     br label %merge
> > ///
> > ///   b:
> > ///     br label %merge
> > ///
> > ///   merge:
> > ///     %p = phi i32 [ 7, %a ], [ 11, %b ]
> > ///     br i1 %flag2, label %use, label %exit
> > ///
> > ///   use:
> > ///     %sum = add i32 %arg, %p
> > ///     br label %exit
> > ///
> > ///   exit:
> > ///    %s = phi ([0, ...], [%sum, ..])
> > ///     ret i32 %s
> > 
> > ```
> > Hoisting the add into %a and %b is speculative. An additional speculation cost is involved, depending on the cost of the add and the relative frequency of %a/%b vs. %use.
> > 
> > 
> > Consider another case with a triangular CFG:
> > 
> > ```
> > 
> > /// entry:
> > ///     br i1 %flag, label %a, label %merge
> > ///   a:
> > ///     br label %merge
> > ///
> > ///   merge:
> > ///     %p = phi i32 [ 7, %a ], [ 11, %entry ]
> > ///     %sum = add i32 %arg, %p
> > ///    ret i32 %sum
> > 
> > ```
> > 
> > Hoisting the 'add' into %entry and %a introduces runtime overhead.
> Ah, this explains perhaps the comment above regarding BFI.
> 
> The second case was definitely unintended. I can add something to prevent this. I'm not sure we *ever* want to do that case.
> 
> For the first case, I'm not so sure. If the 'add' is the same cost as materializing the constant, and we'll have to materialize the constant anyways, is this really a problem? We can suppress this too if so.
> 
> I'm somewhat interested in simpler cost modeling and just eliminating the hard cases here if possible (IE, until we see places where the fancier approach is needed).
As an example for the first case, suppose there are two incoming constants C1 and C2. M(C1) and M(C2) are the costs of materializing the constants, and F(C1) and F(C2) are the costs of folding them into UI. Assume the frequencies of the two incoming blocks are P1 and P2.

The total cost before will be P1 * M(C1) + P2 * M(C2), and the total cost after is P1 * F(C1) + P2 * F(C2).

The comparison can be reduced to comparing M(C1) + M(C2) with F(C1) + F(C2) only when
1) P1 == P2
or
2) M(C1) == M(C2) and F(C1) == F(C2)

Depending on the values of C1 and C2, 2) may not be true.
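A concrete numeric instance of this: with skewed block frequencies, the unweighted comparison M(C1) + M(C2) vs. F(C1) + F(C2) can disagree with the frequency-weighted one. The cost numbers below are made up purely for illustration.

```python
# Frequency-weighted cost of the two incoming edges.
def weighted_cost(p1, p2, c1, c2):
    return p1 * c1 + p2 * c2

# Hypothetical materialization and folding costs for the two constants.
M = {"C1": 1, "C2": 4}   # M(C1), M(C2)
F = {"C1": 3, "C2": 1}   # F(C1), F(C2)

# Unweighted comparison says folding wins: F(C1)+F(C2) = 4 < M(C1)+M(C2) = 5.
unweighted_folding_wins = (F["C1"] + F["C2"]) < (M["C1"] + M["C2"])

# But with block %a hot (P1 = 0.9, P2 = 0.1) the weighted comparison flips:
# before = 0.9*1 + 0.1*4 = 1.3, after = 0.9*3 + 0.1*1 = 2.8.
before = weighted_cost(0.9, 0.1, M["C1"], M["C2"])
after = weighted_cost(0.9, 0.1, F["C1"], F["C2"])
weighted_folding_wins = after < before

print(unweighted_folding_wins)  # True
print(weighted_folding_wins)    # False
```

This is exactly why neither condition 1) nor 2) holds here: P1 != P2 and the per-constant costs differ, so dropping the frequencies changes the answer.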



https://reviews.llvm.org/D37467




