[PATCH] D22778: Add Loop Sink pass to reverse the LICM based of basic block frequency.

Chandler Carruth via llvm-commits llvm-commits at lists.llvm.org
Thu Oct 27 00:59:03 PDT 2016


chandlerc accepted this revision.
chandlerc added a reviewer: chandlerc.
chandlerc added a comment.

Very cool. I think this patch LGTM with the comments below addressed (documentation fixes, simple changes, a fixme, a bunch of minor test cleanup). Feel free to submit. I've asked for one somewhat immediate follow-up patch, and it'd also be good to get a patch to put this into the pipeline behind a flag so folks can test the size impact.



================
Comment at: lib/Transforms/Scalar/LoopSink.cpp:13-22
+// This pass is a reverse-transformation of LICM. It differs from the Sink
+// pass in that it only processes instructions in the loop's preheader, and has
+// more accurate alias/profile info to guide sinking decisions.
+//
+// It differs from the Sink pass in a way that:
+//
+// * It only handles sinking of instructions from the loop's prehead to the
----------------
The differences seem to be a bit duplicated at this point. Sorry if this is the result of my suggestions.

I think that you only need one of the prose "It Differs ..." and the bulleted list. If you want the detail in the list, I would just clean up the wording of that section so the English reads cleanly:

"in a way that" -> "in the following ways"
"prehead" -> "preahader"
"find optimal" -> "find the optimal"


================
Comment at: lib/Transforms/Scalar/LoopSink.cpp:84
+///     AdjustedFreq(BBs) = 99 / SinkFrequencyPercentThreshold%
+static BlockFrequency adjustedSumFreq(SmallPtrSet<BasicBlock *, 2> &BBs,
+                                      BlockFrequencyInfo &BFI) {
----------------
I would use `SmallPtrSetImpl<BasicBlock *>` here and elsewhere on API boundaries where you can below.


================
Comment at: lib/Transforms/Scalar/LoopSink.cpp:212
+  NumLoopSunk++;
+  I.moveBefore(&*(*BI)->getFirstInsertionPt());
+
----------------
So, this technically will break the verifier if you ever look at the IR at this point. While that is allowed, it seems fairly easy to avoid this by first creating all the clones and rewriting uses to the clones before moving the instruction. By the time you move it, the only uses remaining should be the ones dominated by the destination insertion point.


================
Comment at: lib/Transforms/Scalar/LoopSink.cpp:220-228
+    // Replaces uses of I with IC in N
+    for (Value::use_iterator UI = I.use_begin(), UE = I.use_end(); UI != UE;) {
+      Use &U = *UI++;
+      auto *I = cast<Instruction>(U.getUser());
+      if (I->getParent() == N)
+        U.set(IC);
+    }
----------------
This causes the cloning to be quadratic in the number of uses as we may have a single clone for each use.

I know we have an upper bound, but this is still pretty slow (the use list is slow to walk).

I'd suggest you fix this in a follow-up patch though (I think it'll be a lot of code and easier to review as a follow-up), just leave a short FIXME saying that this is slow and may be quadratic in the number of uses.

(For the follow-up patch, the approach I'd suggest is that as you build up BBsToInsertInto, you also build up a map from UseBBs to the dominating BB that will be inserted into. Then you can insert into each BBsToInsertInto here without rewriting any uses but building up a map from those BBs to the inserted clones. Finally, you can do a single walk over the uses and for each one look up the inserted BB in the first map and then the inserted clone in the second map and rewrite the use. Or maybe you see a simpler way? This was just the first that came to mind.)


================
Comment at: lib/Transforms/Scalar/LoopSink.cpp:264
+  SmallVector<BasicBlock *, 10> ColdLoopBBs;
+  DenseMap<BasicBlock *, int> LoopBlockNumber;
+  int i = 0;
----------------
Use a SmallDenseMap? Good to dodge allocations for small loops.


================
Comment at: test/Transforms/LICM/loopsink.ll:8
+
+; Function Attrs: norecurse nounwind readonly uwtable
+;     b1
----------------
You can prune out these "Function Attrs" comments... See below.


================
Comment at: test/Transforms/LICM/loopsink.ll:292
+
+attributes #0 = { norecurse nounwind readonly uwtable "disable-tail-calls"="false" "less-precise-fpmad"="false" "no-frame-pointer-elim"="false" "no-infs-fp-math"="false" "no-jump-tables"="false" "no-nans-fp-math"="false" "stack-protector-buffer-size"="8" "target-cpu"="x86-64" "target-features"="+fxsr,+mmx,+sse,+sse2,+x87" "unsafe-fp-math"="false" "use-soft-float"="false" }
+
----------------
Please try to minimize function attributes you have in your test cases. You may not need any.

If you do need them, you can attach the textual form directly to the functions which is much more friendly for test cases (and makes the comments explaining what the '#0' attribute set contains unnecessary).


================
Comment at: test/Transforms/LICM/loopsink.ll:294-303
+!llvm.ident = !{!0}
+
+!0 = !{!"clang version 3.9.0 (trunk 268689)"}
+!1 = !{!"branch_weights", i32 1, i32 2000}
+!2 = !{!3, !3, i64 0}
+!3 = !{!"int", !4, i64 0}
+!4 = !{!"omnipotent char", !5, i64 0}
----------------
Please prune out all the metadata your test isn't directly using (TBAA stuff, Clang stuff).


================
Comment at: test/Transforms/LICM/sink.ll:4-5
+
+target datalayout = "e-m:e-i64:64-f80:128-n8:16:32:64-S128"
+target triple = "x86_64-unknown-linux-gnu"
+
----------------
Unless your tests depend on specific datalayout or triple, please avoid including them in the IR test cases so that things are more generic and less tied to platforms.


================
Comment at: test/Transforms/LICM/sink.ll:22
+; Function Attrs: norecurse nounwind readonly uwtable
+define i32 @_Z3fooii(i32, i32) #0 {
+  %3 = icmp eq i32 %1, 0
----------------
Same comments as above about function attributes. Also, please don't use C++ mangled names, but instead provide clean and easy to read names directly.


https://reviews.llvm.org/D22778





More information about the llvm-commits mailing list