[PATCH] D52915: [PM] keeping history when original SCC split and then merge into itself in the same round of SCC update .

Wed Oct 10 21:48:13 PDT 2018

wmi added a comment.

Thanks for the review.

================
Comment at: lib/Transforms/IPO/Inliner.cpp:1164-1171
+    // It is possible that even if no new SCC is generated (i.e., C == OldC),
+    // the original SCC could be split and then merged into the same one as
+    // itself. During this process, the original SCC will be added into
+    // UR.CWorklist again, we want to cache such case too.
+    //
+    // So if only split has ever happen, the size of UR.CWorklist will be
+    // larger than that before the SCC update, and we will cache the history
----------------
chandlerc wrote:
> Using the size seems a bit odd here -- what about instead checking whether `C != OldC` or if `C` is in `CWorklist`? This is efficient with the given data structure and seems like it captures the intent of what you're checking.
That is clearer to check if OldC is inside of UR.CWorklist. Done.

================
Comment at: test/Transforms/Inline/cgscc-cycle-2.ll:1-2
+; This test contains another tricky call graph structures for the inliner to
+; handle correctly. The callgraph is like following:
+;
----------------
chandlerc wrote:
> We have multiple tests in `cgscc-cycle.ll` already -- why not add this one there?
> 
> I suspect all of those will work fine with whatever inlining threshold makes it easy to exercise.
Ok. I move the test into cgscc-cycle.ll

================
Comment at: test/Transforms/Inline/cgscc-cycle-2.ll:24-26
+; edge, and the two SCCs will be merged into the original one again. During
+; this cycle, the original SCC will be added into UR.CWorklist again and
+; this creates an infinite loop.
----------------
chandlerc wrote:
> FWIW, this description and ascii art are *awesome*. Thanks for that.
> 
> The only high level question I have here is: *should* we have put the SCC onto the worklist?
> 
> One fix (which you have here) is to detect when we will revisit the SCC and retain information to break any cycles in inlining.
> 
> Another fix in theory is to avoid putting the SCC onto the queue.
> 
> What I haven't yet figured out (but you may have already thought about) is if the split+re-merge operations happen in a context where we might expect interesting SCC-structure changes to have occurred and thus it to be profitable to re-enqueue the SCC on the worklist. If that would normally happen, then ythe fix you have here makes perfect sense. But if this is (for example) just an artifact of the order in which we apply updates, and the SCC is never actually going to be different, maybe we should find a way to avoid enqueuing it? It might be too hard to do that though and we'd still end up where your current patch is...
Yeah, Easwaran and I considered the fix to avoid putting the unchanged original SCC into the worklist, but we don't know how to efficiently check when the internal structure of the original SCC is unchanged so we can avoid enqueuing it.  

================
Comment at: test/Transforms/Inline/cgscc-cycle-2.ll:28
+;
+; RUN: opt < %s -passes='cgscc(inline)' -inline-threshold=50 -S
+
----------------
chandlerc wrote:
> I strongly prefer to always include check patterns that verify the inlining we expect to happen does happen (or does not happen). Otherwise some other change may make this entire test irrelevant and we would never know.
Ok, check added.

================
Comment at: test/Transforms/Inline/cgscc-cycle-2.ll:100-102
+attributes #0 = { noinline nounwind uwtable "correctly-rounded-divide-sqrt-fp-math"="false" "disable-tail-calls"="false" "less-precise-fpmad"="false" "no-frame-pointer-elim"="false" "no-infs-fp-math"="false" "no-jump-tables"="false" "no-nans-fp-math"="false" "no-signed-zeros-fp-math"="false" "no-trapping-math"="false" "stack-protector-buffer-size"="8" "target-cpu"="x86-64" "target-features"="+fxsr,+mmx,+sse,+sse2,+x87" "unsafe-fp-math"="false" "use-soft-float"="false" }
+attributes #1 = { nounwind "correctly-rounded-divide-sqrt-fp-math"="false" "disable-tail-calls"="false" "less-precise-fpmad"="false" "no-frame-pointer-elim"="false" "no-infs-fp-math"="false" "no-nans-fp-math"="false" "no-signed-zeros-fp-math"="false" "no-trapping-math"="false" "stack-protector-buffer-size"="8" "target-cpu"="x86-64" "target-features"="+fxsr,+mmx,+sse,+sse2,+x87" "unsafe-fp-math"="false" "use-soft-float"="false" }
+attributes #2 = { nounwind }
----------------
chandlerc wrote:
> Please always minimize the attributes for test cases.
> 
> Similarly, use unmangled symbol names, avoid dso_local, and avoid unnecessary `tail`, `align` and other noise in the test case. The goal is to make the IR of the test case as minimal as possible while still exercising the behavior and being easy to read and modify.
Ok, done.

Repository:
  rL LLVM

https://reviews.llvm.org/D52915