[PATCH] D110658: [InlineAdvisor] Add -inline-replay-strict to replay inline decisions only in callers that have remarks

Fri Oct 1 00:04:48 PDT 2021

wenlei added a comment.

Thanks for working on this. Though perhaps there's some misunderstanding on what "strict" mode means (sorry I should have clarified more beforehand). In the original inline replay patch (https://reviews.llvm.org/D83743), I mentioned strict/positive/negative mode.

The only mode we have before and after this patch is the strict mode. "strict" means existing remarks indicate positive decision, and the rest are all considered negative decision. "positive" means existing remarks indicate positive decision, but the rest is left to other heuristics (or fall back advisors). Accordingly, negative means the input only specifies negative decision.

The change is limiting the scope of replay to function, as opposed to module or whole program. A powerful inline replay framework would support scope={module|function} x mode={strict|positive|negative}.

So I would suggest adding a flag `-inline-replay-scope=[function|module]` (with a default value), instead of `-inline-replay-strict`. Later on if we add positive/negative mode, we could add flag `-inline-replay-mode=[strict|positive|negative]`, but for now we don't need it. Thoughts?

================
Comment at: llvm/include/llvm/Analysis/ReplayInlineAdvisor.h:43
+  StringMap<bool> InlineSitesFromRemarks;
+  StringSet<> InlineCallersFromRemarks;
 };
----------------
nit: `InlineCallersFromRemarks` -> `InlinerFromRemarks`

================
Comment at: llvm/lib/Analysis/InlineAdvisor.cpp:50

+cl::opt<bool> InlineReplayStrict(
+    "inline-replay-strict", cl::init(false),
----------------
Sample loader replay was independent of cgscc replay as they use different switches for taking different inputs. Now the "mode" switch is global, so both replays are tied to a single mode even if they use different input files. I think it's more flexible if they are still kept independent (the hot/cold thresholds etc are all kept separate for the two inliners). 

================
Comment at: llvm/lib/Analysis/ReplayInlineAdvisor.cpp:47

-    StringRef Callee = Pair.first.split(" inlined into")
+    StringRef Callee = Pair.first.split(" inlined into ")
                            .first.rsplit(": '")
----------------
nits:
 - `Pair.first.split(" inlined into ")` is duplicated string search and can be merged.
 - If you do `Pair.first.split("' inlined into '")`, the `drop_back()` and `drop_front()` can be removed, and it helps readability..

================
Comment at: llvm/lib/Analysis/ReplayInlineAdvisor.cpp:53
+                           .first.drop_front();
     auto CallSite = Pair.second.split(";").first;

----------------
`Pair.second.drop_back()` works? or you expect something after `;`?

================
Comment at: llvm/lib/Analysis/ReplayInlineAdvisor.cpp:55

     if (Callee.empty() || CallSite.empty())
       continue;
----------------
Check `Caller.epmty()` too?

I would also add a warning if any of the three is empty - invalid remarks, can't be replayed etc.. instead of silently swallow it.

================
Comment at: llvm/lib/Analysis/ReplayInlineAdvisor.cpp:69-72
+    for (const auto &Remark : InlineSitesFromRemarks)
+      if (!Remark.second)
+        LLVM_DEBUG(dbgs() << "Inline Replay Strict: Did not apply "
+                          << Remark.first() << "\n");
----------------
I'm not sure whether printing such debug dumps from dtor is a good idea. Lifetime of objects could potentially be somewhat arbitrary, but logs better be printed in more controlled/organized fashion. Looking at the test case, it also seems like these prints are added for testing purpose? See the comments in the test case - we better avoid using debug prints for testing. 

================
Comment at: llvm/lib/Analysis/ReplayInlineAdvisor.cpp:81-82

-  if (InlineSitesFromRemarks.empty())
+  if (!Strict && InlineSitesFromRemarks.empty())
     return std::make_unique<DefaultInlineAdvice>(this, CB, None, ORE,
                                                  EmitRemarks);
----------------
Perhaps we can remove this for simplicity. The shortcut isn't helping much.

================
Comment at: llvm/lib/Analysis/ReplayInlineAdvisor.cpp:99
+
+  if (Strict) {
+    if (InlineCallersFromRemarks.count(CB.getFunction()->getName()))
----------------
nit:

```
if (!Strict || InlineCallersFromRemarks.count(CB.getFunction()->getName())) {
  ...
}
else if (Strict) {
  if (OriginalAdvisor)
    return OriginalAdvisor->getAdvice(CB);
  return {};
}
```

This way we don't need a lambda - since it's only called once it can be inlined.

================
Comment at: llvm/lib/Transforms/IPO/Inliner.cpp:835
       // Check whether we want to inline this callsite.
-      if (!Advice->isInliningRecommended()) {
+      if (!Advice || !Advice->isInliningRecommended()) {
         Advice->recordUnattemptedInlining();
----------------
mtrofin wrote:
> modimo wrote:
> > mtrofin wrote:
> > > modimo wrote:
> > > > modimo wrote:
> > > > > mtrofin wrote:
> > > > > > modimo wrote:
> > > > > > > Since getAdvice returns a unique_ptr, returning a null seems like a good way to indicate "no advice". Does that also make sense for the ML inliner @mtrofin?
> > > > > > The design intent was for the advice to be clear, i.e. either inline or not. You probably want to delegate the decision to some other advisor if you don't have one? i.e. the remarks advisor could delegate to the default one - and return a non-null Advice. WDYT?
> > > > > It does delegate it in the CGSCC case because we have a nicely nested system with the default one taking over. 
> > > > > 
> > > > > Unfortunately the SampleProfile inliner currently isn't using the InlineAdvisor setup. That being said, perhaps the better approach is to adapt the SampleProfile inliner to an InlineAdvisor so the replay advisor will always return a non-null.
> > > > The SampleProfile inliner needs additional information beyond `CallBase` to make its decision. Namely, `InlineCandidate` which contains sampling information. Does it make sense to extend `getAdvice` to take additional information?
> > > I don't follow why, but to the scope of this patch, couldn't the ReplayInlineAdvisor return an Advice that says "no" when it has no advice to provide?
> > In SampleProfile, it needs 3 states:
> > 1. Yes when it matches replay
> > 2. No when it doesn't match replay
> > 3. HasNoAdvice when in strict mode and we want the SampleProfile inliner to make the decision
> > 
> > ATM there's only (1) and (2), I'm using null to represent (3). An alternative could be to have a HasNoAdvice state that maps to no inline but can be queried to differentiate between (2) and (3)
> I think I see now. OK, let's allow a null return then from `getAdvice`; to answer your original question, I don't think it affects advisors that want to be categorical (i.e. an advisor can always return yes/no), but it affects consumers, and your patch handles that. The ML stuff doesn't consume, it just implements the advisor interfaces.
> 
> We can explore afterwards if/how to adapt SampleProfile to an advisor design; we'd definitely not want to introduce that `InlineCandidate` in the advisor interface, the former is too specific to the sample profiler.
> 
> Could you leave a comment on the getAdvice() abstract API re. the fact that it can return nothing, a pointer to this review, and a TODO that maybe we can tighten it?
> 
> Thanks!
@mtrofin I added InlineCandidate stuff in sample loader in https://reviews.llvm.org/D94001 and https://reviews.llvm.org/D95024, but the use of extra info for inline decision predates these changes. As to why we can't just use CallBase there - the inlining happens before profile is annotated on IR as branch metadata (it's the sample *loader*), so none of the BPI, BFI stuff is available, hence we have to use the raw context profile representation alongside CallBase to make inline decisions.

I agree that the InlineCandidate there is very specific and not good for advisor API. However I think having three state yes/no/unknown is more flexible, and some advisors can still choose to not use the unknown state. We can use NULL for unknown, or a different representation.

> You probably want to delegate the decision to some other advisor if you don't have one 

Conceptually there's a local "unknown" for some advisor already. Forcing a yes/no decision and not having a representation for unknown makes the inline advisor system "closed" instead of "open" in the sense that everything needs to reach a decision within the advisor hierarchy without any external help. In ideal world, it's clean, and sure we can try to convert everything into the system, but practically it might be too restrictive?     

================
Comment at: llvm/test/Transforms/Inline/cgscc-inline-replay.ll:21
+; REPLAY-STRICT: '_Z3sumii' inlined into 'main' with (cost=always)
+; REPLAY-STRICT: Inline Replay Strict: Did not apply _Z3subii_Z3sumii:1:0 @ main:3:0.1

----------------
This would fail ninja check with release builds. Usually we try to avoid testing that requires debug builds unless it's critical and no other way to observe. 

Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D110658/new/

https://reviews.llvm.org/D110658