[PATCH] D81765: Don't inline dynamic allocas that simplify to huge static allocas.

Wed Jun 24 12:28:39 PDT 2020

mtrofin added a comment.

Not sure I follow this in the patch description:
"Avoid inlining functions if this would result, as even if we didn't hoist the alloca it would remain an dynamic alloca in the caller body."

What would be different between: inlining and not hoisting the alloca, or not inlining?

================
Comment at: llvm/lib/Analysis/InlineCost.cpp:845
+      // a threshold.
+      if (AllocSize->getLimitedValue() >
+          InlineConstants::MaxSimplifiedDynamicAllocaToMove) {
----------------
aemerson wrote:
> mtrofin wrote:
> > jfb wrote:
> > > mtrofin wrote:
> > > > aemerson wrote:
> > > > > mtrofin wrote:
> > > > > > I think that the decision here needs to be mindful of profiling information (if present). The hypothesis in the patch is that the call site may never be executed. What if we know (from profiling) it is always executed?
> > > > > > 
> > > > > > Also, could you provide more insight into the scenario generating this - thanks!
> > > > > > What if we know (from profiling) it is always executed?
> > > > > IMO then that function is simply allocating a huge amount on the stack. I see that in a similar way to TotalAllocaSizeRecursiveCaller being triggered, in that case we don't want to inline either, no matter what profiling information says, right?
> > > > > 
> > > > > > Also, could you provide more insight into the scenario generating this - thanks!
> > > > > This is code generated as a result of the `-ftrivial-auto-var-init=pattern` option, which combined with user code that assumes that VLAs will not be speculatively created with -1 (2^32) size in the entry block. If you run the test case in this patch you will see the resulting alloca which is guaranteed to crash on any reasonable platform.
> > > > If profiling indicates the callsite is hot, then we know that a) there's no runtime problem, and b) we would want to inline. Yes, that can then surface the problem because of the behavior in InlineFunction, my point is to illustrate that it's not clear cut we wouldn't want to inline.
> > > I think profiling is important to take into account, but I think you're wrong on one point: say we have this function:
> > > ```
> > > void f(int a, int b) {
> > >   if (a < 1000)
> > >     g();
> > >   if (b < 1000)
> > >     h();
> > > }
> > > ```
> > > Imagine profiling shows both are hot. Imagine `g` and `h` have very large stacks. We'll now inline them, hoist them to the top of `f`, and in many cases fail to color the `alloca`s separately.
> > > 
> > > What you said is: "profiling shows there's no runtime problem". That's only true without inlining! With inlining, if the optimizer falls on its face, there's a problem.
> > > 
> > > The bug I describe isn't exactly the one @aemerson is fixing, but it's connected, and one @MadCoder and friends run into *very frequently*.
> > > 
> > > If we back up: Amara is trying to fix one instance of a quite unfortunate bug. That won't fix the entire problem with LLVM inlining `alloca` and then making a mess of it. But I hope it helps. Let's focus on avoiding what it might regress, and as a follow-up I think we all agree there's much more to be done.
> > Yup - inlining would "surface the problem because of the behavior in InlineFunction" - I think we're very much in agreement!
> > 
> > If this patch can avoid uninteded perf regressions, or offer a way to avoid them, sgtm!
> > 
> > For due diligence, what's the experience today of folks that run into this - refactor code and mark noinline? (Asking to learn, not to minimize what's very likely quite a negative experience) 
> At the current threshold of 64k (open to changing that but serves the purpose for this bug fix), this code never fires across a build of the LLVM test suite for arm64 at -Os + LTO, which is expected.
> 
> I also tried to move the constant to InlineParams but I'm not familiar enough with the inliner to see how I could access InlineParams from within the CallAnalyzer.
> At the current threshold of 64k (open to changing that but serves the purpose for this bug fix), this code never fires across a build of the LLVM test suite for arm64 at -Os + LTO, which is expected.

I'm not sure that would catch production cases with profiling information - on the other hand, of course 64K seems large enough.

Could you comment on my last question above  "what's the experience today... etc" - thanks!

> I also tried to move the constant to InlineParams but I'm not familiar enough with the inliner to see how I could access InlineParams from within the CallAnalyzer.

Probably fine - @jfb had a comment on this probably being more of a per-target thing anyway

================
Comment at: llvm/lib/Analysis/InlineCost.cpp:849
           AllocatedSize);
+      if (AllocatedSize > InlineConstants::MaxSimplifiedDynamicAllocaToInline) {
+        HasDynamicAlloca = true;
----------------
I'd add here a FIXME, detailing what the underlying problem is, and that this is a stop-gap measure which may potentially cause unintended perf regressions - this context would help with maintainability.

For good measure, I think a FIXME in InlineFunction detailing the problem.

Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D81765/new/

https://reviews.llvm.org/D81765