[PATCH] D81765: Don't inline dynamic allocas that simplify to huge static allocas.

Wed Jun 24 13:34:40 PDT 2020

aemerson marked an inline comment as done.
aemerson added a comment.

In D81765#2112281 <https://reviews.llvm.org/D81765#2112281>, @mtrofin wrote:

> Not sure I follow this in the patch description:
>  "Avoid inlining functions if this would result, as even if we didn't hoist the alloca it would remain an dynamic alloca in the caller body."
>
> What would be different between: inlining and not hoisting the alloca, or not inlining?

That's a poor description on my part, you can disregard it. If we inlined it but didn't hoist the alloca from the never-executed block, it would still cause the function to have a dynamic alloca present and perhaps negatively impact codegen as Eli alluded to earlier in this thread.

> Could you comment on my last question above "what's the experience today... etc" - thanks!

Sorry I forgot to address that. The experience today is that this bug causes an out of stack fault due to trying to allocate 4GB of stack. The workaround is to either disable -fauto-var-init or to change the code such that the optimizer isn't able to SCCP the alloca size to the callee, and therefore prevent the dynamic alloca from being promoted to a static one. E.g. try to add an early exit somewhere so that the optimizer can prove that the alloc amount must always be > 0.

Also I just noticed one of the new tests is missing some CHECK lines, I'll add those before committing.

================
Comment at: llvm/lib/Analysis/InlineCost.cpp:849
           AllocatedSize);
+      if (AllocatedSize > InlineConstants::MaxSimplifiedDynamicAllocaToInline) {
+        HasDynamicAlloca = true;
----------------
mtrofin wrote:
> I'd add here a FIXME, detailing what the underlying problem is, and that this is a stop-gap measure which may potentially cause unintended perf regressions - this context would help with maintainability.
> 
> For good measure, I think a FIXME in InlineFunction detailing the problem.
I don't really have a good example of how this could cause perf regressions.

Replying to an earlier point:
>What if we know (from profiling) it is always executed?
Correct me if I'm wrong, but how could a function which blows the stack be always executed? The threshold of 64k was intended to only catch cases which would clearly be crashing if executed. I could bump that up even further 1MB or something to be "safer" from a perf perspective.

Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D81765/new/

https://reviews.llvm.org/D81765