[PATCH] D81765: Don't inline dynamic allocas that simplify to huge static allocas.

Tue Jun 23 18:21:30 PDT 2020

aemerson marked 4 inline comments as done.
aemerson added a comment.

In D81765#2109656 <https://reviews.llvm.org/D81765#2109656>, @jfb wrote:

> In D81765#2090952 <https://reviews.llvm.org/D81765#2090952>, @jfb wrote:
>
> > One thing I'd like to make sure we don't break:
> >
> >   __attribute__((always_inline))
> >   char *stack_allocate(size_t size) {
> >     if (size < threshold)
> >       return alloca(size);
> >     return malloc(size);
> >   }
> >
> >
> > This should always inline. Is it still the case? It turns out that we have Important Code which relies on this...
>
>
> Can you check this?

Sure, I'll add this as a test.

================
Comment at: llvm/include/llvm/Analysis/InlineCost.h:54
+/// static allocas above this amount in bytes.
+const uint64_t MaxSimplifiedDynamicAllocaToMove = 65536;
 } // namespace InlineConstants
----------------
efriedma wrote:
> jfb wrote:
> > mtrofin wrote:
> > > I think MaxSimplifiedDynamicAllocaToMove should be part of InlineParams, since it's a threshold. 
> > It probably needs to be a per-target thing, since we want different values for userspace and kernel.
> Rather than add a threshold for a single dynamic alloca, it might make sense to just pay attention to the overall size of the stack of the inlined function.  We currently have a threshold TotalAllocaSizeRecursiveCaller which applies to recursive inlining; we could apply a similar threshold to inlining in general.  That would avoid potential weird interactions with other passes about whether the alloca is actually dynamic.
> 
> It might make sense to let the user control that threshold; users running code on small stacks might care more about the stack size.  But really, I'm not sure there's a great way to solve "how much stack memory should we speculatively allocate" if we can't see the whole callstack.
> 
> But really, I'm fine with this patch as-is as a spot fix for "I accidentally allocated 2^34 bytes in dead code". We probably want to consider the general issue of stack usage more carefully.
> It probably needs to be a per-target thing, since we want different values for userspace and kernel.
For this specific bug any reasonable value is going to work to catch a -1 size allocation. Like Eli says I feel generally controlling the stack allocation size is a much broader set of changes than this one.

================
Comment at: llvm/lib/Analysis/InlineCost.cpp:838
   if (I.isArrayAllocation()) {
     Constant *Size = SimplifiedValues.lookup(I.getArraySize());
----------------
efriedma wrote:
> aemerson wrote:
> > efriedma wrote:
> > > Hhow is the handling for array allocation supposed to work if the alloca isn't in the entry block of the function?  It looks like this code assumes that constant-propagation is enough to turn any dynamic alloca into a static alloca?
> > > 
> > > Not really a problem with this change, exactly, but I want to make sure I'm understanding the surrounding code correctly.
> > I'm not entirely sure what you mean, but if an array allocation isn't in the entry block, and if constant prop can't simplify it to a static alloca, then it remains a dynamic alloca and the inliner gives up (see the end of this function).
> My question is what happens if an array allocation isn't in the entry block, but we can simplify the size to a constant.
That was the original motivation for this fix actually. If its simplified to a constant, then it can be converted to a static alloca and then later moved into the entry block. If that constant allocation is too large then we can get overflows.

================
Comment at: llvm/lib/Analysis/InlineCost.cpp:845
+      // a threshold.
+      if (AllocSize->getLimitedValue() >
+          InlineConstants::MaxSimplifiedDynamicAllocaToMove) {
----------------
mtrofin wrote:
> I think that the decision here needs to be mindful of profiling information (if present). The hypothesis in the patch is that the call site may never be executed. What if we know (from profiling) it is always executed?
> 
> Also, could you provide more insight into the scenario generating this - thanks!
> What if we know (from profiling) it is always executed?
IMO then that function is simply allocating a huge amount on the stack. I see that in a similar way to TotalAllocaSizeRecursiveCaller being triggered, in that case we don't want to inline either, no matter what profiling information says, right?

> Also, could you provide more insight into the scenario generating this - thanks!
This is code generated as a result of the `-ftrivial-auto-var-init=pattern` option, which combined with user code that assumes that VLAs will not be speculatively created with -1 (2^32) size in the entry block. If you run the test case in this patch you will see the resulting alloca which is guaranteed to crash on any reasonable platform.

Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D81765/new/

https://reviews.llvm.org/D81765