[llvm] 831bfce - [Transforms][GlobalSRA] huge array causes long compilation time and huge memory usage.

Philip Reames via llvm-commits llvm-commits at lists.llvm.org
Mon Jan 6 08:32:29 PST 2020


If all the offsets are known, can't we simply emit global variables only 
for the used offsets?  Seems like that would be more robust and equally 
fast if your comment about the source of the slowdown is correct.  (I 
haven't studied the original code and am going purely off the commit 
message.)

Also, 16 elements is a very small definition of "large".  I really don't 
think that's appropriate.

Philip

On 1/4/20 5:42 AM, Alexey Lapshin via llvm-commits wrote:
> Author: Alexey Lapshin
> Date: 2020-01-04T16:42:38+03:00
> New Revision: 831bfcea47826a102ece03f0fad33ce39a73c672
>
> URL: https://github.com/llvm/llvm-project/commit/831bfcea47826a102ece03f0fad33ce39a73c672
> DIFF: https://github.com/llvm/llvm-project/commit/831bfcea47826a102ece03f0fad33ce39a73c672.diff
>
> LOG: [Transforms][GlobalSRA] huge array causes long compilation time and huge memory usage.
>
> Summary:
> For artificial cases (huge array, few usages), Global SRA optimization creates
> a lot of redundant data. It creates an instance of GlobalVariable for each array
> element. For huge array, that means huge compilation time and huge memory usage.
> Following example compiles for 10 minutes and requires 40GB of memory.
>
> namespace {
>    char LargeBuffer[64 * 1024 * 1024];
> }
>
> int main ( void ) {
>
>      LargeBuffer[0] = 0;
>
>      printf("\n ");
>
>      return LargeBuffer[0] == 0;
> }
>
> The fix is to avoid Global SRA for large arrays.
>
> Reviewers: craig.topper, rnk, efriedma, fhahn
>
> Reviewed By: rnk
>
> Subscribers: xbolva00, lebedev.ri, lkail, merge_guards_bot, hiraditya, llvm-commits
>
> Tags: #llvm
>
> Differential Revision: https://reviews.llvm.org/D71993
>
> Added:
>      llvm/test/Transforms/GlobalOpt/long-compilation-global-sra.ll
>
> Modified:
>      llvm/lib/Transforms/IPO/GlobalOpt.cpp
>
> Removed:
>      
>
>
> ################################################################################
> diff  --git a/llvm/lib/Transforms/IPO/GlobalOpt.cpp b/llvm/lib/Transforms/IPO/GlobalOpt.cpp
> index 0d1b7da96c83..beb3785eda31 100644
> --- a/llvm/lib/Transforms/IPO/GlobalOpt.cpp
> +++ b/llvm/lib/Transforms/IPO/GlobalOpt.cpp
> @@ -433,6 +433,20 @@ static bool GlobalUsersSafeToSRA(GlobalValue *GV) {
>     return true;
>   }
>   
> +static bool CanDoGlobalSRA(GlobalVariable *GV) {
> +  Constant *Init = GV->getInitializer();
> +
> +  if (isa<StructType>(Init->getType())) {
> +    // nothing to check
> +  } else if (SequentialType *STy = dyn_cast<SequentialType>(Init->getType())) {
> +    if (STy->getNumElements() > 16 && GV->hasNUsesOrMore(16))
> +      return false; // It's not worth it.
> +  } else
> +    return false;
> +
> +  return GlobalUsersSafeToSRA(GV);
> +}
> +
>   /// Copy over the debug info for a variable to its SRA replacements.
>   static void transferSRADebugInfo(GlobalVariable *GV, GlobalVariable *NGV,
>                                    uint64_t FragmentOffsetInBits,
> @@ -462,88 +476,94 @@ static void transferSRADebugInfo(GlobalVariable *GV, GlobalVariable *NGV,
>   /// insert so that the caller can reprocess it.
>   static GlobalVariable *SRAGlobal(GlobalVariable *GV, const DataLayout &DL) {
>     // Make sure this global only has simple uses that we can SRA.
> -  if (!GlobalUsersSafeToSRA(GV))
> +  if (!CanDoGlobalSRA(GV))
>       return nullptr;
>   
>     assert(GV->hasLocalLinkage());
>     Constant *Init = GV->getInitializer();
>     Type *Ty = Init->getType();
>   
> -  std::vector<GlobalVariable *> NewGlobals;
> -  Module::GlobalListType &Globals = GV->getParent()->getGlobalList();
> +  std::map<unsigned, GlobalVariable *> NewGlobals;
>   
>     // Get the alignment of the global, either explicit or target-specific.
>     unsigned StartAlignment = GV->getAlignment();
>     if (StartAlignment == 0)
>       StartAlignment = DL.getABITypeAlignment(GV->getType());
>   
> -  if (StructType *STy = dyn_cast<StructType>(Ty)) {
> -    unsigned NumElements = STy->getNumElements();
> -    NewGlobals.reserve(NumElements);
> -    const StructLayout &Layout = *DL.getStructLayout(STy);
> -    for (unsigned i = 0, e = NumElements; i != e; ++i) {
> -      Constant *In = Init->getAggregateElement(i);
> -      assert(In && "Couldn't get element of initializer?");
> -      GlobalVariable *NGV = new GlobalVariable(STy->getElementType(i), false,
> -                                               GlobalVariable::InternalLinkage,
> -                                               In, GV->getName()+"."+Twine(i),
> -                                               GV->getThreadLocalMode(),
> -                                              GV->getType()->getAddressSpace());
> -      NGV->setExternallyInitialized(GV->isExternallyInitialized());
> -      NGV->copyAttributesFrom(GV);
> -      Globals.push_back(NGV);
> -      NewGlobals.push_back(NGV);
> +  // Loop over all users and create replacement variables for used aggregate
> +  // elements.
> +  for (User *GEP : GV->users()) {
> +    assert(((isa<ConstantExpr>(GEP) && cast<ConstantExpr>(GEP)->getOpcode() ==
> +                                           Instruction::GetElementPtr) ||
> +            isa<GetElementPtrInst>(GEP)) &&
> +           "NonGEP CE's are not SRAable!");
> +
> +    // Ignore the 1th operand, which has to be zero or else the program is quite
> +    // broken (undefined).  Get the 2nd operand, which is the structure or array
> +    // index.
> +    unsigned ElementIdx = cast<ConstantInt>(GEP->getOperand(2))->getZExtValue();
> +    if (NewGlobals.count(ElementIdx) == 1)
> +      continue; // we`ve already created replacement variable
> +    assert(NewGlobals.count(ElementIdx) == 0);
> +
> +    Type *ElTy = nullptr;
> +    if (StructType *STy = dyn_cast<StructType>(Ty))
> +      ElTy = STy->getElementType(ElementIdx);
> +    else if (SequentialType *STy = dyn_cast<SequentialType>(Ty))
> +      ElTy = STy->getElementType();
> +    assert(ElTy);
> +
> +    Constant *In = Init->getAggregateElement(ElementIdx);
> +    assert(In && "Couldn't get element of initializer?");
> +
> +    GlobalVariable *NGV = new GlobalVariable(
> +        ElTy, false, GlobalVariable::InternalLinkage, In,
> +        GV->getName() + "." + Twine(ElementIdx), GV->getThreadLocalMode(),
> +        GV->getType()->getAddressSpace());
> +    NGV->setExternallyInitialized(GV->isExternallyInitialized());
> +    NGV->copyAttributesFrom(GV);
> +    NewGlobals.insert(std::make_pair(ElementIdx, NGV));
> +
> +    if (StructType *STy = dyn_cast<StructType>(Ty)) {
> +      const StructLayout &Layout = *DL.getStructLayout(STy);
>   
>         // Calculate the known alignment of the field.  If the original aggregate
>         // had 256 byte alignment for example, something might depend on that:
>         // propagate info to each field.
> -      uint64_t FieldOffset = Layout.getElementOffset(i);
> +      uint64_t FieldOffset = Layout.getElementOffset(ElementIdx);
>         Align NewAlign(MinAlign(StartAlignment, FieldOffset));
> -      if (NewAlign > Align(DL.getABITypeAlignment(STy->getElementType(i))))
> +      if (NewAlign >
> +          Align(DL.getABITypeAlignment(STy->getElementType(ElementIdx))))
>           NGV->setAlignment(NewAlign);
>   
>         // Copy over the debug info for the variable.
>         uint64_t Size = DL.getTypeAllocSizeInBits(NGV->getValueType());
> -      uint64_t FragmentOffsetInBits = Layout.getElementOffsetInBits(i);
> -      transferSRADebugInfo(GV, NGV, FragmentOffsetInBits, Size, NumElements);
> -    }
> -  } else if (SequentialType *STy = dyn_cast<SequentialType>(Ty)) {
> -    unsigned NumElements = STy->getNumElements();
> -    if (NumElements > 16 && GV->hasNUsesOrMore(16))
> -      return nullptr; // It's not worth it.
> -    NewGlobals.reserve(NumElements);
> -    auto ElTy = STy->getElementType();
> -    uint64_t EltSize = DL.getTypeAllocSize(ElTy);
> -    Align EltAlign(DL.getABITypeAlignment(ElTy));
> -    uint64_t FragmentSizeInBits = DL.getTypeAllocSizeInBits(ElTy);
> -    for (unsigned i = 0, e = NumElements; i != e; ++i) {
> -      Constant *In = Init->getAggregateElement(i);
> -      assert(In && "Couldn't get element of initializer?");
> -
> -      GlobalVariable *NGV = new GlobalVariable(STy->getElementType(), false,
> -                                               GlobalVariable::InternalLinkage,
> -                                               In, GV->getName()+"."+Twine(i),
> -                                               GV->getThreadLocalMode(),
> -                                              GV->getType()->getAddressSpace());
> -      NGV->setExternallyInitialized(GV->isExternallyInitialized());
> -      NGV->copyAttributesFrom(GV);
> -      Globals.push_back(NGV);
> -      NewGlobals.push_back(NGV);
> +      uint64_t FragmentOffsetInBits = Layout.getElementOffsetInBits(ElementIdx);
> +      transferSRADebugInfo(GV, NGV, FragmentOffsetInBits, Size,
> +                           STy->getNumElements());
> +    } else if (SequentialType *STy = dyn_cast<SequentialType>(Ty)) {
> +      uint64_t EltSize = DL.getTypeAllocSize(ElTy);
> +      Align EltAlign(DL.getABITypeAlignment(ElTy));
> +      uint64_t FragmentSizeInBits = DL.getTypeAllocSizeInBits(ElTy);
>   
>         // Calculate the known alignment of the field.  If the original aggregate
>         // had 256 byte alignment for example, something might depend on that:
>         // propagate info to each field.
> -      Align NewAlign(MinAlign(StartAlignment, EltSize * i));
> +      Align NewAlign(MinAlign(StartAlignment, EltSize * ElementIdx));
>         if (NewAlign > EltAlign)
>           NGV->setAlignment(NewAlign);
> -      transferSRADebugInfo(GV, NGV, FragmentSizeInBits * i, FragmentSizeInBits,
> -                           NumElements);
> +      transferSRADebugInfo(GV, NGV, FragmentSizeInBits * ElementIdx,
> +                           FragmentSizeInBits, STy->getNumElements());
>       }
>     }
>   
>     if (NewGlobals.empty())
>       return nullptr;
>   
> +  Module::GlobalListType &Globals = GV->getParent()->getGlobalList();
> +  for (auto NewGlobalVar : NewGlobals)
> +    Globals.push_back(NewGlobalVar.second);
> +
>     LLVM_DEBUG(dbgs() << "PERFORMING GLOBAL SRA ON: " << *GV << "\n");
>   
>     Constant *NullInt =Constant::getNullValue(Type::getInt32Ty(GV->getContext()));
> @@ -559,11 +579,11 @@ static GlobalVariable *SRAGlobal(GlobalVariable *GV, const DataLayout &DL) {
>       // Ignore the 1th operand, which has to be zero or else the program is quite
>       // broken (undefined).  Get the 2nd operand, which is the structure or array
>       // index.
> -    unsigned Val = cast<ConstantInt>(GEP->getOperand(2))->getZExtValue();
> -    if (Val >= NewGlobals.size()) Val = 0; // Out of bound array access.
> +    unsigned ElementIdx = cast<ConstantInt>(GEP->getOperand(2))->getZExtValue();
> +    assert(NewGlobals.count(ElementIdx) == 1);
>   
> -    Value *NewPtr = NewGlobals[Val];
> -    Type *NewTy = NewGlobals[Val]->getValueType();
> +    Value *NewPtr = NewGlobals[ElementIdx];
> +    Type *NewTy = NewGlobals[ElementIdx]->getValueType();
>   
>       // Form a shorter GEP if needed.
>       if (GEP->getNumOperands() > 3) {
> @@ -581,7 +601,8 @@ static GlobalVariable *SRAGlobal(GlobalVariable *GV, const DataLayout &DL) {
>           for (unsigned i = 3, e = GEPI->getNumOperands(); i != e; ++i)
>             Idxs.push_back(GEPI->getOperand(i));
>           NewPtr = GetElementPtrInst::Create(
> -            NewTy, NewPtr, Idxs, GEPI->getName() + "." + Twine(Val), GEPI);
> +            NewTy, NewPtr, Idxs, GEPI->getName() + "." + Twine(ElementIdx),
> +            GEPI);
>         }
>       }
>       GEP->replaceAllUsesWith(NewPtr);
> @@ -596,17 +617,8 @@ static GlobalVariable *SRAGlobal(GlobalVariable *GV, const DataLayout &DL) {
>     Globals.erase(GV);
>     ++NumSRA;
>   
> -  // Loop over the new globals array deleting any globals that are obviously
> -  // dead.  This can arise due to scalarization of a structure or an array that
> -  // has elements that are dead.
> -  unsigned FirstGlobal = 0;
> -  for (unsigned i = 0, e = NewGlobals.size(); i != e; ++i)
> -    if (NewGlobals[i]->use_empty()) {
> -      Globals.erase(NewGlobals[i]);
> -      if (FirstGlobal == i) ++FirstGlobal;
> -    }
> -
> -  return FirstGlobal != NewGlobals.size() ? NewGlobals[FirstGlobal] : nullptr;
> +  assert(NewGlobals.size() > 0);
> +  return NewGlobals.begin()->second;
>   }
>   
>   /// Return true if all users of the specified value will trap if the value is
>
> diff  --git a/llvm/test/Transforms/GlobalOpt/long-compilation-global-sra.ll b/llvm/test/Transforms/GlobalOpt/long-compilation-global-sra.ll
> new file mode 100644
> index 000000000000..69a358edcadb
> --- /dev/null
> +++ b/llvm/test/Transforms/GlobalOpt/long-compilation-global-sra.ll
> @@ -0,0 +1,61 @@
> +; RUN: opt %s --O0 -globalopt -S -o -
> +
> +; This is a regression test against very slow execution...
> +; In bad case it should fail by timeout.
> +
> +; Hand-reduced from this example.
> +; clang++ -mllvm -disable-llvm-optzns
> +
> +;#include <stdio.h>
> +;
> +;namespace {
> +;  char LargeBuffer[64 * 1024 * 1024];
> +;}
> +;
> +;int main ( void ) {
> +;
> +;    LargeBuffer[0] = 0;
> +;
> +;    printf("");
> +;
> +;    return LargeBuffer[0] == 0;
> +;}
> +
> +; check that global array LargeBufferE was optimized out
> +; and local variable LargeBufferE.0 was used instead.
> +
> +; CHECK-NOT: global
> +; CHECK: main()
> +; CHECK-NEXT: LargeBufferE.0
> +; CHECK-NOT: global
> +
> +; ModuleID = 'test.cpp'
> +source_filename = "test.cpp"
> +target datalayout = "e-m:e-p270:32:32-p271:32:32-p272:64:64-i64:64-f80:128-n8:16:32:64-S128"
> +target triple = "x86_64-unknown-linux-gnu"
> +
> + at LargeBufferE = internal global [67108864 x i8] zeroinitializer, align 16
> + at .str = private unnamed_addr constant [1 x i8] c"\00", align 1
> +
> +; Function Attrs: norecurse uwtable
> +define dso_local i32 @main() #0 {
> +  %1 = alloca i32, align 4
> +  store i32 0, i32* %1, align 4
> +  store i8 0, i8* getelementptr inbounds ([67108864 x i8], [67108864 x i8]* @LargeBufferE, i64 0, i64 0), align 16
> +  %2 = call i32 (i8*, ...) @printf(i8* getelementptr inbounds ([1 x i8], [1 x i8]* @.str, i64 0, i64 0))
> +  %3 = load i8, i8* getelementptr inbounds ([67108864 x i8], [67108864 x i8]* @LargeBufferE, i64 0, i64 0), align 16
> +  %4 = sext i8 %3 to i32
> +  %5 = icmp eq i32 %4, 0
> +  %6 = zext i1 %5 to i32
> +  ret i32 %6
> +}
> +
> +declare dso_local i32 @printf(i8*, ...) #0
> +
> +attributes #0 = { norecurse uwtable }
> +
> +!llvm.module.flags = !{!0}
> +!llvm.ident = !{!1}
> +
> +!0 = !{i32 1, !"wchar_size", i32 4}
> +!1 = !{!"clang version 10.0.0 "}
>
>
>          
> _______________________________________________
> llvm-commits mailing list
> llvm-commits at lists.llvm.org
> https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-commits


More information about the llvm-commits mailing list