[PATCH] D61610: [PPC64] implement Thunk Section Spacing

Peter Smith via Phabricator via llvm-commits llvm-commits at lists.llvm.org
Wed May 8 01:59:36 PDT 2019


peter.smith added a comment.

In D61610#1494525 <https://reviews.llvm.org/D61610#1494525>, @MaskRay wrote:

> I was trying to find a good value to use in
>  `uint32_t PPC64::getThunkSectionSpacing() const { return 0x8000 - 0x1000; }` before I noticed that this caused an extreme slowdown in thunk creation.
>
> We have a large program whose text segment is 1.3GiB. Its OutputSection `.text` consists of 1948775 InputSections. It takes 20 seconds to link, but with this patch the link does not finish within 10 minutes.
>
> The reason is that:
>
>   // O(|ThunkVec|). Not the critical path, but a bottleneck too:
>   // many ThunkVecs have hundreds or thousands of elements.
>   std::tie(T, IsNew) = getThunk(*Rel.Sym, Rel.Type, Src);
>
>   if (IsNew) { // called 112353 times
>     // Find or create a ThunkSection for the new Thunk
>     ThunkSection *TS;
>     if (auto *TIS = T->getTargetInputSection())
>       TS = getISThunkSec(TIS);
>     else
>       // Takes O(|ISD->ThunkSections|) time; `ISD->ThunkSections` is large
>       // (25980) when getThunkSectionSpacing is enabled.
>       TS = getISDThunkSec(OS, IS, ISD, Rel.Type, Src);
>     TS->addThunk(T);
>     Thunks[T->getThunkTargetSym()] = T;
>   }
>
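
To put the two counts in the quoted snippet together, here is a rough sketch of the worst-case work in that loop. The helper name is hypothetical and not part of LLD, and it assumes every new Thunk takes the `getISDThunkSec` path and scans every ThunkSection, so it is an upper bound rather than a measurement.

```cpp
#include <cstdint>

// Hypothetical helper (not LLD code): worst-case number of ThunkSection
// visits if each new Thunk linearly scans all ThunkSections of the ISD.
constexpr uint64_t worstCaseScanSteps(uint64_t newThunks,
                                      uint64_t thunkSections) {
  return newThunks * thunkSections;
}

// Counts taken from the quoted comments: 112353 new Thunks and
// 25980 entries in ISD->ThunkSections.
constexpr uint64_t Steps = worstCaseScanSteps(112353, 25980);
// Steps == 2918930940, i.e. close to 3 billion visits in the worst case.
```

The same 112353 Thunks against a few dozen ThunkSections would be a few million visits, which is why the number of ThunkSections matters more here than the cost of a single lookup.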


Looking at the comment:

  //      - b[l,a]  PPC64_REL24 range [-33,554,432...33,554,428]
  //      - bc[l,a] PPC64_REL14 range [-32,768...32,764]
  // We take the most strict range and intentionally use a lower
  // size than the maximum branch range so the end of the ThunkSection
  // is more likely to be within range of the branch instruction that 
  // is furthest away.
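
For reference, both ranges follow from the width of the displacement field in the instruction. The sketch below is my own illustration, not code from the patch; `BranchRange` and `branchRange` are hypothetical names. It assumes a signed field whose value is shifted left by two bits, since instructions are 4-byte aligned.

```cpp
#include <cstdint>

// Hypothetical type for illustration: inclusive byte range of a branch.
struct BranchRange {
  int64_t Min;
  int64_t Max;
};

// A signed FieldBits-wide displacement with two implicit low zero bits
// covers [-2^(FieldBits+1), 2^(FieldBits+1) - 4] bytes.
constexpr BranchRange branchRange(unsigned FieldBits) {
  return {-(int64_t(1) << (FieldBits + 1)),
          (int64_t(1) << (FieldBits + 1)) - 4};
}

// b[l,a]  uses a 24-bit field: [-33554432, 33554428], about +/-32 MiB.
// bc[l,a] uses a 14-bit field: [-32768, 32764], about +/-32 KiB.
```

The two ranges differ by a factor of 1024, which is what makes the choice of spacing so significant for the number of ThunkSections.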

I strongly recommend using the longer range (32 Mb) for the ThunkSection spacing. If PPC is anything like Arm, I would be very surprised to see many inter-section conditional branches, and with a range that short the chance of finding a reusable ThunkSection within 32k seems extremely low. In Arm we place the ThunkSections using the longest and most common Thumb2 branch range, 16Mb, and rely on the tidy-up part in `// No suitable ThunkSection exists.` for the shorter-range 1Mb branches. I think that strategy could be used for PPC, and it would give roughly 40 ThunkSections in a 1.3Gb text section.
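
The "roughly 40" figure is simple division. As a sketch, with a hypothetical helper that is not part of LLD, and assuming one ThunkSection per spacing interval and a text size of about 1331 MiB (roughly 1.3 GiB):

```cpp
#include <cstdint>

// Hypothetical helper (not LLD code): number of spacing-sized intervals
// needed to cover a text section, rounded up.
constexpr uint64_t estimateThunkSections(uint64_t TextSize, uint64_t Spacing) {
  return (TextSize + Spacing - 1) / Spacing;
}

constexpr uint64_t MiB = 1024 * 1024;
constexpr uint64_t TextSize = 1331 * MiB; // assumed: roughly 1.3 GiB

// Spacing from the patch, derived from the conditional branch range.
constexpr uint64_t WithRel14Spacing =
    estimateThunkSections(TextSize, 0x8000 - 0x1000); // 48677
// Spacing derived from the 32 MiB unconditional branch range.
constexpr uint64_t WithRel24Spacing =
    estimateThunkSections(TextSize, 32 * MiB); // 42
```

This is only an upper-bound estimate of how many intervals exist. The quoted report observed 25980 ThunkSections with the shorter spacing, which is the same order of magnitude.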

If there are many 32k conditional branches in the program, then I think the original strategy of not using the ThunkSectionSpacing is the right way to go; there just won't be enough reuse opportunity to make it worthwhile.


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D61610/new/

https://reviews.llvm.org/D61610




