[compiler-rt] r348335 - [XRay] Move-only Allocator, FunctionCallTrie, and Array

Dean Michael Berris via llvm-commits llvm-commits at lists.llvm.org
Wed Dec 5 04:50:01 PST 2018


Hi Hans,

This looks like a compiler deficiency/bug, and I'm not sure how to
work around it. The implementation here actually relies on the
semantics guaranteed by placement-new with brace initialisation (for
aggregate init).
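
For reference, here's a minimal, standalone sketch of the pattern in
question (hypothetical names, not the actual XRay sources):
perfect-forwarding into placement-new with brace initialisation, so
that plain aggregates can be emplaced without user-declared
constructors. GCC 4.8 rejects the braced list in exactly this
situation.

  #include <new>
  #include <utility>

  struct Entry {  // an aggregate: no user-declared constructors
    unsigned long TSC;
    int FId;
  };

  template <class T, class... Args>
  T *emplaceAt(void *Storage, Args &&... A) {
    // Brace initialisation covers both aggregates and types with
    // constructors; GCC 4.8 fails to convert the braced list when T is an
    // aggregate and the argument is a forwarded reference to another T.
    return new (Storage) T{std::forward<Args>(A)...};
  }

  int main() {
    alignas(Entry) unsigned char Buf[sizeof(Entry)];
    Entry *E = emplaceAt<Entry>(Buf, 42ul, 1);  // member-wise aggregate init
    Entry Copy{7ul, 2};
    alignas(Entry) unsigned char Buf2[sizeof(Entry)];
    emplaceAt<Entry>(Buf2, Copy);  // copy through a braced list
    return E->FId;
  }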

Is GCC 4.8 still actually supported by the LLVM project? If so, do we
know when we're going to drop support for it? This patch and the ones
that depend on it landing require a compiler that implements
aggregate init via placement-new with brace initialisation as
specified.

I can probably reinstate the constructors and use non-braced
(parenthesised) construction in the placement-new calls, but that
would needlessly complicate the implementation here.
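
Roughly, that workaround would look like this (a sketch only: it
reinstates the NodeIdPair constructor this patch removed and switches
the emplacement call back to parentheses):

  #include <cstdint>

  struct Node;  // forward declaration, for illustration only

  struct NodeIdPair {
    Node *NodePtr;
    int32_t FId;

    // Reinstated constructor for in-place construction.
    NodeIdPair(Node *N, int32_t F) : NodePtr(N), FId(F) {}
  };

  // ... and in Array<T>::AppendEmplace, use parentheses instead of braces:
  //   new (AlignedOffset) T(std::forward<Args>(args)...);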

Does Chromium use XRay yet? If not, can we instead disable XRay in
the standalone compiler-rt builds that Chromium does with older
compilers?
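
If it comes to that, something along these lines should work for the
stand-alone build (untested; it assumes the COMPILER_RT_BUILD_XRAY
CMake option is honoured in that configuration):

  cmake -GNinja -DCMAKE_BUILD_TYPE=Release \
        -DLLVM_CONFIG_PATH=/work/llvm/build.release/bin/llvm-config \
        -DCOMPILER_RT_BUILD_XRAY=OFF \
        ../projects/compiler-rt/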

Thanks in advance.

Cheers
On Wed, Dec 5, 2018 at 9:23 PM Hans Wennborg <hwennborg at google.com> wrote:
>
> I see you landed some follow-ups for the build breakage, but this is
> still breaking Chromium's toolchain build (e.g.
> https://logs.chromium.org/v/?s=chromium%2Fbb%2Fchromium.clang%2FToTLinux%2F4554%2F%2B%2Frecipes%2Fsteps%2Fgclient_runhooks%2F0%2Fstdout)
>
> It seems the new code doesn't compile with GCC 4.8 (I used
> https://commondatastorage.googleapis.com/chromium-browser-clang/tools/gcc485precise.tgz
> but our builders use stock 4.8.4) when building compiler-rt
> stand-alone.
>
> I've reverted in r348346 in the meantime.
>
> To reproduce:
>
> $ CC=/work/chromium/src/third_party/llvm-build-tools/gcc485precise/bin/gcc
> CXX=/work/chromium/src/third_party/llvm-build-tools/gcc485precise/bin/g++
> cmake -GNinja -DCMAKE_BUILD_TYPE=Release -DLLVM_ENABLE_ASSERTIONS=ON
> -DLLVM_CONFIG_PATH=/work/llvm/build.release/bin/llvm-config
> ../projects/compiler-rt/
> $ ninja lib/xray/CMakeFiles/RTXrayPROFILING.x86_64.dir/xray_profile_collector.cc.o
> [1/1] Building CXX object
> lib/xray/CMakeFiles/RTXrayPROFILING.x86_64.dir/xray_profile_collector.cc.o
> FAILED: lib/xray/CMakeFiles/RTXrayPROFILING.x86_64.dir/xray_profile_collector.cc.o
> /work/chromium/src/third_party/llvm-build-tools/gcc485precise/bin/g++
> -DXRAY_HAS_EXCEPTIONS=1 -I/work/llvm/projects/compiler-rt/lib/xray/..
> -I/work/llvm/projects/compiler-rt/lib/xray/../../include -Wall
> -std=c++11 -Wno-unused-parameter -O3 -DNDEBUG    -m64 -fPIC
> -fno-builtin -fno-exceptions -fomit-frame-pointer -funwind-tables
> -fno-stack-protector -fvisibility=hidden -fno-lto -O3 -g
> -Wno-variadic-macros -Wno-non-virtual-dtor -fno-rtti -MD -MT
> lib/xray/CMakeFiles/RTXrayPROFILING.x86_64.dir/xray_profile_collector.cc.o
> -MF lib/xray/CMakeFiles/RTXrayPROFILING.x86_64.dir/xray_profile_collector.cc.o.d
> -o lib/xray/CMakeFiles/RTXrayPROFILING.x86_64.dir/xray_profile_collector.cc.o
> -c /work/llvm/projects/compiler-rt/lib/xray/xray_profile_collector.cc
> In file included from
> /work/llvm/projects/compiler-rt/lib/xray/xray_function_call_trie.h:20:0,
>                  from
> /work/llvm/projects/compiler-rt/lib/xray/xray_profile_collector.h:21,
>                  from
> /work/llvm/projects/compiler-rt/lib/xray/xray_profile_collector.cc:15:
> /work/llvm/projects/compiler-rt/lib/xray/xray_segmented_array.h: In
> instantiation of ‘T* __xray::Array<T>::AppendEmplace(Args&& ...) [with
> Args = {const __xray::FunctionCallTrie::mergeInto(__xray::FunctionCallTrie&)
> const::NodeAndTarget&}; T =
> __xray::FunctionCallTrie::mergeInto(__xray::FunctionCallTrie&)
> const::NodeAndTarget]’:
> /work/llvm/projects/compiler-rt/lib/xray/xray_segmented_array.h:383:71:
>   required from ‘T* __xray::Array<T>::Append(const T&) [with T =
> __xray::FunctionCallTrie::mergeInto(__xray::FunctionCallTrie&)
> const::NodeAndTarget]’
> /work/llvm/projects/compiler-rt/lib/xray/xray_function_call_trie.h:517:54:
>   required from here
> /work/llvm/projects/compiler-rt/lib/xray/xray_segmented_array.h:378:5:
> error: could not convert ‘{std::forward<const
> __xray::FunctionCallTrie::mergeInto(__xray::FunctionCallTrie&)
> const::NodeAndTarget&>((* & args#0))}’ from ‘<brace-enclosed
> initializer list>’ to
> ‘__xray::FunctionCallTrie::mergeInto(__xray::FunctionCallTrie&)
> const::NodeAndTarget’
>      new (AlignedOffset) T{std::forward<Args>(args)...};
>      ^
> /work/llvm/projects/compiler-rt/lib/xray/xray_segmented_array.h: In
> instantiation of ‘T* __xray::Array<T>::AppendEmplace(Args&& ...) [with
> Args = {const __xray::profileCollectorService::{anonymous}::ThreadTrie&};
> T = __xray::profileCollectorService::{anonymous}::ThreadTrie]’:
> /work/llvm/projects/compiler-rt/lib/xray/xray_segmented_array.h:383:71:
>   required from ‘T* __xray::Array<T>::Append(const T&) [with T =
> __xray::profileCollectorService::{anonymous}::ThreadTrie]’
> /work/llvm/projects/compiler-rt/lib/xray/xray_profile_collector.cc:98:34:
>   required from here
> /work/llvm/projects/compiler-rt/lib/xray/xray_segmented_array.h:378:5:
> error: could not convert ‘{std::forward<const
> __xray::profileCollectorService::{anonymous}::ThreadTrie&>((* &
> args#0))}’ from
> ‘<brace-enclosed initializer list>’ to
> ‘__xray::profileCollectorService::{anonymous}::ThreadTrie’
> /work/llvm/projects/compiler-rt/lib/xray/xray_segmented_array.h: In
> instantiation of ‘T* __xray::Array<T>::AppendEmplace(Args&& ...) [with
> Args = {const __xray::profileCollectorService::{anonymous}::ProfileBuffer&};
> T = __xray::profileCollectorService::{anonymous}::ProfileBuffer]’:
> /work/llvm/projects/compiler-rt/lib/xray/xray_segmented_array.h:383:71:
>   required from ‘T* __xray::Array<T>::Append(const T&) [with T =
> __xray::profileCollectorService::{anonymous}::ProfileBuffer]’
> /work/llvm/projects/compiler-rt/lib/xray/xray_profile_collector.cc:244:44:
>   required from here
> /work/llvm/projects/compiler-rt/lib/xray/xray_segmented_array.h:378:5:
> error: could not convert ‘{std::forward<const
> __xray::profileCollectorService::{anonymous}::ProfileBuffer&>((* &
> args#0))}’ from ‘<brace-enclosed initializer list>’ to
> ‘__xray::profileCollectorService::{anonymous}::ProfileBuffer’
>
>
>
> On Wed, Dec 5, 2018 at 7:47 AM Dean Michael Berris via llvm-commits
> <llvm-commits at lists.llvm.org> wrote:
> >
> > Author: dberris
> > Date: Tue Dec  4 22:44:34 2018
> > New Revision: 348335
> >
> > URL: http://llvm.org/viewvc/llvm-project?rev=348335&view=rev
> > Log:
> > [XRay] Move-only Allocator, FunctionCallTrie, and Array
> >
> > Summary:
> > This change makes the allocator and function call trie implementations
> > move-aware and removes the FunctionCallTrie's reliance on a
> > heap-allocated set of allocators.
> >
> > The change makes it possible to always have storage associated with
> > Allocator instances, without necessarily having heap-allocated memory
> > obtainable from these allocator instances. We also use thread-local
> > uninitialised storage.
> >
> > We've also re-worked the segmented array implementation to have more
> > precondition and post-condition checks when built in debug mode. This
> > also lets us better document some of the operations. The `trim`
> > algorithm now has more documentation of the implementation, needs to
> > handle fewer special conditions, and is more rigorous about the
> > computations involved.
> >
> > In this change we also introduce an initialisation guard, through which
> > we prevent an initialisation operation from racing with a cleanup
> > operation.
> >
> > We also ensure that the ThreadTries array is not destroyed while copies
> > into the elements are still being performed by other threads submitting
> > profiles.
> >
> > Note that this change still has an issue with accessing thread-local
> > storage from signal handlers that are instrumented with XRay. While
> > testing this patch we also learned that there are cases where
> > mmap(...) (through internal_mmap(...)) may be called from signal
> > handlers even though it is not async-signal-safe. Subsequent patches
> > will address this by re-using the `BufferQueue` type from the FDR mode
> > implementation to provide pre-allocated memory segments per active,
> > tracing thread.
> >
> > We still want to land this change despite the known issues, with fixes
> > forthcoming.
> >
> > Reviewers: mboerger, jfb
> >
> > Subscribers: jfb, llvm-commits
> >
> > Differential Revision: https://reviews.llvm.org/D54989
> >
> > Modified:
> >     compiler-rt/trunk/lib/xray/tests/unit/function_call_trie_test.cc
> >     compiler-rt/trunk/lib/xray/tests/unit/segmented_array_test.cc
> >     compiler-rt/trunk/lib/xray/xray_allocator.h
> >     compiler-rt/trunk/lib/xray/xray_function_call_trie.h
> >     compiler-rt/trunk/lib/xray/xray_profile_collector.cc
> >     compiler-rt/trunk/lib/xray/xray_profiling.cc
> >     compiler-rt/trunk/lib/xray/xray_segmented_array.h
> >
> > Modified: compiler-rt/trunk/lib/xray/tests/unit/function_call_trie_test.cc
> > URL: http://llvm.org/viewvc/llvm-project/compiler-rt/trunk/lib/xray/tests/unit/function_call_trie_test.cc?rev=348335&r1=348334&r2=348335&view=diff
> > ==============================================================================
> > --- compiler-rt/trunk/lib/xray/tests/unit/function_call_trie_test.cc (original)
> > +++ compiler-rt/trunk/lib/xray/tests/unit/function_call_trie_test.cc Tue Dec  4 22:44:34 2018
> > @@ -309,6 +309,36 @@ TEST(FunctionCallTrieTest, MergeInto) {
> >    EXPECT_EQ(F2.Callees.size(), 0u);
> >  }
> >
> > +TEST(FunctionCallTrieTest, PlacementNewOnAlignedStorage) {
> > +  profilingFlags()->setDefaults();
> > +  typename std::aligned_storage<sizeof(FunctionCallTrie::Allocators),
> > +                                alignof(FunctionCallTrie::Allocators)>::type
> > +      AllocatorsStorage;
> > +  new (&AllocatorsStorage)
> > +      FunctionCallTrie::Allocators(FunctionCallTrie::InitAllocators());
> > +  auto *A =
> > +      reinterpret_cast<FunctionCallTrie::Allocators *>(&AllocatorsStorage);
> > +
> > +  typename std::aligned_storage<sizeof(FunctionCallTrie),
> > +                                alignof(FunctionCallTrie)>::type FCTStorage;
> > +  new (&FCTStorage) FunctionCallTrie(*A);
> > +  auto *T = reinterpret_cast<FunctionCallTrie *>(&FCTStorage);
> > +
> > +  // Put some data into it.
> > +  T->enterFunction(1, 0, 0);
> > +  T->exitFunction(1, 1, 0);
> > +
> > +  // Re-initialize the objects in storage.
> > +  T->~FunctionCallTrie();
> > +  A->~Allocators();
> > +  new (A) FunctionCallTrie::Allocators(FunctionCallTrie::InitAllocators());
> > +  new (T) FunctionCallTrie(*A);
> > +
> > +  // Then put some data into it again.
> > +  T->enterFunction(1, 0, 0);
> > +  T->exitFunction(1, 1, 0);
> > +}
> > +
> >  } // namespace
> >
> >  } // namespace __xray
> >
> > Modified: compiler-rt/trunk/lib/xray/tests/unit/segmented_array_test.cc
> > URL: http://llvm.org/viewvc/llvm-project/compiler-rt/trunk/lib/xray/tests/unit/segmented_array_test.cc?rev=348335&r1=348334&r2=348335&view=diff
> > ==============================================================================
> > --- compiler-rt/trunk/lib/xray/tests/unit/segmented_array_test.cc (original)
> > +++ compiler-rt/trunk/lib/xray/tests/unit/segmented_array_test.cc Tue Dec  4 22:44:34 2018
> > @@ -221,5 +221,91 @@ TEST(SegmentedArrayTest, SimulateStackBe
> >    }
> >  }
> >
> > +TEST(SegmentedArrayTest, PlacementNewOnAlignedStorage) {
> > +  using AllocatorType = typename Array<ShadowStackEntry>::AllocatorType;
> > +  typename std::aligned_storage<sizeof(AllocatorType),
> > +                                alignof(AllocatorType)>::type AllocatorStorage;
> > +  new (&AllocatorStorage) AllocatorType(1 << 10);
> > +  auto *A = reinterpret_cast<AllocatorType *>(&AllocatorStorage);
> > +  typename std::aligned_storage<sizeof(Array<ShadowStackEntry>),
> > +                                alignof(Array<ShadowStackEntry>)>::type
> > +      ArrayStorage;
> > +  new (&ArrayStorage) Array<ShadowStackEntry>(*A);
> > +  auto *Data = reinterpret_cast<Array<ShadowStackEntry> *>(&ArrayStorage);
> > +
> > +  static uint64_t Dummy = 0;
> > +  constexpr uint64_t Max = 9;
> > +
> > +  for (uint64_t i = 0; i < Max; ++i) {
> > +    auto P = Data->Append({i, &Dummy});
> > +    ASSERT_NE(P, nullptr);
> > +    ASSERT_EQ(P->NodePtr, &Dummy);
> > +    auto &Back = Data->back();
> > +    ASSERT_EQ(Back.NodePtr, &Dummy);
> > +    ASSERT_EQ(Back.EntryTSC, i);
> > +  }
> > +
> > +  // Simulate a stack by checking the data from the end as we're trimming.
> > +  auto Counter = Max;
> > +  ASSERT_EQ(Data->size(), size_t(Max));
> > +  while (!Data->empty()) {
> > +    const auto &Top = Data->back();
> > +    uint64_t *TopNode = Top.NodePtr;
> > +    EXPECT_EQ(TopNode, &Dummy) << "Counter = " << Counter;
> > +    Data->trim(1);
> > +    --Counter;
> > +    ASSERT_EQ(Data->size(), size_t(Counter));
> > +  }
> > +
> > +  // Once the stack is exhausted, we re-use the storage.
> > +  for (uint64_t i = 0; i < Max; ++i) {
> > +    auto P = Data->Append({i, &Dummy});
> > +    ASSERT_NE(P, nullptr);
> > +    ASSERT_EQ(P->NodePtr, &Dummy);
> > +    auto &Back = Data->back();
> > +    ASSERT_EQ(Back.NodePtr, &Dummy);
> > +    ASSERT_EQ(Back.EntryTSC, i);
> > +  }
> > +
> > +  // We re-initialize the storage, by calling the destructor and
> > +  // placement-new'ing again.
> > +  Data->~Array();
> > +  A->~AllocatorType();
> > +  new (A) AllocatorType(1 << 10);
> > +  new (Data) Array<ShadowStackEntry>(*A);
> > +
> > +  // Then re-do the test.
> > +  for (uint64_t i = 0; i < Max; ++i) {
> > +    auto P = Data->Append({i, &Dummy});
> > +    ASSERT_NE(P, nullptr);
> > +    ASSERT_EQ(P->NodePtr, &Dummy);
> > +    auto &Back = Data->back();
> > +    ASSERT_EQ(Back.NodePtr, &Dummy);
> > +    ASSERT_EQ(Back.EntryTSC, i);
> > +  }
> > +
> > +  // Simulate a stack by checking the data from the end as we're trimming.
> > +  Counter = Max;
> > +  ASSERT_EQ(Data->size(), size_t(Max));
> > +  while (!Data->empty()) {
> > +    const auto &Top = Data->back();
> > +    uint64_t *TopNode = Top.NodePtr;
> > +    EXPECT_EQ(TopNode, &Dummy) << "Counter = " << Counter;
> > +    Data->trim(1);
> > +    --Counter;
> > +    ASSERT_EQ(Data->size(), size_t(Counter));
> > +  }
> > +
> > +  // Once the stack is exhausted, we re-use the storage.
> > +  for (uint64_t i = 0; i < Max; ++i) {
> > +    auto P = Data->Append({i, &Dummy});
> > +    ASSERT_NE(P, nullptr);
> > +    ASSERT_EQ(P->NodePtr, &Dummy);
> > +    auto &Back = Data->back();
> > +    ASSERT_EQ(Back.NodePtr, &Dummy);
> > +    ASSERT_EQ(Back.EntryTSC, i);
> > +  }
> > +}
> > +
> >  } // namespace
> >  } // namespace __xray
> >
> > Modified: compiler-rt/trunk/lib/xray/xray_allocator.h
> > URL: http://llvm.org/viewvc/llvm-project/compiler-rt/trunk/lib/xray/xray_allocator.h?rev=348335&r1=348334&r2=348335&view=diff
> > ==============================================================================
> > --- compiler-rt/trunk/lib/xray/xray_allocator.h (original)
> > +++ compiler-rt/trunk/lib/xray/xray_allocator.h Tue Dec  4 22:44:34 2018
> > @@ -63,7 +63,7 @@ template <class T> T *allocate() XRAY_NE
> >  #else
> >    uptr B = internal_mmap(NULL, RoundedSize, PROT_READ | PROT_WRITE,
> >                           MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
> > -  int ErrNo;
> > +  int ErrNo = 0;
> >    if (UNLIKELY(internal_iserror(B, &ErrNo))) {
> >      if (Verbosity())
> >        Report(
> > @@ -113,7 +113,7 @@ T *allocateBuffer(size_t S) XRAY_NEVER_I
> >  #else
> >    uptr B = internal_mmap(NULL, RoundedSize, PROT_READ | PROT_WRITE,
> >                           MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
> > -  int ErrNo;
> > +  int ErrNo = 0;
> >    if (UNLIKELY(internal_iserror(B, &ErrNo))) {
> >      if (Verbosity())
> >        Report(
> > @@ -171,7 +171,7 @@ template <size_t N> struct Allocator {
> >    };
> >
> >  private:
> > -  const size_t MaxMemory{0};
> > +  size_t MaxMemory{0};
> >    unsigned char *BackingStore = nullptr;
> >    unsigned char *AlignedNextBlock = nullptr;
> >    size_t AllocatedBlocks = 0;
> > @@ -223,7 +223,43 @@ private:
> >
> >  public:
> >    explicit Allocator(size_t M) XRAY_NEVER_INSTRUMENT
> > -      : MaxMemory(RoundUpTo(M, kCacheLineSize)) {}
> > +      : MaxMemory(RoundUpTo(M, kCacheLineSize)),
> > +        BackingStore(nullptr),
> > +        AlignedNextBlock(nullptr),
> > +        AllocatedBlocks(0),
> > +        Mutex() {}
> > +
> > +  Allocator(const Allocator &) = delete;
> > +  Allocator &operator=(const Allocator &) = delete;
> > +
> > +  Allocator(Allocator &&O) XRAY_NEVER_INSTRUMENT {
> > +    SpinMutexLock L0(&Mutex);
> > +    SpinMutexLock L1(&O.Mutex);
> > +    MaxMemory = O.MaxMemory;
> > +    O.MaxMemory = 0;
> > +    BackingStore = O.BackingStore;
> > +    O.BackingStore = nullptr;
> > +    AlignedNextBlock = O.AlignedNextBlock;
> > +    O.AlignedNextBlock = nullptr;
> > +    AllocatedBlocks = O.AllocatedBlocks;
> > +    O.AllocatedBlocks = 0;
> > +  }
> > +
> > +  Allocator &operator=(Allocator &&O) XRAY_NEVER_INSTRUMENT {
> > +    SpinMutexLock L0(&Mutex);
> > +    SpinMutexLock L1(&O.Mutex);
> > +    MaxMemory = O.MaxMemory;
> > +    O.MaxMemory = 0;
> > +    if (BackingStore != nullptr)
> > +      deallocate(BackingStore, MaxMemory);
> > +    BackingStore = O.BackingStore;
> > +    O.BackingStore = nullptr;
> > +    AlignedNextBlock = O.AlignedNextBlock;
> > +    O.AlignedNextBlock = nullptr;
> > +    AllocatedBlocks = O.AllocatedBlocks;
> > +    O.AllocatedBlocks = 0;
> > +    return *this;
> > +  }
> >
> >    Block Allocate() XRAY_NEVER_INSTRUMENT { return {Alloc()}; }
> >
> >
> > Modified: compiler-rt/trunk/lib/xray/xray_function_call_trie.h
> > URL: http://llvm.org/viewvc/llvm-project/compiler-rt/trunk/lib/xray/xray_function_call_trie.h?rev=348335&r1=348334&r2=348335&view=diff
> > ==============================================================================
> > --- compiler-rt/trunk/lib/xray/xray_function_call_trie.h (original)
> > +++ compiler-rt/trunk/lib/xray/xray_function_call_trie.h Tue Dec  4 22:44:34 2018
> > @@ -98,9 +98,6 @@ public:
> >    struct NodeIdPair {
> >      Node *NodePtr;
> >      int32_t FId;
> > -
> > -    // Constructor for inplace-construction.
> > -    NodeIdPair(Node *N, int32_t F) : NodePtr(N), FId(F) {}
> >    };
> >
> >    using NodeIdPairArray = Array<NodeIdPair>;
> > @@ -118,15 +115,6 @@ public:
> >      uint64_t CumulativeLocalTime; // Typically in TSC deltas, not wall-time.
> >      int32_t FId;
> >
> > -    // We add a constructor here to allow us to inplace-construct through
> > -    // Array<...>'s AppendEmplace.
> > -    Node(Node *P, NodeIdPairAllocatorType &A, uint64_t CC, uint64_t CLT,
> > -         int32_t F) XRAY_NEVER_INSTRUMENT : Parent(P),
> > -                                            Callees(A),
> > -                                            CallCount(CC),
> > -                                            CumulativeLocalTime(CLT),
> > -                                            FId(F) {}
> > -
> >      // TODO: Include the compact histogram.
> >    };
> >
> > @@ -135,13 +123,6 @@ private:
> >      uint64_t EntryTSC;
> >      Node *NodePtr;
> >      uint16_t EntryCPU;
> > -
> > -    // We add a constructor here to allow us to inplace-construct through
> > -    // Array<...>'s AppendEmplace.
> > -    ShadowStackEntry(uint64_t T, Node *N, uint16_t C) XRAY_NEVER_INSTRUMENT
> > -        : EntryTSC{T},
> > -          NodePtr{N},
> > -          EntryCPU{C} {}
> >    };
> >
> >    using NodeArray = Array<Node>;
> > @@ -156,20 +137,71 @@ public:
> >      using RootAllocatorType = RootArray::AllocatorType;
> >      using ShadowStackAllocatorType = ShadowStackArray::AllocatorType;
> >
> > +    // Use hosted aligned storage members to allow for trivial move and init.
> > +    // This also allows us to sidestep the potential-failing allocation issue.
> > +    typename std::aligned_storage<sizeof(NodeAllocatorType),
> > +                                  alignof(NodeAllocatorType)>::type
> > +        NodeAllocatorStorage;
> > +    typename std::aligned_storage<sizeof(RootAllocatorType),
> > +                                  alignof(RootAllocatorType)>::type
> > +        RootAllocatorStorage;
> > +    typename std::aligned_storage<sizeof(ShadowStackAllocatorType),
> > +                                  alignof(ShadowStackAllocatorType)>::type
> > +        ShadowStackAllocatorStorage;
> > +    typename std::aligned_storage<sizeof(NodeIdPairAllocatorType),
> > +                                  alignof(NodeIdPairAllocatorType)>::type
> > +        NodeIdPairAllocatorStorage;
> > +
> >      NodeAllocatorType *NodeAllocator = nullptr;
> >      RootAllocatorType *RootAllocator = nullptr;
> >      ShadowStackAllocatorType *ShadowStackAllocator = nullptr;
> >      NodeIdPairAllocatorType *NodeIdPairAllocator = nullptr;
> >
> > -    Allocators() {}
> > +    Allocators() = default;
> >      Allocators(const Allocators &) = delete;
> >      Allocators &operator=(const Allocators &) = delete;
> >
> > -    Allocators(Allocators &&O) XRAY_NEVER_INSTRUMENT
> > -        : NodeAllocator(O.NodeAllocator),
> > -          RootAllocator(O.RootAllocator),
> > -          ShadowStackAllocator(O.ShadowStackAllocator),
> > -          NodeIdPairAllocator(O.NodeIdPairAllocator) {
> > +    explicit Allocators(uptr Max) XRAY_NEVER_INSTRUMENT {
> > +      new (&NodeAllocatorStorage) NodeAllocatorType(Max);
> > +      NodeAllocator =
> > +          reinterpret_cast<NodeAllocatorType *>(&NodeAllocatorStorage);
> > +
> > +      new (&RootAllocatorStorage) RootAllocatorType(Max);
> > +      RootAllocator =
> > +          reinterpret_cast<RootAllocatorType *>(&RootAllocatorStorage);
> > +
> > +      new (&ShadowStackAllocatorStorage) ShadowStackAllocatorType(Max);
> > +      ShadowStackAllocator = reinterpret_cast<ShadowStackAllocatorType *>(
> > +          &ShadowStackAllocatorStorage);
> > +
> > +      new (&NodeIdPairAllocatorStorage) NodeIdPairAllocatorType(Max);
> > +      NodeIdPairAllocator = reinterpret_cast<NodeIdPairAllocatorType *>(
> > +          &NodeIdPairAllocatorStorage);
> > +    }
> > +
> > +    Allocators(Allocators &&O) XRAY_NEVER_INSTRUMENT {
> > +      // Here we rely on the safety of memcpy'ing contents of the storage
> > +      // members, and then pointing the source pointers to nullptr.
> > +      internal_memcpy(&NodeAllocatorStorage, &O.NodeAllocatorStorage,
> > +                      sizeof(NodeAllocatorType));
> > +      internal_memcpy(&RootAllocatorStorage, &O.RootAllocatorStorage,
> > +                      sizeof(RootAllocatorType));
> > +      internal_memcpy(&ShadowStackAllocatorStorage,
> > +                      &O.ShadowStackAllocatorStorage,
> > +                      sizeof(ShadowStackAllocatorType));
> > +      internal_memcpy(&NodeIdPairAllocatorStorage,
> > +                      &O.NodeIdPairAllocatorStorage,
> > +                      sizeof(NodeIdPairAllocatorType));
> > +
> > +      NodeAllocator =
> > +          reinterpret_cast<NodeAllocatorType *>(&NodeAllocatorStorage);
> > +      RootAllocator =
> > +          reinterpret_cast<RootAllocatorType *>(&RootAllocatorStorage);
> > +      ShadowStackAllocator = reinterpret_cast<ShadowStackAllocatorType *>(
> > +          &ShadowStackAllocatorStorage);
> > +      NodeIdPairAllocator = reinterpret_cast<NodeIdPairAllocatorType *>(
> > +          &NodeIdPairAllocatorStorage);
> > +
> >        O.NodeAllocator = nullptr;
> >        O.RootAllocator = nullptr;
> >        O.ShadowStackAllocator = nullptr;
> > @@ -177,79 +209,77 @@ public:
> >      }
> >
> >      Allocators &operator=(Allocators &&O) XRAY_NEVER_INSTRUMENT {
> > -      {
> > -        auto Tmp = O.NodeAllocator;
> > -        O.NodeAllocator = this->NodeAllocator;
> > -        this->NodeAllocator = Tmp;
> > -      }
> > -      {
> > -        auto Tmp = O.RootAllocator;
> > -        O.RootAllocator = this->RootAllocator;
> > -        this->RootAllocator = Tmp;
> > -      }
> > -      {
> > -        auto Tmp = O.ShadowStackAllocator;
> > -        O.ShadowStackAllocator = this->ShadowStackAllocator;
> > -        this->ShadowStackAllocator = Tmp;
> > -      }
> > -      {
> > -        auto Tmp = O.NodeIdPairAllocator;
> > -        O.NodeIdPairAllocator = this->NodeIdPairAllocator;
> > -        this->NodeIdPairAllocator = Tmp;
> > -      }
> > -      return *this;
> > -    }
> > -
> > -    ~Allocators() XRAY_NEVER_INSTRUMENT {
> > -      // Note that we cannot use delete on these pointers, as they need to be
> > -      // returned to the sanitizer_common library's internal memory tracking
> > -      // system.
> > -      if (NodeAllocator != nullptr) {
> > +      // When moving into an existing instance, we ensure that we clean up the
> > +      // current allocators.
> > +      if (NodeAllocator)
> >          NodeAllocator->~NodeAllocatorType();
> > -        deallocate(NodeAllocator);
> > +      if (O.NodeAllocator) {
> > +        new (&NodeAllocatorStorage)
> > +            NodeAllocatorType(std::move(*O.NodeAllocator));
> > +        NodeAllocator =
> > +            reinterpret_cast<NodeAllocatorType *>(&NodeAllocatorStorage);
> > +        O.NodeAllocator = nullptr;
> > +      } else {
> >          NodeAllocator = nullptr;
> >        }
> > -      if (RootAllocator != nullptr) {
> > +
> > +      if (RootAllocator)
> >          RootAllocator->~RootAllocatorType();
> > -        deallocate(RootAllocator);
> > +      if (O.RootAllocator) {
> > +        new (&RootAllocatorStorage)
> > +            RootAllocatorType(std::move(*O.RootAllocator));
> > +        RootAllocator =
> > +            reinterpret_cast<RootAllocatorType *>(&RootAllocatorStorage);
> > +        O.RootAllocator = nullptr;
> > +      } else {
> >          RootAllocator = nullptr;
> >        }
> > -      if (ShadowStackAllocator != nullptr) {
> > +
> > +      if (ShadowStackAllocator)
> >          ShadowStackAllocator->~ShadowStackAllocatorType();
> > -        deallocate(ShadowStackAllocator);
> > +      if (O.ShadowStackAllocator) {
> > +        new (&ShadowStackAllocatorStorage)
> > +            ShadowStackAllocatorType(std::move(*O.ShadowStackAllocator));
> > +        ShadowStackAllocator = reinterpret_cast<ShadowStackAllocatorType *>(
> > +            &ShadowStackAllocatorStorage);
> > +        O.ShadowStackAllocator = nullptr;
> > +      } else {
> >          ShadowStackAllocator = nullptr;
> >        }
> > -      if (NodeIdPairAllocator != nullptr) {
> > +
> > +      if (NodeIdPairAllocator)
> >          NodeIdPairAllocator->~NodeIdPairAllocatorType();
> > -        deallocate(NodeIdPairAllocator);
> > +      if (O.NodeIdPairAllocator) {
> > +        new (&NodeIdPairAllocatorStorage)
> > +            NodeIdPairAllocatorType(std::move(*O.NodeIdPairAllocator));
> > +        NodeIdPairAllocator = reinterpret_cast<NodeIdPairAllocatorType *>(
> > +            &NodeIdPairAllocatorStorage);
> > +        O.NodeIdPairAllocator = nullptr;
> > +      } else {
> >          NodeIdPairAllocator = nullptr;
> >        }
> > +
> > +      return *this;
> > +    }
> > +
> > +    ~Allocators() XRAY_NEVER_INSTRUMENT {
> > +      if (NodeAllocator != nullptr)
> > +        NodeAllocator->~NodeAllocatorType();
> > +      if (RootAllocator != nullptr)
> > +        RootAllocator->~RootAllocatorType();
> > +      if (ShadowStackAllocator != nullptr)
> > +        ShadowStackAllocator->~ShadowStackAllocatorType();
> > +      if (NodeIdPairAllocator != nullptr)
> > +        NodeIdPairAllocator->~NodeIdPairAllocatorType();
> >      }
> >    };
> >
> > -  // TODO: Support configuration of options through the arguments.
> >    static Allocators InitAllocators() XRAY_NEVER_INSTRUMENT {
> >      return InitAllocatorsCustom(profilingFlags()->per_thread_allocator_max);
> >    }
> >
> >    static Allocators InitAllocatorsCustom(uptr Max) XRAY_NEVER_INSTRUMENT {
> > -    Allocators A;
> > -    auto NodeAllocator = allocate<Allocators::NodeAllocatorType>();
> > -    new (NodeAllocator) Allocators::NodeAllocatorType(Max);
> > -    A.NodeAllocator = NodeAllocator;
> > -
> > -    auto RootAllocator = allocate<Allocators::RootAllocatorType>();
> > -    new (RootAllocator) Allocators::RootAllocatorType(Max);
> > -    A.RootAllocator = RootAllocator;
> > -
> > -    auto ShadowStackAllocator =
> > -        allocate<Allocators::ShadowStackAllocatorType>();
> > -    new (ShadowStackAllocator) Allocators::ShadowStackAllocatorType(Max);
> > -    A.ShadowStackAllocator = ShadowStackAllocator;
> > -
> > -    auto NodeIdPairAllocator = allocate<NodeIdPairAllocatorType>();
> > -    new (NodeIdPairAllocator) NodeIdPairAllocatorType(Max);
> > -    A.NodeIdPairAllocator = NodeIdPairAllocator;
> > +    Allocators A(Max);
> >      return A;
> >    }
> >
> > @@ -257,14 +287,38 @@ private:
> >    NodeArray Nodes;
> >    RootArray Roots;
> >    ShadowStackArray ShadowStack;
> > -  NodeIdPairAllocatorType *NodeIdPairAllocator = nullptr;
> > +  NodeIdPairAllocatorType *NodeIdPairAllocator;
> > +  uint32_t OverflowedFunctions;
> >
> >  public:
> >    explicit FunctionCallTrie(const Allocators &A) XRAY_NEVER_INSTRUMENT
> >        : Nodes(*A.NodeAllocator),
> >          Roots(*A.RootAllocator),
> >          ShadowStack(*A.ShadowStackAllocator),
> > -        NodeIdPairAllocator(A.NodeIdPairAllocator) {}
> > +        NodeIdPairAllocator(A.NodeIdPairAllocator),
> > +        OverflowedFunctions(0) {}
> > +
> > +  FunctionCallTrie() = delete;
> > +  FunctionCallTrie(const FunctionCallTrie &) = delete;
> > +  FunctionCallTrie &operator=(const FunctionCallTrie &) = delete;
> > +
> > +  FunctionCallTrie(FunctionCallTrie &&O) XRAY_NEVER_INSTRUMENT
> > +      : Nodes(std::move(O.Nodes)),
> > +        Roots(std::move(O.Roots)),
> > +        ShadowStack(std::move(O.ShadowStack)),
> > +        NodeIdPairAllocator(O.NodeIdPairAllocator),
> > +        OverflowedFunctions(O.OverflowedFunctions) {}
> > +
> > +  FunctionCallTrie &operator=(FunctionCallTrie &&O) XRAY_NEVER_INSTRUMENT {
> > +    Nodes = std::move(O.Nodes);
> > +    Roots = std::move(O.Roots);
> > +    ShadowStack = std::move(O.ShadowStack);
> > +    NodeIdPairAllocator = O.NodeIdPairAllocator;
> > +    OverflowedFunctions = O.OverflowedFunctions;
> > +    return *this;
> > +  }
> > +
> > +  ~FunctionCallTrie() XRAY_NEVER_INSTRUMENT {}
> >
> >    void enterFunction(const int32_t FId, uint64_t TSC,
> >                       uint16_t CPU) XRAY_NEVER_INSTRUMENT {
> > @@ -272,12 +326,17 @@ public:
> >      // This function primarily deals with ensuring that the ShadowStack is
> >      // consistent and ready for when an exit event is encountered.
> >      if (UNLIKELY(ShadowStack.empty())) {
> > -      auto NewRoot =
> > -          Nodes.AppendEmplace(nullptr, *NodeIdPairAllocator, 0u, 0u, FId);
> > +      auto NewRoot = Nodes.AppendEmplace(
> > +          nullptr, NodeIdPairArray{*NodeIdPairAllocator}, 0u, 0u, FId);
> >        if (UNLIKELY(NewRoot == nullptr))
> >          return;
> > -      Roots.Append(NewRoot);
> > -      ShadowStack.AppendEmplace(TSC, NewRoot, CPU);
> > +      if (Roots.Append(NewRoot) == nullptr)
> > +        return;
> > +      if (ShadowStack.AppendEmplace(TSC, NewRoot, CPU) == nullptr) {
> > +        Roots.trim(1);
> > +        ++OverflowedFunctions;
> > +        return;
> > +      }
> >        return;
> >      }
> >
> > @@ -291,29 +350,39 @@ public:
> >          [FId](const NodeIdPair &NR) { return NR.FId == FId; });
> >      if (Callee != nullptr) {
> >        CHECK_NE(Callee->NodePtr, nullptr);
> > -      ShadowStack.AppendEmplace(TSC, Callee->NodePtr, CPU);
> > +      if (ShadowStack.AppendEmplace(TSC, Callee->NodePtr, CPU) == nullptr)
> > +        ++OverflowedFunctions;
> >        return;
> >      }
> >
> >      // This means we've never seen this stack before, create a new node here.
> > -    auto NewNode =
> > -        Nodes.AppendEmplace(TopNode, *NodeIdPairAllocator, 0u, 0u, FId);
> > +    auto NewNode = Nodes.AppendEmplace(
> > +        TopNode, NodeIdPairArray(*NodeIdPairAllocator), 0u, 0u, FId);
> >      if (UNLIKELY(NewNode == nullptr))
> >        return;
> >      DCHECK_NE(NewNode, nullptr);
> >      TopNode->Callees.AppendEmplace(NewNode, FId);
> > -    ShadowStack.AppendEmplace(TSC, NewNode, CPU);
> > +    if (ShadowStack.AppendEmplace(TSC, NewNode, CPU) == nullptr)
> > +      ++OverflowedFunctions;
> >      DCHECK_NE(ShadowStack.back().NodePtr, nullptr);
> >      return;
> >    }
> >
> >    void exitFunction(int32_t FId, uint64_t TSC,
> >                      uint16_t CPU) XRAY_NEVER_INSTRUMENT {
> > +    // If we're exiting functions that have "overflowed" or don't fit into the
> > +    // stack due to allocator constraints, we then decrement that count first.
> > +    if (OverflowedFunctions) {
> > +      --OverflowedFunctions;
> > +      return;
> > +    }
> > +
> >      // When we exit a function, we look up the ShadowStack to see whether we've
> >      // entered this function before. We do as little processing here as we can,
> >      // since most of the hard work would have already been done at function
> >      // entry.
> >      uint64_t CumulativeTreeTime = 0;
> > +
> >      while (!ShadowStack.empty()) {
> >        const auto &Top = ShadowStack.back();
> >        auto TopNode = Top.NodePtr;
> > @@ -380,7 +449,7 @@ public:
> >      for (const auto Root : getRoots()) {
> >        // Add a node in O for this root.
> >        auto NewRoot = O.Nodes.AppendEmplace(
> > -          nullptr, *O.NodeIdPairAllocator, Root->CallCount,
> > +          nullptr, NodeIdPairArray(*O.NodeIdPairAllocator), Root->CallCount,
> >            Root->CumulativeLocalTime, Root->FId);
> >
> >        // Because we cannot allocate more memory we should bail out right away.
> > @@ -399,8 +468,9 @@ public:
> >          DFSStack.trim(1);
> >          for (const auto Callee : NP.Node->Callees) {
> >            auto NewNode = O.Nodes.AppendEmplace(
> > -              NP.NewNode, *O.NodeIdPairAllocator, Callee.NodePtr->CallCount,
> > -              Callee.NodePtr->CumulativeLocalTime, Callee.FId);
> > +              NP.NewNode, NodeIdPairArray(*O.NodeIdPairAllocator),
> > +              Callee.NodePtr->CallCount, Callee.NodePtr->CumulativeLocalTime,
> > +              Callee.FId);
> >            if (UNLIKELY(NewNode == nullptr))
> >              return;
> >            NP.NewNode->Callees.AppendEmplace(NewNode, Callee.FId);
> > @@ -433,8 +503,9 @@ public:
> >        auto R = O.Roots.find_element(
> >            [&](const Node *Node) { return Node->FId == Root->FId; });
> >        if (R == nullptr) {
> > -        TargetRoot = O.Nodes.AppendEmplace(nullptr, *O.NodeIdPairAllocator, 0u,
> > -                                           0u, Root->FId);
> > +        TargetRoot = O.Nodes.AppendEmplace(
> > +            nullptr, NodeIdPairArray(*O.NodeIdPairAllocator), 0u, 0u,
> > +            Root->FId);
> >          if (UNLIKELY(TargetRoot == nullptr))
> >            return;
> >
> > @@ -459,7 +530,8 @@ public:
> >                });
> >            if (TargetCallee == nullptr) {
> >              auto NewTargetNode = O.Nodes.AppendEmplace(
> > -                NT.TargetNode, *O.NodeIdPairAllocator, 0u, 0u, Callee.FId);
> > +                NT.TargetNode, NodeIdPairArray(*O.NodeIdPairAllocator), 0u, 0u,
> > +                Callee.FId);
> >
> >              if (UNLIKELY(NewTargetNode == nullptr))
> >                return;
> >
> > Modified: compiler-rt/trunk/lib/xray/xray_profile_collector.cc
> > URL: http://llvm.org/viewvc/llvm-project/compiler-rt/trunk/lib/xray/xray_profile_collector.cc?rev=348335&r1=348334&r2=348335&view=diff
> > ==============================================================================
> > --- compiler-rt/trunk/lib/xray/xray_profile_collector.cc (original)
> > +++ compiler-rt/trunk/lib/xray/xray_profile_collector.cc Tue Dec  4 22:44:34 2018
> > @@ -86,7 +86,8 @@ static FunctionCallTrie::Allocators *Glo
> >
> >  void post(const FunctionCallTrie &T, tid_t TId) XRAY_NEVER_INSTRUMENT {
> >    static pthread_once_t Once = PTHREAD_ONCE_INIT;
> > -  pthread_once(&Once, +[] { reset(); });
> > +  pthread_once(
> > +      &Once, +[]() XRAY_NEVER_INSTRUMENT { reset(); });
> >
> >    ThreadTrie *Item = nullptr;
> >    {
> > @@ -95,13 +96,14 @@ void post(const FunctionCallTrie &T, tid
> >        return;
> >
> >      Item = ThreadTries->Append({});
> > +    if (Item == nullptr)
> > +      return;
> > +
> >      Item->TId = TId;
> >      auto Trie = reinterpret_cast<FunctionCallTrie *>(&Item->TrieStorage);
> >      new (Trie) FunctionCallTrie(*GlobalAllocators);
> > +    T.deepCopyInto(*Trie);
> >    }
> > -
> > -  auto Trie = reinterpret_cast<FunctionCallTrie *>(&Item->TrieStorage);
> > -  T.deepCopyInto(*Trie);
> >  }
> >
> >  // A PathArray represents the function id's representing a stack trace. In this
> > @@ -115,13 +117,7 @@ struct ProfileRecord {
> >    // The Path in this record is the function id's from the leaf to the root of
> >    // the function call stack as represented from a FunctionCallTrie.
> >    PathArray Path;
> > -  const FunctionCallTrie::Node *Node = nullptr;
> > -
> > -  // Constructor for in-place construction.
> > -  ProfileRecord(PathAllocator &A,
> > -                const FunctionCallTrie::Node *N) XRAY_NEVER_INSTRUMENT
> > -      : Path(A),
> > -        Node(N) {}
> > +  const FunctionCallTrie::Node *Node;
> >  };
> >
> >  namespace {
> > @@ -142,7 +138,7 @@ populateRecords(ProfileRecordArray &PRs,
> >      while (!DFSStack.empty()) {
> >        auto Node = DFSStack.back();
> >        DFSStack.trim(1);
> > -      auto Record = PRs.AppendEmplace(PA, Node);
> > +      auto Record = PRs.AppendEmplace(PathArray{PA}, Node);
> >        if (Record == nullptr)
> >          return;
> >        DCHECK_NE(Record, nullptr);
> > @@ -203,7 +199,7 @@ void serialize() XRAY_NEVER_INSTRUMENT {
> >
> >    // Clear out the global ProfileBuffers, if it's not empty.
> >    for (auto &B : *ProfileBuffers)
> > -    deallocateBuffer(reinterpret_cast<uint8_t *>(B.Data), B.Size);
> > +    deallocateBuffer(reinterpret_cast<unsigned char *>(B.Data), B.Size);
> >    ProfileBuffers->trim(ProfileBuffers->size());
> >
> >    if (ThreadTries->empty())
> > @@ -278,8 +274,8 @@ void reset() XRAY_NEVER_INSTRUMENT {
> >
> >    GlobalAllocators =
> >        reinterpret_cast<FunctionCallTrie::Allocators *>(&AllocatorStorage);
> > -  new (GlobalAllocators) FunctionCallTrie::Allocators();
> > -  *GlobalAllocators = FunctionCallTrie::InitAllocators();
> > +  new (GlobalAllocators)
> > +      FunctionCallTrie::Allocators(FunctionCallTrie::InitAllocators());
> >
> >    if (ThreadTriesAllocator != nullptr)
> >      ThreadTriesAllocator->~ThreadTriesArrayAllocator();
> > @@ -312,8 +308,10 @@ XRayBuffer nextBuffer(XRayBuffer B) XRAY
> >    static pthread_once_t Once = PTHREAD_ONCE_INIT;
> >    static typename std::aligned_storage<sizeof(XRayProfilingFileHeader)>::type
> >        FileHeaderStorage;
> > -  pthread_once(&Once,
> > -               +[] { new (&FileHeaderStorage) XRayProfilingFileHeader{}; });
> > +  pthread_once(
> > +      &Once, +[]() XRAY_NEVER_INSTRUMENT {
> > +        new (&FileHeaderStorage) XRayProfilingFileHeader{};
> > +      });
> >
> >    if (UNLIKELY(B.Data == nullptr)) {
> >      // The first buffer should always contain the file header information.
> >
> > Modified: compiler-rt/trunk/lib/xray/xray_profiling.cc
> > URL: http://llvm.org/viewvc/llvm-project/compiler-rt/trunk/lib/xray/xray_profiling.cc?rev=348335&r1=348334&r2=348335&view=diff
> > ==============================================================================
> > --- compiler-rt/trunk/lib/xray/xray_profiling.cc (original)
> > +++ compiler-rt/trunk/lib/xray/xray_profiling.cc Tue Dec  4 22:44:34 2018
> > @@ -31,67 +31,112 @@ namespace __xray {
> >
> >  namespace {
> >
> > -atomic_sint32_t ProfilerLogFlushStatus = {
> > +static atomic_sint32_t ProfilerLogFlushStatus = {
> >      XRayLogFlushStatus::XRAY_LOG_NOT_FLUSHING};
> >
> > -atomic_sint32_t ProfilerLogStatus = {XRayLogInitStatus::XRAY_LOG_UNINITIALIZED};
> > +static atomic_sint32_t ProfilerLogStatus = {
> > +    XRayLogInitStatus::XRAY_LOG_UNINITIALIZED};
> >
> > -SpinMutex ProfilerOptionsMutex;
> > +static SpinMutex ProfilerOptionsMutex;
> >
> > -struct alignas(64) ProfilingData {
> > -  FunctionCallTrie::Allocators *Allocators;
> > -  FunctionCallTrie *FCT;
> > +struct ProfilingData {
> > +  atomic_uintptr_t Allocators;
> > +  atomic_uintptr_t FCT;
> >  };
> >
> >  static pthread_key_t ProfilingKey;
> >
> > -thread_local std::aligned_storage<sizeof(FunctionCallTrie::Allocators)>::type
> > +thread_local std::aligned_storage<sizeof(FunctionCallTrie::Allocators),
> > +                                  alignof(FunctionCallTrie::Allocators)>::type
> >      AllocatorsStorage;
> > -thread_local std::aligned_storage<sizeof(FunctionCallTrie)>::type
> > +thread_local std::aligned_storage<sizeof(FunctionCallTrie),
> > +                                  alignof(FunctionCallTrie)>::type
> >      FunctionCallTrieStorage;
> > -thread_local std::aligned_storage<sizeof(ProfilingData)>::type ThreadStorage{};
> > +thread_local ProfilingData TLD{{0}, {0}};
> > +thread_local atomic_uint8_t ReentranceGuard{0};
> >
> > -static ProfilingData &getThreadLocalData() XRAY_NEVER_INSTRUMENT {
> > -  thread_local auto ThreadOnce = [] {
> > -    new (&ThreadStorage) ProfilingData{};
> > -    auto *Allocators =
> > -        reinterpret_cast<FunctionCallTrie::Allocators *>(&AllocatorsStorage);
> > -    new (Allocators) FunctionCallTrie::Allocators();
> > -    *Allocators = FunctionCallTrie::InitAllocators();
> > -    auto *FCT = reinterpret_cast<FunctionCallTrie *>(&FunctionCallTrieStorage);
> > -    new (FCT) FunctionCallTrie(*Allocators);
> > -    auto &TLD = *reinterpret_cast<ProfilingData *>(&ThreadStorage);
> > -    TLD.Allocators = Allocators;
> > -    TLD.FCT = FCT;
> > -    pthread_setspecific(ProfilingKey, &ThreadStorage);
> > +// We use a separate guard for ensuring that for this thread, if we're already
> > +// cleaning up, that any signal handlers don't attempt to cleanup nor
> > +// initialise.
> > +thread_local atomic_uint8_t TLDInitGuard{0};
> > +
> > +// We also use a separate latch to signal that the thread is exiting, and
> > +// non-essential work should be ignored (things like recording events, etc.).
> > +thread_local atomic_uint8_t ThreadExitingLatch{0};
> > +
> > +static ProfilingData *getThreadLocalData() XRAY_NEVER_INSTRUMENT {
> > +  thread_local auto ThreadOnce = []() XRAY_NEVER_INSTRUMENT {
> > +    pthread_setspecific(ProfilingKey, &TLD);
> >      return false;
> >    }();
> >    (void)ThreadOnce;
> >
> > -  auto &TLD = *reinterpret_cast<ProfilingData *>(&ThreadStorage);
> > -
> > -  if (UNLIKELY(TLD.Allocators == nullptr || TLD.FCT == nullptr)) {
> > -    auto *Allocators =
> > -        reinterpret_cast<FunctionCallTrie::Allocators *>(&AllocatorsStorage);
> > -    new (Allocators) FunctionCallTrie::Allocators();
> > -    *Allocators = FunctionCallTrie::InitAllocators();
> > -    auto *FCT = reinterpret_cast<FunctionCallTrie *>(&FunctionCallTrieStorage);
> > -    new (FCT) FunctionCallTrie(*Allocators);
> > -    TLD.Allocators = Allocators;
> > -    TLD.FCT = FCT;
> > +  RecursionGuard TLDInit(TLDInitGuard);
> > +  if (!TLDInit)
> > +    return nullptr;
> > +
> > +  if (atomic_load_relaxed(&ThreadExitingLatch))
> > +    return nullptr;
> > +
> > +  uintptr_t Allocators = 0;
> > +  if (atomic_compare_exchange_strong(&TLD.Allocators, &Allocators, 1,
> > +                                     memory_order_acq_rel)) {
> > +    new (&AllocatorsStorage)
> > +        FunctionCallTrie::Allocators(FunctionCallTrie::InitAllocators());
> > +    Allocators = reinterpret_cast<uintptr_t>(
> > +        reinterpret_cast<FunctionCallTrie::Allocators *>(&AllocatorsStorage));
> > +    atomic_store(&TLD.Allocators, Allocators, memory_order_release);
> > +  }
> > +
> > +  uintptr_t FCT = 0;
> > +  if (atomic_compare_exchange_strong(&TLD.FCT, &FCT, 1, memory_order_acq_rel)) {
> > +    new (&FunctionCallTrieStorage) FunctionCallTrie(
> > +        *reinterpret_cast<FunctionCallTrie::Allocators *>(Allocators));
> > +    FCT = reinterpret_cast<uintptr_t>(
> > +        reinterpret_cast<FunctionCallTrie *>(&FunctionCallTrieStorage));
> > +    atomic_store(&TLD.FCT, FCT, memory_order_release);
> >    }
> >
> > -  return *reinterpret_cast<ProfilingData *>(&ThreadStorage);
> > +  if (FCT == 1)
> > +    return nullptr;
> > +
> > +  return &TLD;
> >  }
> >
> >  static void cleanupTLD() XRAY_NEVER_INSTRUMENT {
> > -  auto &TLD = *reinterpret_cast<ProfilingData *>(&ThreadStorage);
> > -  if (TLD.Allocators != nullptr && TLD.FCT != nullptr) {
> > -    TLD.FCT->~FunctionCallTrie();
> > -    TLD.Allocators->~Allocators();
> > -    TLD.FCT = nullptr;
> > -    TLD.Allocators = nullptr;
> > -  }
> > +  RecursionGuard TLDInit(TLDInitGuard);
> > +  if (!TLDInit)
> > +    return;
> > +
> > +  auto FCT = atomic_exchange(&TLD.FCT, 0, memory_order_acq_rel);
> > +  if (FCT == reinterpret_cast<uintptr_t>(reinterpret_cast<FunctionCallTrie *>(
> > +                 &FunctionCallTrieStorage)))
> > +    reinterpret_cast<FunctionCallTrie *>(FCT)->~FunctionCallTrie();
> > +
> > +  auto Allocators = atomic_exchange(&TLD.Allocators, 0, memory_order_acq_rel);
> > +  if (Allocators ==
> > +      reinterpret_cast<uintptr_t>(
> > +          reinterpret_cast<FunctionCallTrie::Allocators *>(&AllocatorsStorage)))
> > +    reinterpret_cast<FunctionCallTrie::Allocators *>(Allocators)->~Allocators();
> > +}
> > +
> > +static void postCurrentThreadFCT(ProfilingData &T) XRAY_NEVER_INSTRUMENT {
> > +  RecursionGuard TLDInit(TLDInitGuard);
> > +  if (!TLDInit)
> > +    return;
> > +
> > +  uintptr_t P = atomic_load(&T.FCT, memory_order_acquire);
> > +  if (P != reinterpret_cast<uintptr_t>(
> > +               reinterpret_cast<FunctionCallTrie *>(&FunctionCallTrieStorage)))
> > +    return;
> > +
> > +  auto FCT = reinterpret_cast<FunctionCallTrie *>(P);
> > +  DCHECK_NE(FCT, nullptr);
> > +
> > +  if (!FCT->getRoots().empty())
> > +    profileCollectorService::post(*FCT, GetTid());
> > +
> > +  cleanupTLD();
> >  }
> >
> >  } // namespace
> > @@ -104,9 +149,6 @@ const char *profilingCompilerDefinedFlag
> >  #endif
> >  }
> >
> > -atomic_sint32_t ProfileFlushStatus = {
> > -    XRayLogFlushStatus::XRAY_LOG_NOT_FLUSHING};
> > -
> >  XRayLogFlushStatus profilingFlush() XRAY_NEVER_INSTRUMENT {
> >    if (atomic_load(&ProfilerLogStatus, memory_order_acquire) !=
> >        XRayLogInitStatus::XRAY_LOG_FINALIZED) {
> > @@ -115,14 +157,27 @@ XRayLogFlushStatus profilingFlush() XRAY
> >      return XRayLogFlushStatus::XRAY_LOG_NOT_FLUSHING;
> >    }
> >
> > -  s32 Result = XRayLogFlushStatus::XRAY_LOG_NOT_FLUSHING;
> > -  if (!atomic_compare_exchange_strong(&ProfilerLogFlushStatus, &Result,
> > -                                      XRayLogFlushStatus::XRAY_LOG_FLUSHING,
> > -                                      memory_order_acq_rel)) {
> > +  RecursionGuard SignalGuard(ReentranceGuard);
> > +  if (!SignalGuard) {
> >      if (Verbosity())
> > -      Report("Not flushing profiles, implementation still finalizing.\n");
> > +      Report("Cannot finalize properly inside a signal handler!\n");
> > +    atomic_store(&ProfilerLogFlushStatus,
> > +                 XRayLogFlushStatus::XRAY_LOG_NOT_FLUSHING,
> > +                 memory_order_release);
> > +    return XRayLogFlushStatus::XRAY_LOG_NOT_FLUSHING;
> >    }
> >
> > +  s32 Previous = atomic_exchange(&ProfilerLogFlushStatus,
> > +                                 XRayLogFlushStatus::XRAY_LOG_FLUSHING,
> > +                                 memory_order_acq_rel);
> > +  if (Previous == XRayLogFlushStatus::XRAY_LOG_FLUSHING) {
> > +    if (Verbosity())
> > +      Report("Not flushing profiles, implementation still flushing.\n");
> > +    return XRayLogFlushStatus::XRAY_LOG_FLUSHING;
> > +  }
> > +
> > +  postCurrentThreadFCT(TLD);
> > +
> >    // At this point, we'll create the file that will contain the profile, but
> >    // only if the options say so.
> >    if (!profilingFlags()->no_flush) {
> > @@ -150,33 +205,19 @@ XRayLogFlushStatus profilingFlush() XRAY
> >      }
> >    }
> >
> > -  profileCollectorService::reset();
> > -
> > -  // Flush the current thread's local data structures as well.
> > +  // Clean up the current thread's TLD information as well.
> >    cleanupTLD();
> >
> > +  profileCollectorService::reset();
> > +
> > +  atomic_store(&ProfilerLogFlushStatus, XRayLogFlushStatus::XRAY_LOG_FLUSHED,
> > +               memory_order_release);
> >    atomic_store(&ProfilerLogStatus, XRayLogFlushStatus::XRAY_LOG_FLUSHED,
> >                 memory_order_release);
> >
> >    return XRayLogFlushStatus::XRAY_LOG_FLUSHED;
> >  }
> >
> > -namespace {
> > -
> > -thread_local atomic_uint8_t ReentranceGuard{0};
> > -
> > -static void postCurrentThreadFCT(ProfilingData &TLD) XRAY_NEVER_INSTRUMENT {
> > -  if (TLD.Allocators == nullptr || TLD.FCT == nullptr)
> > -    return;
> > -
> > -  if (!TLD.FCT->getRoots().empty())
> > -    profileCollectorService::post(*TLD.FCT, GetTid());
> > -
> > -  cleanupTLD();
> > -}
> > -
> > -} // namespace
> > -
> >  void profilingHandleArg0(int32_t FuncId,
> >                           XRayEntryType Entry) XRAY_NEVER_INSTRUMENT {
> >    unsigned char CPU;
> > @@ -186,22 +227,29 @@ void profilingHandleArg0(int32_t FuncId,
> >      return;
> >
> >    auto Status = atomic_load(&ProfilerLogStatus, memory_order_acquire);
> > +  if (UNLIKELY(Status == XRayLogInitStatus::XRAY_LOG_UNINITIALIZED ||
> > +               Status == XRayLogInitStatus::XRAY_LOG_INITIALIZING))
> > +    return;
> > +
> >    if (UNLIKELY(Status == XRayLogInitStatus::XRAY_LOG_FINALIZED ||
> >                 Status == XRayLogInitStatus::XRAY_LOG_FINALIZING)) {
> > -    auto &TLD = getThreadLocalData();
> >      postCurrentThreadFCT(TLD);
> >      return;
> >    }
> >
> > -  auto &TLD = getThreadLocalData();
> > +  auto T = getThreadLocalData();
> > +  if (T == nullptr)
> > +    return;
> > +
> > +  auto FCT = reinterpret_cast<FunctionCallTrie *>(atomic_load_relaxed(&T->FCT));
> >    switch (Entry) {
> >    case XRayEntryType::ENTRY:
> >    case XRayEntryType::LOG_ARGS_ENTRY:
> > -    TLD.FCT->enterFunction(FuncId, TSC, CPU);
> > +    FCT->enterFunction(FuncId, TSC, CPU);
> >      break;
> >    case XRayEntryType::EXIT:
> >    case XRayEntryType::TAIL:
> > -    TLD.FCT->exitFunction(FuncId, TSC, CPU);
> > +    FCT->exitFunction(FuncId, TSC, CPU);
> >      break;
> >    default:
> >      // FIXME: Handle bugs.
> > @@ -227,15 +275,14 @@ XRayLogInitStatus profilingFinalize() XR
> >    // Wait a grace period to allow threads to see that we're finalizing.
> >    SleepForMillis(profilingFlags()->grace_period_ms);
> >
> > -  // We also want to make sure that the current thread's data is cleaned up, if
> > -  // we have any. We need to ensure that the call to postCurrentThreadFCT() is
> > -  // guarded by our recursion guard.
> > -  auto &TLD = getThreadLocalData();
> > -  {
> > -    RecursionGuard G(ReentranceGuard);
> > -    if (G)
> > -      postCurrentThreadFCT(TLD);
> > -  }
> > +  // If we for some reason are entering this function from an instrumented
> > +  // handler, we bail out.
> > +  RecursionGuard G(ReentranceGuard);
> > +  if (!G)
> > +    return static_cast<XRayLogInitStatus>(CurrentStatus);
> > +
> > +  // Post the current thread's data if we have any.
> > +  postCurrentThreadFCT(TLD);
> >
> >    // Then we force serialize the log data.
> >    profileCollectorService::serialize();
> > @@ -248,6 +295,10 @@ XRayLogInitStatus profilingFinalize() XR
> >  XRayLogInitStatus
> >  profilingLoggingInit(UNUSED size_t BufferSize, UNUSED size_t BufferMax,
> >                       void *Options, size_t OptionsSize) XRAY_NEVER_INSTRUMENT {
> > +  RecursionGuard G(ReentranceGuard);
> > +  if (!G)
> > +    return XRayLogInitStatus::XRAY_LOG_UNINITIALIZED;
> > +
> >    s32 CurrentStatus = XRayLogInitStatus::XRAY_LOG_UNINITIALIZED;
> >    if (!atomic_compare_exchange_strong(&ProfilerLogStatus, &CurrentStatus,
> >                                        XRayLogInitStatus::XRAY_LOG_INITIALIZING,
> > @@ -282,39 +333,51 @@ profilingLoggingInit(UNUSED size_t Buffe
> >
> >    // We need to set up the exit handlers.
> >    static pthread_once_t Once = PTHREAD_ONCE_INIT;
> > -  pthread_once(&Once, +[] {
> > -    pthread_key_create(&ProfilingKey, +[](void *P) {
> > -      // This is the thread-exit handler.
> > -      auto &TLD = *reinterpret_cast<ProfilingData *>(P);
> > -      if (TLD.Allocators == nullptr && TLD.FCT == nullptr)
> > -        return;
> > -
> > -      {
> > -        // If we're somehow executing this while inside a non-reentrant-friendly
> > -        // context, we skip attempting to post the current thread's data.
> > -        RecursionGuard G(ReentranceGuard);
> > -        if (G)
> > -          postCurrentThreadFCT(TLD);
> > -      }
> > -    });
> > -
> > -    // We also need to set up an exit handler, so that we can get the profile
> > -    // information at exit time. We use the C API to do this, to not rely on C++
> > -    // ABI functions for registering exit handlers.
> > -    Atexit(+[] {
> > -      // Finalize and flush.
> > -      if (profilingFinalize() != XRAY_LOG_FINALIZED) {
> > -        cleanupTLD();
> > -        return;
> > -      }
> > -      if (profilingFlush() != XRAY_LOG_FLUSHED) {
> > -        cleanupTLD();
> > -        return;
> > -      }
> > -      if (Verbosity())
> > -        Report("XRay Profile flushed at exit.");
> > -    });
> > -  });
> > +  pthread_once(
> > +      &Once, +[] {
> > +        pthread_key_create(
> > +            &ProfilingKey, +[](void *P) XRAY_NEVER_INSTRUMENT {
> > +              if (atomic_exchange(&ThreadExitingLatch, 1, memory_order_acq_rel))
> > +                return;
> > +
> > +              if (P == nullptr)
> > +                return;
> > +
> > +              auto T = reinterpret_cast<ProfilingData *>(P);
> > +              if (atomic_load_relaxed(&T->Allocators) == 0)
> > +                return;
> > +
> > +              {
> > +                // If we're somehow executing this while inside a
> > +                // non-reentrant-friendly context, we skip attempting to post
> > +                // the current thread's data.
> > +                RecursionGuard G(ReentranceGuard);
> > +                if (!G)
> > +                  return;
> > +
> > +                postCurrentThreadFCT(*T);
> > +              }
> > +            });
> > +
> > +        // We also need to set up an exit handler, so that we can get the
> > +        // profile information at exit time. We use the C API to do this, to not
> > +        // rely on C++ ABI functions for registering exit handlers.
> > +        Atexit(+[]() XRAY_NEVER_INSTRUMENT {
> > +          if (atomic_exchange(&ThreadExitingLatch, 1, memory_order_acq_rel))
> > +            return;
> > +
> > +          auto Cleanup =
> > +              at_scope_exit([]() XRAY_NEVER_INSTRUMENT { cleanupTLD(); });
> > +
> > +          // Finalize and flush.
> > +          if (profilingFinalize() != XRAY_LOG_FINALIZED ||
> > +              profilingFlush() != XRAY_LOG_FLUSHED)
> > +            return;
> > +
> > +          if (Verbosity())
> > +            Report("XRay Profile flushed at exit.");
> > +        });
> > +      });
> >
> >    __xray_log_set_buffer_iterator(profileCollectorService::nextBuffer);
> >    __xray_set_handler(profilingHandleArg0);
> >
> > Modified: compiler-rt/trunk/lib/xray/xray_segmented_array.h
> > URL: http://llvm.org/viewvc/llvm-project/compiler-rt/trunk/lib/xray/xray_segmented_array.h?rev=348335&r1=348334&r2=348335&view=diff
> > ==============================================================================
> > --- compiler-rt/trunk/lib/xray/xray_segmented_array.h (original)
> > +++ compiler-rt/trunk/lib/xray/xray_segmented_array.h Tue Dec  4 22:44:34 2018
> > @@ -32,14 +32,9 @@ namespace __xray {
> >  /// is destroyed. When an Array is destroyed, it will destroy elements in the
> >  /// backing store but will not free the memory.
> >  template <class T> class Array {
> > -  struct SegmentBase {
> > -    SegmentBase *Prev;
> > -    SegmentBase *Next;
> > -  };
> > -
> > -  // We want each segment of the array to be cache-line aligned, and elements of
> > -  // the array be offset from the beginning of the segment.
> > -  struct Segment : SegmentBase {
> > +  struct Segment {
> > +    Segment *Prev;
> > +    Segment *Next;
> >      char Data[1];
> >    };
> >
> > @@ -62,91 +57,35 @@ public:
> >    //     kCacheLineSize-multiple segments, minus the size of two pointers.
> >    //
> >    //   - Request cacheline-multiple sized elements from the allocator.
> > -  static constexpr size_t AlignedElementStorageSize =
> > +  static constexpr uint64_t AlignedElementStorageSize =
> >        sizeof(typename std::aligned_storage<sizeof(T), alignof(T)>::type);
> >
> > -  static constexpr size_t SegmentSize =
> > -      nearest_boundary(sizeof(Segment) + next_pow2(sizeof(T)), kCacheLineSize);
> > +  static constexpr uint64_t SegmentControlBlockSize = sizeof(Segment *) * 2;
> > +
> > +  static constexpr uint64_t SegmentSize = nearest_boundary(
> > +      SegmentControlBlockSize + next_pow2(sizeof(T)), kCacheLineSize);
> >
> >    using AllocatorType = Allocator<SegmentSize>;
> >
> > -  static constexpr size_t ElementsPerSegment =
> > -      (SegmentSize - sizeof(Segment)) / next_pow2(sizeof(T));
> > +  static constexpr uint64_t ElementsPerSegment =
> > +      (SegmentSize - SegmentControlBlockSize) / next_pow2(sizeof(T));
> >
> >    static_assert(ElementsPerSegment > 0,
> >                  "Must have at least 1 element per segment.");
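(Worked example of the sizing math above, assuming a 64-bit target where
kCacheLineSize == 64 and where next_pow2 and nearest_boundary round up to the
next power of two and the next multiple respectively: for a T with
sizeof(T) == 8, SegmentControlBlockSize == 16 and next_pow2(8) == 8, so
SegmentSize == nearest_boundary(16 + 8, 64) == 64 and ElementsPerSegment ==
(64 - 16) / 8 == 6; for sizeof(T) == 24, next_pow2(24) == 32, SegmentSize is
still 64, and ElementsPerSegment drops to (64 - 16) / 32 == 1, which is what
the static_assert guards.)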
> >
> > -  static SegmentBase SentinelSegment;
> > +  static Segment SentinelSegment;
> >
> > -  using size_type = size_t;
> > +  using size_type = uint64_t;
> >
> >  private:
> > -  AllocatorType *Alloc;
> > -  SegmentBase *Head = &SentinelSegment;
> > -  SegmentBase *Tail = &SentinelSegment;
> > -  size_t Size = 0;
> > -
> > -  // Here we keep track of segments in the freelist, to allow us to re-use
> > -  // segments when elements are trimmed off the end.
> > -  SegmentBase *Freelist = &SentinelSegment;
> > -
> > -  Segment *NewSegment() XRAY_NEVER_INSTRUMENT {
> > -    // We need to handle the case in which enough elements have been trimmed to
> > -    // allow us to re-use segments we've allocated before. For this we look into
> > -    // the Freelist, to see whether we need to actually allocate new blocks or
> > -    // just re-use blocks we've already seen before.
> > -    if (Freelist != &SentinelSegment) {
> > -      auto *FreeSegment = Freelist;
> > -      Freelist = FreeSegment->Next;
> > -      FreeSegment->Next = &SentinelSegment;
> > -      Freelist->Prev = &SentinelSegment;
> > -      return static_cast<Segment *>(FreeSegment);
> > -    }
> > -
> > -    auto SegmentBlock = Alloc->Allocate();
> > -    if (SegmentBlock.Data == nullptr)
> > -      return nullptr;
> > -
> > -    // Placement-new the Segment element at the beginning of the SegmentBlock.
> > -    auto S = reinterpret_cast<Segment *>(SegmentBlock.Data);
> > -    new (S) SegmentBase{&SentinelSegment, &SentinelSegment};
> > -    return S;
> > -  }
> > -
> > -  Segment *InitHeadAndTail() XRAY_NEVER_INSTRUMENT {
> > -    DCHECK_EQ(Head, &SentinelSegment);
> > -    DCHECK_EQ(Tail, &SentinelSegment);
> > -    auto Segment = NewSegment();
> > -    if (Segment == nullptr)
> > -      return nullptr;
> > -    DCHECK_EQ(Segment->Next, &SentinelSegment);
> > -    DCHECK_EQ(Segment->Prev, &SentinelSegment);
> > -    Head = Tail = static_cast<SegmentBase *>(Segment);
> > -    return Segment;
> > -  }
> > -
> > -  Segment *AppendNewSegment() XRAY_NEVER_INSTRUMENT {
> > -    auto S = NewSegment();
> > -    if (S == nullptr)
> > -      return nullptr;
> > -    DCHECK_NE(Tail, &SentinelSegment);
> > -    DCHECK_EQ(Tail->Next, &SentinelSegment);
> > -    DCHECK_EQ(S->Prev, &SentinelSegment);
> > -    DCHECK_EQ(S->Next, &SentinelSegment);
> > -    Tail->Next = S;
> > -    S->Prev = Tail;
> > -    Tail = S;
> > -    return static_cast<Segment *>(Tail);
> > -  }
> > -
> >    // This Iterator models a BidirectionalIterator.
> >    template <class U> class Iterator {
> > -    SegmentBase *S = &SentinelSegment;
> > -    size_t Offset = 0;
> > -    size_t Size = 0;
> > +    Segment *S = &SentinelSegment;
> > +    uint64_t Offset = 0;
> > +    uint64_t Size = 0;
> >
> >    public:
> > -    Iterator(SegmentBase *IS, size_t Off, size_t S) XRAY_NEVER_INSTRUMENT
> > +    Iterator(Segment *IS, uint64_t Off, uint64_t S) XRAY_NEVER_INSTRUMENT
> >          : S(IS),
> >            Offset(Off),
> >            Size(S) {}
> > @@ -215,7 +154,7 @@ private:
> >
> >        // We need to compute the character-aligned pointer, offset from the
> >        // segment's Data location to get the element in the position of Offset.
> > -      auto Base = static_cast<Segment *>(S)->Data;
> > +      auto Base = &S->Data;
> >        auto AlignedOffset = Base + (RelOff * AlignedElementStorageSize);
> >        return *reinterpret_cast<U *>(AlignedOffset);
> >      }
> > @@ -223,17 +162,183 @@ private:
> >      U *operator->() const XRAY_NEVER_INSTRUMENT { return &(**this); }
> >    };
> >
> > +  AllocatorType *Alloc;
> > +  Segment *Head;
> > +  Segment *Tail;
> > +
> > +  // Here we keep track of segments in the freelist, to allow us to re-use
> > +  // segments when elements are trimmed off the end.
> > +  Segment *Freelist;
> > +  uint64_t Size;
> > +
> > +  // ===============================
> > +  // In the following implementation, we work through the algorithms and the
> > +  // list operations using the following notation:
> > +  //
> > +  //   - pred(s) is the predecessor (previous node accessor) and succ(s) is
> > +  //     the successor (next node accessor).
> > +  //
> > +  //   - S is a sentinel segment, which has the following property:
> > +  //
> > +  //         pred(S) == succ(S) == S
> > +  //
> > +  //   - @ is a loop operator, which can imply pred(s) == s if it appears on
> > +  //     the left of s, or succ(s) == s if it appears on the right of s.
> > +  //
> > +  //   - sL <-> sR : means a bidirectional relation between sL and sR, which
> > +  //     means:
> > +  //
> > +  //         succ(sL) == sR && pred(sR) == sL
> > +  //
> > +  //   - sL -> sR : implies a unidirectional relation between sL and sR,
> > +  //     with the following properties:
> > +  //
> > +  //         succ(sL) == sR
> > +  //
> > +  //     sL <- sR : implies a unidirectional relation between sR and sL,
> > +  //     with the following properties:
> > +  //
> > +  //         pred(sR) == sL
> > +  //
> > +  // ===============================
> > +
> > +  Segment *NewSegment() XRAY_NEVER_INSTRUMENT {
> > +    // We need to handle the case in which enough elements have been trimmed to
> > +    // allow us to re-use segments we've allocated before. For this we look into
> > +    // the Freelist, to see whether we need to actually allocate new blocks or
> > +    // just re-use blocks we've already seen before.
> > +    if (Freelist != &SentinelSegment) {
> > +      // The current state of lists resemble something like this at this point:
> > +      //
> > +      //   Freelist: @S@<-f0->...<->fN->@S@
> > +      //                  ^ Freelist
> > +      //
> > +      // We want to perform a splice of `f0` from Freelist to a temporary list,
> > +      // which looks like:
> > +      //
> > +      //   Templist: @S@<-f0->@S@
> > +      //                  ^ FreeSegment
> > +      //
> > +      // Our algorithm preconditions are:
> > +      DCHECK_EQ(Freelist->Prev, &SentinelSegment);
> > +
> > +      // Then the algorithm we implement is:
> > +      //
> > +      //   SFS = Freelist
> > +      //   Freelist = succ(Freelist)
> > +      //   if (Freelist != S)
> > +      //     pred(Freelist) = S
> > +      //   succ(SFS) = S
> > +      //   pred(SFS) = S
> > +      //
> > +      auto *FreeSegment = Freelist;
> > +      Freelist = Freelist->Next;
> > +
> > +      // Note that we need to handle the case where Freelist is now pointing to
> > +      // S, which we don't want to be overwriting.
> > +      // TODO: Determine whether the cost of the branch is higher than the cost
> > +      // of the blind assignment.
> > +      if (Freelist != &SentinelSegment)
> > +        Freelist->Prev = &SentinelSegment;
> > +
> > +      FreeSegment->Next = &SentinelSegment;
> > +      FreeSegment->Prev = &SentinelSegment;
> > +
> > +      // Our postconditions are:
> > +      DCHECK_EQ(Freelist->Prev, &SentinelSegment);
> > +      DCHECK_NE(FreeSegment, &SentinelSegment);
> > +      return FreeSegment;
> > +    }
> > +
> > +    auto SegmentBlock = Alloc->Allocate();
> > +    if (SegmentBlock.Data == nullptr)
> > +      return nullptr;
> > +
> > +    // Placement-new the Segment element at the beginning of the SegmentBlock.
> > +    new (SegmentBlock.Data) Segment{&SentinelSegment, &SentinelSegment, {0}};
> > +    auto SB = reinterpret_cast<Segment *>(SegmentBlock.Data);
> > +    return SB;
> > +  }
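(For anyone who wants to poke at the splice in isolation, here is a toy,
compilable model of the same sentinel-based pop using a placeholder Node type
rather than the Segment type above; the asserts mirror the DCHECKed
postconditions:

    #include <cassert>

    struct Node { Node *Prev; Node *Next; };
    static Node Sentinel{&Sentinel, &Sentinel};

    // Pop the first node off a freelist whose head's Prev is the sentinel.
    Node *popFront(Node *&Freelist) {
      if (Freelist == &Sentinel)
        return nullptr;                   // nothing to reuse
      Node *N = Freelist;
      Freelist = Freelist->Next;
      if (Freelist != &Sentinel)
        Freelist->Prev = &Sentinel;       // new head points back at the sentinel
      N->Next = &Sentinel;
      N->Prev = &Sentinel;
      assert(Freelist->Prev == &Sentinel);
      assert(N != &Sentinel);
      return N;
    }

This is only a model of the list manipulation, not the allocator-backed path.)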
> > +
> > +  Segment *InitHeadAndTail() XRAY_NEVER_INSTRUMENT {
> > +    DCHECK_EQ(Head, &SentinelSegment);
> > +    DCHECK_EQ(Tail, &SentinelSegment);
> > +    auto S = NewSegment();
> > +    if (S == nullptr)
> > +      return nullptr;
> > +    DCHECK_EQ(S->Next, &SentinelSegment);
> > +    DCHECK_EQ(S->Prev, &SentinelSegment);
> > +    DCHECK_NE(S, &SentinelSegment);
> > +    Head = S;
> > +    Tail = S;
> > +    DCHECK_EQ(Head, Tail);
> > +    DCHECK_EQ(Tail->Next, &SentinelSegment);
> > +    DCHECK_EQ(Tail->Prev, &SentinelSegment);
> > +    return S;
> > +  }
> > +
> > +  Segment *AppendNewSegment() XRAY_NEVER_INSTRUMENT {
> > +    auto S = NewSegment();
> > +    if (S == nullptr)
> > +      return nullptr;
> > +    DCHECK_NE(Tail, &SentinelSegment);
> > +    DCHECK_EQ(Tail->Next, &SentinelSegment);
> > +    DCHECK_EQ(S->Prev, &SentinelSegment);
> > +    DCHECK_EQ(S->Next, &SentinelSegment);
> > +    S->Prev = Tail;
> > +    Tail->Next = S;
> > +    Tail = S;
> > +    DCHECK_EQ(S, S->Prev->Next);
> > +    DCHECK_EQ(Tail->Next, &SentinelSegment);
> > +    return S;
> > +  }
> > +
> >  public:
> > -  explicit Array(AllocatorType &A) XRAY_NEVER_INSTRUMENT : Alloc(&A) {}
> > +  explicit Array(AllocatorType &A) XRAY_NEVER_INSTRUMENT
> > +      : Alloc(&A),
> > +        Head(&SentinelSegment),
> > +        Tail(&SentinelSegment),
> > +        Freelist(&SentinelSegment),
> > +        Size(0) {}
> > +
> > +  Array() XRAY_NEVER_INSTRUMENT : Alloc(nullptr),
> > +                                  Head(&SentinelSegment),
> > +                                  Tail(&SentinelSegment),
> > +                                  Freelist(&SentinelSegment),
> > +                                  Size(0) {}
> >
> >    Array(const Array &) = delete;
> > -  Array(Array &&O) NOEXCEPT : Alloc(O.Alloc),
> > -                              Head(O.Head),
> > -                              Tail(O.Tail),
> > -                              Size(O.Size) {
> > +  Array &operator=(const Array &) = delete;
> > +
> > +  Array(Array &&O) XRAY_NEVER_INSTRUMENT : Alloc(O.Alloc),
> > +                                           Head(O.Head),
> > +                                           Tail(O.Tail),
> > +                                           Freelist(O.Freelist),
> > +                                           Size(O.Size) {
> > +    O.Alloc = nullptr;
> >      O.Head = &SentinelSegment;
> >      O.Tail = &SentinelSegment;
> >      O.Size = 0;
> > +    O.Freelist = &SentinelSegment;
> > +  }
> > +
> > +  Array &operator=(Array &&O) XRAY_NEVER_INSTRUMENT {
> > +    Alloc = O.Alloc;
> > +    O.Alloc = nullptr;
> > +    Head = O.Head;
> > +    O.Head = &SentinelSegment;
> > +    Tail = O.Tail;
> > +    O.Tail = &SentinelSegment;
> > +    Freelist = O.Freelist;
> > +    O.Freelist = &SentinelSegment;
> > +    Size = O.Size;
> > +    O.Size = 0;
> > +    return *this;
> > +  }
> > +
> > +  ~Array() XRAY_NEVER_INSTRUMENT {
> > +    for (auto &E : *this)
> > +      (&E)->~T();
> >    }
> >
> >    bool empty() const XRAY_NEVER_INSTRUMENT { return Size == 0; }
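(Rough usage sketch of the move-only surface above; the Allocator constructor
argument is an assumption here, a maximum byte budget as used elsewhere in
this patch series, so treat the exact signature as illustrative only:

    #include "xray_segmented_array.h"   // Array<T> and its AllocatorType
    #include <utility>

    using IntArray = __xray::Array<int>;

    void sketch() {
      IntArray::AllocatorType A(1 << 20);   // assumed: constructed with a byte budget
      IntArray Xs(A);
      Xs.Append(1);
      Xs.AppendEmplace(2);

      IntArray Ys = std::move(Xs);          // Ys takes over the segments; Xs is left
                                            // pointing at the sentinel with Size == 0
      // IntArray Zs = Ys;                  // ill-formed: copying is deleted
    }

The point of resetting the moved-from object to the sentinel is that its
destructor then walks an empty range and never destroys an element twice.)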
> > @@ -243,52 +348,41 @@ public:
> >      return *Alloc;
> >    }
> >
> > -  size_t size() const XRAY_NEVER_INSTRUMENT { return Size; }
> > -
> > -  T *Append(const T &E) XRAY_NEVER_INSTRUMENT {
> > -    if (UNLIKELY(Head == &SentinelSegment))
> > -      if (InitHeadAndTail() == nullptr)
> > -        return nullptr;
> > -
> > -    auto Offset = Size % ElementsPerSegment;
> > -    if (UNLIKELY(Size != 0 && Offset == 0))
> > -      if (AppendNewSegment() == nullptr)
> > -        return nullptr;
> > -
> > -    auto Base = static_cast<Segment *>(Tail)->Data;
> > -    auto AlignedOffset = Base + (Offset * AlignedElementStorageSize);
> > -    auto Position = reinterpret_cast<T *>(AlignedOffset);
> > -    *Position = E;
> > -    ++Size;
> > -    return Position;
> > -  }
> > +  uint64_t size() const XRAY_NEVER_INSTRUMENT { return Size; }
> >
> >    template <class... Args>
> >    T *AppendEmplace(Args &&... args) XRAY_NEVER_INSTRUMENT {
> > -    if (UNLIKELY(Head == &SentinelSegment))
> > -      if (InitHeadAndTail() == nullptr)
> > +    DCHECK((Size == 0 && Head == &SentinelSegment && Head == Tail) ||
> > +           (Size != 0 && Head != &SentinelSegment && Tail != &SentinelSegment));
> > +    if (UNLIKELY(Head == &SentinelSegment)) {
> > +      auto R = InitHeadAndTail();
> > +      if (R == nullptr)
> >          return nullptr;
> > +    }
> > +
> > +    DCHECK_NE(Head, &SentinelSegment);
> > +    DCHECK_NE(Tail, &SentinelSegment);
> >
> >      auto Offset = Size % ElementsPerSegment;
> > -    auto *LatestSegment = Tail;
> > -    if (UNLIKELY(Size != 0 && Offset == 0)) {
> > -      LatestSegment = AppendNewSegment();
> > -      if (LatestSegment == nullptr)
> > +    if (UNLIKELY(Size != 0 && Offset == 0))
> > +      if (AppendNewSegment() == nullptr)
> >          return nullptr;
> > -    }
> >
> >      DCHECK_NE(Tail, &SentinelSegment);
> > -    auto Base = static_cast<Segment *>(LatestSegment)->Data;
> > +    auto Base = &Tail->Data;
> >      auto AlignedOffset = Base + (Offset * AlignedElementStorageSize);
> > -    auto Position = reinterpret_cast<T *>(AlignedOffset);
> > +    DCHECK_LE(AlignedOffset + sizeof(T),
> > +              reinterpret_cast<unsigned char *>(Tail) + SegmentSize);
> >
> >      // In-place construct at Position.
> > -    new (Position) T{std::forward<Args>(args)...};
> > +    new (AlignedOffset) T{std::forward<Args>(args)...};
> >      ++Size;
> > -    return reinterpret_cast<T *>(Position);
> > +    return reinterpret_cast<T *>(AlignedOffset);
> >    }
> >
> > -  T &operator[](size_t Offset) const XRAY_NEVER_INSTRUMENT {
> > +  T *Append(const T &E) XRAY_NEVER_INSTRUMENT { return AppendEmplace(E); }
> > +
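(The braced placement-new in AppendEmplace above is what lets T be a plain
aggregate with no user-declared constructors. A minimal, self-contained
illustration of that construction pattern, with a stand-in Node type that is
not part of the patch:

    #include <new>
    #include <type_traits>
    #include <utility>

    struct Node {          // an aggregate: no constructors declared
      int Key;
      double Value;
    };

    template <class T, class... Args>
    T *constructAt(void *Storage, Args &&... As) {
      // List-initialization through placement-new; a parenthesized T(...)
      // would instead require a matching constructor to exist.
      return new (Storage) T{std::forward<Args>(As)...};
    }

    void demo() {
      std::aligned_storage<sizeof(Node), alignof(Node)>::type Buf;
      Node *N = constructAt<Node>(&Buf, 42, 3.5);
      N->~Node();
    }

Nothing here is XRay-specific; it is just the smallest shape of the idiom the
container relies on.)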
> > +  T &operator[](uint64_t Offset) const XRAY_NEVER_INSTRUMENT {
> >      DCHECK_LE(Offset, Size);
> >      // We need to traverse the array enough times to find the element at Offset.
> >      auto S = Head;
> > @@ -297,7 +391,7 @@ public:
> >        Offset -= ElementsPerSegment;
> >        DCHECK_NE(S, &SentinelSegment);
> >      }
> > -    auto Base = static_cast<Segment *>(S)->Data;
> > +    auto Base = &S->Data;
> >      auto AlignedOffset = Base + (Offset * AlignedElementStorageSize);
> >      auto Position = reinterpret_cast<T *>(AlignedOffset);
> >      return *reinterpret_cast<T *>(Position);
> > @@ -332,41 +426,172 @@ public:
> >
> >    /// Remove N Elements from the end. This leaves the blocks behind, and does
> >    /// not require allocation of new blocks for new elements added after trimming.
> > -  void trim(size_t Elements) XRAY_NEVER_INSTRUMENT {
> > -    if (Elements == 0)
> > -      return;
> > -
> > +  void trim(uint64_t Elements) XRAY_NEVER_INSTRUMENT {
> >      auto OldSize = Size;
> > -    Elements = Elements >= Size ? Size : Elements;
> > +    Elements = Elements > Size ? Size : Elements;
> >      Size -= Elements;
> >
> > -    DCHECK_NE(Head, &SentinelSegment);
> > -    DCHECK_NE(Tail, &SentinelSegment);
> > -
> > -    for (auto SegmentsToTrim = (nearest_boundary(OldSize, ElementsPerSegment) -
> > -                                nearest_boundary(Size, ElementsPerSegment)) /
> > -                               ElementsPerSegment;
> > -         SegmentsToTrim > 0; --SegmentsToTrim) {
> > -
> > -      // We want to short-circuit if the trace is already empty.
> > -      if (Head == &SentinelSegment && Head == Tail)
> > -        return;
> > -
> > -      // Put the tail into the Freelist.
> > -      auto *FreeSegment = Tail;
> > -      Tail = Tail->Prev;
> > -      if (Tail == &SentinelSegment)
> > -        Head = Tail;
> > -      else
> > -        Tail->Next = &SentinelSegment;
> > -
> > +    // We compute the number of segments we're going to return from the tail by
> > +    // counting how many elements have been trimmed. Given the following:
> > +    //
> > +    // - Each segment has N valid positions, where N > 0
> > +    // - The previous size > current size
> > +    //
> > +    // To compute the number of segments to return, we need to perform the
> > +    // following calculations for the number of segments required given 'x'
> > +    // elements:
> > +    //
> > +    //   f(x) = {
> > +    //            x == 0          : 0
> > +    //          , 0 < x <= N      : 1
> > +    //          , N < x <= max    : x / N + (x % N ? 1 : 0)
> > +    //          }
> > +    //
> > +    // We can simplify this down to:
> > +    //
> > +    //   f(x) = {
> > +    //            x == 0          : 0,
> > +    //          , 0 < x <= max    : x / N + (x < N || x % N ? 1 : 0)
> > +    //          }
> > +    //
> > +    // And further down to:
> > +    //
> > +    //   f(x) = x ? x / N + (x < N || x % N ? 1 : 0) : 0
> > +    //
> > +    // We can then perform the following calculation `s` which counts the number
> > +    // of segments we need to remove from the end of the data structure:
> > +    //
> > +    //   s(p, c) = f(p) - f(c)
> > +    //
> > +    // If we treat p = previous size, and c = current size, and given the
> > +    // properties above, the possible range for s(...) is [0..max(typeof(p))/N]
> > +    // given that typeof(p) == typeof(c).
> > +    auto F = [](uint64_t X) {
> > +      return X ? (X / ElementsPerSegment) +
> > +                     (X < ElementsPerSegment || X % ElementsPerSegment ? 1 : 0)
> > +               : 0;
> > +    };
> > +    auto PS = F(OldSize);
> > +    auto CS = F(Size);
> > +    DCHECK_GE(PS, CS);
> > +    auto SegmentsToTrim = PS - CS;
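(Quick sanity check of the formula: with ElementsPerSegment == 4, trimming
from OldSize == 9 down to Size == 3 gives F(9) == 9/4 + 1 == 3 segments
previously in use and F(3) == 0 + 1 == 1 segment still needed, so
SegmentsToTrim == 2 and the loop below returns exactly the two tail segments
to the freelist.)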
> > +    for (auto I = 0uL; I < SegmentsToTrim; ++I) {
> > +      // Here we place the current tail segment to the freelist. To do this
> > +      // appropriately, we need to perform a splice operation on two
> > +      // bidirectional linked-lists. In particular, we have the current state of
> > +      // the doubly-linked list of segments:
> > +      //
> > +      //   @S@ <- s0 <-> s1 <-> ... <-> sT -> @S@
> > +      //
> > +      DCHECK_NE(Head, &SentinelSegment);
> > +      DCHECK_NE(Tail, &SentinelSegment);
> >        DCHECK_EQ(Tail->Next, &SentinelSegment);
> > -      FreeSegment->Next = Freelist;
> > -      FreeSegment->Prev = &SentinelSegment;
> > -      if (Freelist != &SentinelSegment)
> > -        Freelist->Prev = FreeSegment;
> > -      Freelist = FreeSegment;
> > +
> > +      if (Freelist == &SentinelSegment) {
> > +        // Our two lists at this point are in this configuration:
> > +        //
> > +        //   Freelist: (potentially) @S@
> > +        //   Mainlist: @S@<-s0<->s1<->...<->sPT<->sT->@S@
> > +        //                  ^ Head                ^ Tail
> > +        //
> > +        // The end state for us will be this configuration:
> > +        //
> > +        //   Freelist: @S@<-sT->@S@
> > +        //   Mainlist: @S@<-s0<->s1<->...<->sPT->@S@
> > +        //                  ^ Head          ^ Tail
> > +        //
> > +        // The first step for us is to hold a reference to the tail of Mainlist,
> > +        // which in our notation is represented by sT. We call this our "free
> > +        // segment" which is the segment we are placing on the Freelist.
> > +        //
> > +        //   sF = sT
> > +        //
> > +        // Then, we also hold a reference to the "pre-tail" element, which we
> > +        // call sPT:
> > +        //
> > +        //   sPT = pred(sT)
> > +        //
> > +        // We want to splice sT into the beginning of the Freelist, which in
> > +        // an empty Freelist means placing a segment whose predecessor and
> > +        // successor is the sentinel segment.
> > +        //
> > +        // The splice operation then can be performed in the following
> > +        // algorithm:
> > +        //
> > +        //   succ(sPT) = S
> > +        //   pred(sT) = S
> > +        //   succ(sT) = Freelist
> > +        //   Freelist = sT
> > +        //   Tail = sPT
> > +        //
> > +        auto SPT = Tail->Prev;
> > +        SPT->Next = &SentinelSegment;
> > +        Tail->Prev = &SentinelSegment;
> > +        Tail->Next = Freelist;
> > +        Freelist = Tail;
> > +        Tail = SPT;
> > +
> > +        // Our post-conditions here are:
> > +        DCHECK_EQ(Tail->Next, &SentinelSegment);
> > +        DCHECK_EQ(Freelist->Prev, &SentinelSegment);
> > +      } else {
> > +        // In the other case, where the Freelist is not empty, we perform the
> > +        // following transformation instead:
> > +        //
> > +        // This transforms the current state:
> > +        //
> > +        //   Freelist: @S@<-f0->@S@
> > +        //                  ^ Freelist
> > +        //   Mainlist: @S@<-s0<->s1<->...<->sPT<->sT->@S@
> > +        //                  ^ Head                ^ Tail
> > +        //
> > +        // Into the following:
> > +        //
> > +        //   Freelist: @S@<-sT<->f0->@S@
> > +        //                  ^ Freelist
> > +        //   Mainlist: @S@<-s0<->s1<->...<->sPT->@S@
> > +        //                  ^ Head          ^ Tail
> > +        //
> > +        // The algorithm is:
> > +        //
> > +        //   sFH = Freelist
> > +        //   sPT = pred(sT)
> > +        //   pred(sFH) = sT
> > +        //   succ(sT) = Freelist
> > +        //   pred(sT) = S
> > +        //   succ(sPT) = S
> > +        //   Tail = sPT
> > +        //   Freelist = sT
> > +        //
> > +        auto SFH = Freelist;
> > +        auto SPT = Tail->Prev;
> > +        auto ST = Tail;
> > +        SFH->Prev = ST;
> > +        ST->Next = Freelist;
> > +        ST->Prev = &SentinelSegment;
> > +        SPT->Next = &SentinelSegment;
> > +        Tail = SPT;
> > +        Freelist = ST;
> > +
> > +        // Our post-conditions here are:
> > +        DCHECK_EQ(Tail->Next, &SentinelSegment);
> > +        DCHECK_EQ(Freelist->Prev, &SentinelSegment);
> > +        DCHECK_EQ(Freelist->Next->Prev, Freelist);
> > +      }
> >      }
> > +
> > +    // If we've spliced out all the trailing segments at this point, ensure that
> > +    // the main list is "empty", i.e. both the head and the tail point to the
> > +    // sentinel segment.
> > +    if (Tail == &SentinelSegment)
> > +      Head = Tail;
> > +
> > +    DCHECK(
> > +        (Size == 0 && Head == &SentinelSegment && Tail == &SentinelSegment) ||
> > +        (Size != 0 && Head != &SentinelSegment && Tail != &SentinelSegment));
> > +    DCHECK(
> > +        (Freelist != &SentinelSegment && Freelist->Prev == &SentinelSegment) ||
> > +        (Freelist == &SentinelSegment && Tail->Next == &SentinelSegment));
> >    }
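(The two branches above can also be read as one splice. Here is a toy,
compilable version that folds the empty- and non-empty-freelist cases into a
single sequence, with a placeholder Node type and asserts standing in for the
DCHECKs; both forms should preserve the same invariants:

    #include <cassert>

    struct Node { Node *Prev; Node *Next; };
    static Node Sentinel{&Sentinel, &Sentinel};

    // Unlink the tail of the main list and push it onto the freelist head.
    void recycleTail(Node *&Head, Node *&Tail, Node *&Freelist) {
      assert(Tail != &Sentinel && Tail->Next == &Sentinel);
      Node *ST = Tail;               // segment being recycled (sT)
      Node *SPT = Tail->Prev;        // new tail (sPT), possibly the sentinel

      SPT->Next = &Sentinel;         // detach sT from the main list
      ST->Prev = &Sentinel;
      ST->Next = Freelist;           // splice sT onto the freelist head
      if (Freelist != &Sentinel)
        Freelist->Prev = ST;
      Freelist = ST;
      Tail = SPT;
      if (Tail == &Sentinel)         // list drained: head follows the tail
        Head = &Sentinel;

      assert(Freelist->Prev == &Sentinel);
      assert(Tail == &Sentinel || Tail->Next == &Sentinel);
    }

Again, this is a model of the pointer surgery only, not a drop-in for the
patch.)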
> >
> >    // Provide iterators.
> > @@ -388,8 +613,8 @@ public:
> >  // ensure that storage for the SentinelSegment is defined and has a single
> >  // address.
> >  template <class T>
> > -typename Array<T>::SegmentBase Array<T>::SentinelSegment{
> > -    &Array<T>::SentinelSegment, &Array<T>::SentinelSegment};
> > +typename Array<T>::Segment Array<T>::SentinelSegment{
> > +    &Array<T>::SentinelSegment, &Array<T>::SentinelSegment, {'\0'}};
> >
> >  } // namespace __xray
> >
> >
> >
> > _______________________________________________
> > llvm-commits mailing list
> > llvm-commits at lists.llvm.org
> > http://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-commits

