[compiler-rt][ctx_profile] Add the instrumented contextual profiling APIs (PR #89838)
Snehasish Kumar via llvm-commits
llvm-commits at lists.llvm.org
Mon May 6 15:46:46 PDT 2024
================
@@ -51,5 +53,189 @@ class Arena final {
const uint64_t Size;
};
+/// The contextual profile is a directed tree where each node has one parent. A
+/// node (ContextNode) corresponds to a function activation. The root of the
+/// tree is at a function that was marked as entrypoint to the compiler. A node
+/// stores counter values for edges and a vector of subcontexts. These are the
+/// contexts of callees. The index in the subcontext vector corresponds to the
+/// index of the callsite (as was instrumented via llvm.instrprof.callsite). At
+/// that index we find a linked list, potentially empty, of ContextNodes. Direct
+/// calls will have 0 or 1 values in the linked list, but indirect callsites may
+/// have more.
+///
+/// The ContextNode has a fixed sized header describing it - the GUID of the
+/// function, the size of the counter and callsite vectors. It is also an
+/// (intrusive) linked list for the purposes of the indirect call case above.
+///
+/// Allocation is expected to happen on an Arena. The allocation lays out inline
+/// the counter and subcontexts vectors. The class offers APIs to correctly
+/// reference the latter.
+///
+/// The layout is as follows:
+///
+/// [[statically-declared fields][counters vector][subcontexts vector]]
----------------
snehasish wrote:
I think interpreting a subcontext ContextNode as part of the same block of memory, immediately following counters of width uint64_t, is undefined behaviour. In this case I think we get lucky: the first member of the ContextNode struct is a GUID (uint64_t), so the alignment of the ContextNode struct and the alignment of each individual counter match.
I am far from an expert on undefined behaviour; my understanding is based on reading item (5) in https://en.cppreference.com/w/cpp/language/reinterpret_cast and https://en.cppreference.com/w/c/language/object. Please feel free to correct me if you think otherwise.
I think the motivation is that the single chunk based layout simplifies the instrumentation GEP computations. A few questions:
1) Can we test the ContextNode runtime implementation with sanitizers to expose any alignment issues?
2) Have you considered an alternative design which does not make it easy to run into such issues?
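To make question 1 concrete, here is a rough sketch of the kind of layout and alignment checks I have in mind. This is not the code in the PR: `SketchNode`, its field names, `makeNode`, `destroyNode` and `isAlignedFor` are all hypothetical stand-ins, and it allocates with plain `operator new` rather than the Arena. It only illustrates the `[header][counters][subcontexts]` layout, with the assumptions about size and alignment spelled out as `static_assert`s.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <new>

// Hypothetical stand-in for ContextNode: a fixed-size header followed inline
// by a counters vector and a subcontexts vector. Field names are assumptions.
struct alignas(8) SketchNode {
  uint64_t Guid;
  SketchNode *Next; // intrusive list link for the indirect call case
  uint32_t NumCounters;
  uint32_t NumCallsites;

  // Byte offset of the subcontexts vector from the start of the node.
  static size_t subcontextsOffset(uint32_t NumCounters) {
    return sizeof(SketchNode) + sizeof(uint64_t) * NumCounters;
  }
  // Total bytes needed for header + counters + subcontexts.
  static size_t allocSize(uint32_t NumCounters, uint32_t NumCallsites) {
    return subcontextsOffset(NumCounters) +
           sizeof(SketchNode *) * NumCallsites;
  }
  uint64_t *counters() {
    return reinterpret_cast<uint64_t *>(reinterpret_cast<char *>(this) +
                                        sizeof(SketchNode));
  }
  SketchNode **subcontexts() {
    return reinterpret_cast<SketchNode **>(reinterpret_cast<char *>(this) +
                                           subcontextsOffset(NumCounters));
  }
};

// Make the layout assumptions explicit instead of relying on luck.
static_assert(sizeof(SketchNode) % alignof(uint64_t) == 0,
              "counters must start at a uint64_t-aligned offset");
static_assert(sizeof(uint64_t) % alignof(SketchNode *) == 0,
              "subcontexts must start at a pointer-aligned offset");

inline bool isAlignedFor(const void *P, size_t Align) {
  return reinterpret_cast<uintptr_t>(P) % Align == 0;
}

// Allocate one node and start the lifetime of every inline element with
// placement new, rather than only casting raw bytes.
inline SketchNode *makeNode(uint64_t Guid, uint32_t NumCounters,
                            uint32_t NumCallsites) {
  void *Mem = ::operator new(SketchNode::allocSize(NumCounters, NumCallsites));
  auto *N = new (Mem) SketchNode{Guid, nullptr, NumCounters, NumCallsites};
  for (uint32_t I = 0; I < NumCounters; ++I)
    new (N->counters() + I) uint64_t(0);
  for (uint32_t I = 0; I < NumCallsites; ++I)
    new (N->subcontexts() + I) SketchNode *(nullptr);
  return N;
}

// All members are trivially destructible, so releasing the storage suffices.
inline void destroyNode(SketchNode *N) { ::operator delete(N); }
```

A unit test along these lines, built with `-fsanitize=undefined,address`, could create nodes with various counter and callsite counts and check `isAlignedFor` on the pointers returned by `counters()` and `subcontexts()`. Whether the pointer arithmetic past the header is itself well-defined is a separate question that this sketch does not settle.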
https://github.com/llvm/llvm-project/pull/89838