[llvm-dev] Identifying objects within BumpPtrAllocator.
Artem Dergachev via llvm-dev
llvm-dev at lists.llvm.org
Tue Aug 28 17:14:48 PDT 2018
In various debug dumps (e.g., Clang's -ast-dump), objects (e.g., the
Stmts and Decls in that -ast-dump) are identified by pointers. This is
very reliable in the sense that no two objects ever have the same
pointer at the same time, but it's unpleasant that pointers change
across runs. Having deterministic identifiers instead of pointers would
aid debugging: imagine setting a conditional breakpoint on the
identifier of an object that has not yet been constructed, or simply
trying to align two debug dumps of different kinds from different runs.
Additionally, pointers are hard to read and memorize; it's hard to
notice the difference between 0x7f80a28325e0 and 0x7f80a28325a0,
especially when they're a few screens apart.
Hence the idea: why don't we print the offset into the allocator's
memory slab instead of a pointer? We use BumpPtrAllocator all over the
place, which boils down to a set of slabs on which all objects are
placed in the order in which they are allocated. It is easy for the
allocator to tell whether a pointer belongs to it and, if so, to
determine which slab it belongs to and at what offset the object sits in
that slab. Therefore it is possible to identify the object by its (slab
index, offset) pair. E.g., "TypedefDecl 0:528" (you already memorized it)
instead of "TypedefDecl 0x7f80a28325e0". This could be applied to all
sorts of objects that live in BumpPtrAllocators.
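To make the (slab index, offset) idea concrete, here is a minimal
sketch. It uses a toy allocator written for this illustration, not
llvm::BumpPtrAllocator; the names ToyBumpAllocator, identify and
formatId are all hypothetical, and the fixed slab size with no
oversized-allocation path is a simplifying assumption.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <memory>
#include <optional>
#include <string>
#include <utility>
#include <vector>

// Toy bump allocator for illustration only (NOT llvm::BumpPtrAllocator).
// Objects are placed into fixed-size slabs in allocation order.
class ToyBumpAllocator {
public:
  explicit ToyBumpAllocator(std::size_t Bytes = 4096) : SlabBytes(Bytes) {}

  // Allocate Size bytes aligned to Align (a power of two no larger than
  // the default operator new alignment). Bumps the current slab, or
  // starts a new slab when the request does not fit.
  void *allocate(std::size_t Size,
                 std::size_t Align = alignof(std::max_align_t)) {
    assert(Size <= SlabBytes && "toy allocator has no oversized-slab path");
    std::size_t Start = (Used + Align - 1) & ~(Align - 1);
    if (Slabs.empty() || Start + Size > SlabBytes) {
      Slabs.push_back(std::make_unique<unsigned char[]>(SlabBytes));
      Start = 0;
    }
    Used = Start + Size;
    return Slabs.back().get() + Start;
  }

  // Return (slab index, offset) if Ptr points into one of our slabs,
  // or std::nullopt if the pointer is foreign to this allocator.
  std::optional<std::pair<std::size_t, std::size_t>>
  identify(const void *Ptr) const {
    auto P = reinterpret_cast<std::uintptr_t>(Ptr);
    for (std::size_t I = 0; I < Slabs.size(); ++I) {
      auto Base = reinterpret_cast<std::uintptr_t>(Slabs[I].get());
      if (P >= Base && P < Base + SlabBytes)
        return std::make_pair(I, static_cast<std::size_t>(P - Base));
    }
    return std::nullopt;
  }

private:
  std::size_t SlabBytes;
  std::size_t Used = 0; // bytes consumed in the newest slab
  std::vector<std::unique_ptr<unsigned char[]>> Slabs;
};

// Render the identifier in the "slab:offset" form used in the text.
std::string formatId(const ToyBumpAllocator &A, const void *Ptr) {
  auto Id = A.identify(Ptr);
  if (!Id)
    return "<foreign>";
  return std::to_string(Id->first) + ":" + std::to_string(Id->second);
}
```

Nothing is stored per object: the identifier is recomputed from the
pointer and the allocator's slab list whenever a dump needs it. Two
allocators that see the same allocation sequence hand out different
pointers but produce the same identifiers.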
In order to compute such an identifier, we only need access to the
object and to the allocator. No additional memory is used to store it.
Such an identifier would also be persistent across runs as long as the
same objects are allocated in the same order, which is, I suspect,
often the case.
One of the downsides of this identifier is that it's not going to be the
same on different machines, because the same data structure may require
different amounts of memory on different hosts. So it wouldn't
necessarily help you understand a dump that a user sent you. But it
still seems to be better than pointers.
Should we go ahead and try to implement it?