[llvm-dev] Identifying objects within BumpPtrAllocator.

Tue Aug 28 17:14:48 PDT 2018

In various debug dumps (e.g., Clang's -ast-dump), objects (e.g., the
Stmts and Decls in that -ast-dump) are identified by pointers. This is
very reliable in the sense that no two objects would ever have the same
pointer at the same time, but it's unpleasant that pointers change
across runs. Having deterministic identifiers instead of pointers would
aid debugging: imagine setting a conditional breakpoint on an object
that has not been constructed yet, or simply trying to line up two debug
dumps of different kinds from different runs. Additionally, pointers are
hard to read and memorize; it's hard to notice the difference between
0x7f80a28325e0 and 0x7f80a28325a0, especially when they're a few screens
apart.

Hence the idea: why don't we print the offset into the allocator's
memory slab instead of a pointer? We use BumpPtrAllocator all over the
place, and it boils down to a set of slabs on which all objects are
placed in the order in which they are allocated. It is easy for the
allocator to determine whether a pointer belongs to it, and if so, which
slab it belongs to and at what offset within that slab the object lives.
Therefore an object can be identified by its (slab index, offset) pair,
e.g., "TypedefDecl 0:528" (you've already memorized it) instead of
"TypedefDecl 0x7f80a28325e0". This could be applied to all sorts of
objects that live in BumpPtrAllocators.
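To make the (slab index, offset) idea concrete, here is a minimal C++17
sketch. It assumes a toy allocator, not llvm::BumpPtrAllocator: the names
ToyBumpAllocator, identify() and formatId() are hypothetical, every
allocation is assumed to fit in one fixed-size slab, and alignment is
applied to the in-slab offset (which is only valid for alignments up to
alignof(std::max_align_t)).

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <memory>
#include <optional>
#include <string>
#include <utility>
#include <vector>

// Hypothetical toy bump allocator, NOT llvm::BumpPtrAllocator.
class ToyBumpAllocator {
public:
  explicit ToyBumpAllocator(std::size_t Size = 4096) : SlabSize(Size) {}

  void *allocate(std::size_t Size, std::size_t Align) {
    // Assumptions of this sketch: no oversized objects, and a power-of-two
    // alignment no stricter than what new[] guarantees for the slab base.
    assert(Size <= SlabSize && "oversized allocations not handled");
    assert(Align != 0 && (Align & (Align - 1)) == 0 &&
           Align <= alignof(std::max_align_t));
    // Align the offset within the slab (not the absolute address), so the
    // resulting offsets do not depend on where the slab landed in memory.
    std::size_t Offset = (CurOffset + Align - 1) & ~(Align - 1);
    if (Slabs.empty() || Offset + Size > SlabSize) {
      Slabs.push_back(std::make_unique<std::byte[]>(SlabSize));
      Offset = 0;
    }
    CurOffset = Offset + Size;
    return Slabs.back().get() + Offset;
  }

  // Returns (slab index, offset in slab), or nullopt if Ptr is not ours.
  std::optional<std::pair<std::size_t, std::size_t>>
  identify(const void *Ptr) const {
    // Compare as integers: relational comparison of unrelated raw pointers
    // is not well-defined in C++.
    auto P = reinterpret_cast<std::uintptr_t>(Ptr);
    for (std::size_t I = 0; I < Slabs.size(); ++I) {
      auto Begin = reinterpret_cast<std::uintptr_t>(Slabs[I].get());
      if (P >= Begin && P < Begin + SlabSize)
        return std::make_pair(I, static_cast<std::size_t>(P - Begin));
    }
    return std::nullopt;
  }

  // Formats the identifier as "slab:offset", e.g. "0:528".
  std::string formatId(const void *Ptr) const {
    if (auto Id = identify(Ptr))
      return std::to_string(Id->first) + ":" + std::to_string(Id->second);
    return "<unknown>";
  }

private:
  std::size_t SlabSize;
  std::vector<std::unique_ptr<std::byte[]>> Slabs;
  std::size_t CurOffset = 0; // bump pointer within the last slab
};
```

With 64-byte slabs, three 24-byte allocations get "0:0", "0:24" and
"1:0" (the third doesn't fit in the first slab), while a pointer to a
stack variable is reported as "<unknown>". identify() simply scans the
slab list; a real allocator that keeps oversized allocations in separate
slabs would need a numbering scheme for those as well.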

To compute such an identifier, we only need access to the object and
to the allocator; no additional memory is used to store it. The
identifier would also be stable across runs as long as the same objects
are allocated in the same order, which, I suspect, is often the case.

One downside of this identifier is that it won't be the same on
different machines, because the same data structure may require
different amounts of memory on different hosts (for instance, pointers
are 4 bytes on 32-bit hosts and 8 bytes on 64-bit ones). So it wouldn't
necessarily help in understanding a dump that a user sent you. But it
still seems better than pointers.
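Both the stability across runs and its limit across hosts can be seen
without touching real memory: under bump-pointer rules, the (slab,
offset) pairs depend only on the sequence of allocation sizes and
alignments, never on addresses. The sketch below replays such a
sequence. replayIds() and AllocRequest are hypothetical names, and the
rules (align the in-slab offset, open a new slab when an object doesn't
fit, no object larger than a slab) are assumptions mirroring a simple
bump allocator, not BumpPtrAllocator's exact behavior.

```cpp
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

// One allocation request: object size and alignment, in bytes.
struct AllocRequest {
  std::size_t Size;
  std::size_t Align; // assumed to be a power of two
};

// Hypothetical helper: computes the (slab index, offset) identifier each
// object in Requests would receive, in allocation order.
std::vector<std::pair<std::size_t, std::size_t>>
replayIds(const std::vector<AllocRequest> &Requests, std::size_t SlabSize) {
  std::vector<std::pair<std::size_t, std::size_t>> Ids;
  std::size_t Slab = 0; // index of the current slab
  std::size_t Cur = 0;  // bump pointer within the current slab
  for (const AllocRequest &R : Requests) {
    assert(R.Size <= SlabSize && (R.Align & (R.Align - 1)) == 0);
    std::size_t Offset = (Cur + R.Align - 1) & ~(R.Align - 1);
    if (Offset + R.Size > SlabSize) { // doesn't fit: start a new slab
      ++Slab;
      Offset = 0;
    }
    Cur = Offset + R.Size;
    Ids.emplace_back(Slab, Offset);
  }
  return Ids;
}
```

Replaying {24, 40, 16}-byte objects (8-byte aligned, 64-byte slabs)
twice yields (0,0), (0,24), (1,0) both times, which is the across-runs
property. If the first object grows to 32 bytes, as it might on a host
with larger pointers, the later identifiers shift to (1,0) and (1,40),
which is why dumps from different machines won't line up.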

Should we go ahead and try to implement it?