[llvm-dev] [cfe-dev] Identifying objects within BumpPtrAllocator.

Artem Dergachev via llvm-dev llvm-dev at lists.llvm.org
Wed Aug 29 12:21:54 PDT 2018


Yup, I use that as well. I guess we can print both the pointer and the 
stable identifier alongside each other. Or we could add a flag to choose 
between those, but that's less comfy.
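
A minimal sketch of the "print both" option, assuming the slab index and
offset have already been computed elsewhere. `formatNodeLabel` and its
parameters are hypothetical names used only for illustration; they are not
part of Clang or LLVM.

```cpp
#include <cstddef>
#include <cstdint>
#include <sstream>
#include <string>

// Hypothetical helper: builds a dump label that shows the raw pointer and
// the stable (slab index, offset) identifier next to each other.
std::string formatNodeLabel(const std::string &Kind, const void *Ptr,
                            std::size_t SlabIdx, std::size_t Offset) {
  std::ostringstream OS;
  // Pointer in hex for direct use in a debugger, then the stable id.
  OS << Kind << " 0x" << std::hex << reinterpret_cast<std::uintptr_t>(Ptr)
     << std::dec << " (" << SlabIdx << ':' << Offset << ')';
  return OS.str();
}
```

With this layout the address stays available for copy-pasting into a
debugger, while the short pair is what one would compare across runs.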

On 8/29/18 11:54 AM, David Blaikie wrote:
> Mostly what Richard said.
>
> One thing I'd be a bit careful of - these numbers may still not be 
> stable in some small number of cases (eg: if objects are created based 
> on iteration order of a pointer-based hashing container - which may 
> still be valid if that ordering doesn't leak into the output of the 
> program). So this might provide a slightly false sense of security & 
> make those minority cases more painful - but perhaps they're rare 
> enough that it's worth the tradeoff.
>
> (& as Richard said - debuggers will tend to disable ASLR anyway, 
> making it relatively easy to work with)
>
> On Tue, Aug 28, 2018 at 6:16 PM Richard Smith via cfe-dev 
> <cfe-dev at lists.llvm.org> wrote:
>
>     On Tue, 28 Aug 2018, 17:14 Artem Dergachev via cfe-dev,
>     <cfe-dev at lists.llvm.org> wrote:
>
>         In various debug dumps (e.g., Clang's -ast-dump), various objects
>         (e.g., Stmts and Decls in that -ast-dump) are identified by
>         pointers. This is very reliable in the sense that no two objects
>         ever have the same pointer at the same time, but it's unpleasant
>         that pointers change across runs. Having deterministic identifiers
>         instead of pointers would aid debugging: imagine a conditional
>         break on the identifier of an object that has not yet been
>         constructed, or simply trying to align two debug dumps of
>         different kinds from different runs. Additionally, pointers are
>         hard to read and memorize; it's hard to notice the difference
>         between 0x7f80a28325e0 and 0x7f80a28325a0, especially when they're
>         a few screens apart.
>
>         Hence the idea: why don't we print the offset into the allocator's
>         memory slab instead of a pointer?
>
>
>     Make this "as well as" rather than "instead of" and it sounds
>     great to me. When debugging, it's useful to be able to dump a
>     large complex object, find the piece you want, grab its address
>     and start accessing it directly.
>
>     (For the pointer stability problem, at least on Linux you can turn
>     off ASLR. When running under gdb, that's typically done for you,
>     and you can do it manually with setarch. But it would be nice to
>     have an easier way to identify objects than a long, essentially
>     meaningless address.)
>
>         We use BumpPtrAllocator all over the place, which boils down to a
>         set of slabs on which all objects are placed in the order in which
>         they are allocated. It is easy for the allocator to check whether
>         a pointer belongs to it and, if so, determine which slab the
>         object is in and at what offset. Therefore it is possible to
>         identify the object by its (slab index, offset) pair. E.g.,
>         "TypedefDecl 0:528" (you already memorized it) instead of
>         "TypedefDecl 0x7f80a28325e0". This could be applied to all sorts
>         of objects that live in BumpPtrAllocators.
>
>         To compute such an identifier, we only need access to the object
>         and to the allocator. No additional memory is used to store it.
>         The identifier would also be persistent across runs as long as the
>         same objects are allocated in the same order, which is, I suspect,
>         often the case.
>
>         One downside of this identifier is that it's not going to be the
>         same on different machines, because the same data structure may
>         require different amounts of memory on different hosts. So it
>         wouldn't necessarily help in understanding a dump that a user
>         sent you. But it still seems better than pointers.
>
>         Should we go ahead and try to implement it?
>         _______________________________________________
>         cfe-dev mailing list
>         cfe-dev at lists.llvm.org
>         http://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-dev
>
>
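
The (slab index, offset) scheme described in the quoted proposal can be
sketched roughly as follows. This is a toy stand-in written against the
standard library only: `ToyBumpAllocator`, `identify` and `stableId` are
hypothetical names, fixed-size slabs are an assumption made for brevity,
and none of this reflects the actual internals or API of
llvm::BumpPtrAllocator.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <memory>
#include <optional>
#include <string>
#include <utility>
#include <vector>

// Toy bump allocator (hypothetical; not llvm::BumpPtrAllocator).
class ToyBumpAllocator {
public:
  explicit ToyBumpAllocator(std::size_t SlabSize = 4096)
      : SlabSize(SlabSize) {}

  // Returns Size bytes aligned to Align (assumed to be a power of two).
  // Starts a fresh slab when the request does not fit in the current one.
  void *allocate(std::size_t Size,
                 std::size_t Align = alignof(std::max_align_t)) {
    assert(Size <= SlabSize && "toy allocator has no oversized slabs");
    std::size_t Offset = (Used + Align - 1) & ~(Align - 1);
    if (Slabs.empty() || Offset + Size > SlabSize) {
      Slabs.push_back(std::make_unique<unsigned char[]>(SlabSize));
      Offset = 0;
    }
    Used = Offset + Size;
    return Slabs.back().get() + Offset;
  }

  // Maps a pointer to its (slab index, offset) pair, or std::nullopt if the
  // pointer does not point into any slab owned by this allocator.
  std::optional<std::pair<std::size_t, std::size_t>>
  identify(const void *Ptr) const {
    auto Addr = reinterpret_cast<std::uintptr_t>(Ptr);
    for (std::size_t I = 0; I < Slabs.size(); ++I) {
      auto Base = reinterpret_cast<std::uintptr_t>(Slabs[I].get());
      if (Addr >= Base && Addr < Base + SlabSize)
        return std::make_pair(I, static_cast<std::size_t>(Addr - Base));
    }
    return std::nullopt;
  }

private:
  std::size_t SlabSize;
  std::size_t Used = 0; // bytes consumed in the newest slab
  std::vector<std::unique_ptr<unsigned char[]>> Slabs;
};

// Renders the identifier in the "slab:offset" form used in the proposal.
std::string stableId(const ToyBumpAllocator &A, const void *Ptr) {
  if (auto Id = A.identify(Ptr))
    return std::to_string(Id->first) + ":" + std::to_string(Id->second);
  return "<unknown>";
}
```

No per-object storage is needed: the identifier is derived from the pointer
and the allocator's slab list on demand. It repeats across runs only if the
same allocations happen in the same order with the same sizes, which is the
caveat raised in the thread.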
