[lldb-dev] LLDB Evolution - Final Form

Greg Clayton via lldb-dev lldb-dev at lists.llvm.org
Mon Sep 19 14:37:54 PDT 2016


> On Sep 19, 2016, at 2:24 PM, Zachary Turner <zturner at google.com> wrote:
> 
> 
> 
> On Mon, Sep 19, 2016 at 2:02 PM Greg Clayton <gclayton at apple.com> wrote:
> 
> > On Sep 19, 2016, at 1:18 PM, Zachary Turner via lldb-dev <lldb-dev at lists.llvm.org> wrote:
> >
> > Following up with Kate's post from a few weeks ago, I think the dust has settled on the code reformat and it went over pretty smoothly for the most part.  So I thought it might be worth throwing out some ideas for where we go from here.  I have a large list of ideas (more ideas than time, sadly) that I've been collecting over the past few weeks, so I figured I would throw them out in the open for discussion.
> >
> > I’ve grouped the areas for improvement into 3 high level categories.
> >
> >       • De-inventing the wheel - We should use more code from LLVM, and delete code in LLDB where LLVM provides a solution.  In cases where there is an LLVM thing that is *similar* to what we need, we should extend the LLVM thing to support what we need, and then use it.  Following are some areas I've identified.  This list is by no means complete.  For each one, I've given a personal assessment of how likely it is to cause some (temporary) hiccups, how much it would help us in the long run, and how difficult it would be to do.  Without further ado:
> >               • Use llvm::Regex instead of lldb::Regex
> >                       • llvm::Regex doesn’t support enhanced mode.  Could we add support for this to llvm::Regex?
> >                       • Risk: 6
> >                       • Impact: 3
> >                       • Difficulty / Effort: 3  (5 if we have to add enhanced mode support)
> 
> As long as it supports all of the features, then this is fine. Otherwise we will need to modify the regular expressions to work with the LLVM version or back port any advances regex features we need into the LLVM regex parser.
> 
> >               • Use llvm streams instead of lldb::StreamString
> >                       • Supports output re-targeting (stderr, stdout, std::string, etc), printf style formatting, and type-safe streaming operators.
> >                       • Interoperates nicely with many existing llvm utility classes
> >                       • Risk: 4
> >                       • Impact: 5
> >                       • Difficulty / Effort: 7
> 
> I have worked extensively with both C++ streams and with printf. I don't find the type safe streaming operators to be readable and a great way to place things into streams. Part of my objection to this will be quelled if the LLVM streams are not stateful like C++ streams are (add an extra "std::hex" into the stream and suddenly other things down stream start coming out in hex). As long as printf functionality is in the LLVM streams I am ok with it.
> I agree with you on the streams for the most part.  llvm streams do not use stateful manipulators, although they do have stateless manipulators.  For example, you can write:
> 
> stream << llvm::format_hex(n);
> 
> and that would print n as hex, but it wouldn't affect the printing of subsequent items.  You also still get printf style formatting by using the llvm::format stateless manipulator.  Like this:
> 
> stream << llvm::format("%0.4f", myfloat)
>  
> 
> >               • Use llvm::Error instead of lldb::Error
> >                       • llvm::Error is an error class that *requires* you to check whether it succeeded or it will assert.  In a way, it's similar to a C++ exception, except that it doesn't come with the performance hit associated with exceptions.  It's extensible, and can be easily extended to support the various ways LLDB needs to construct errors and error messages.
> >                       • Would need to first rename lldb::Error to LLDBError so that te conversion from LLDBError to llvm::Error could be done incrementally.
> >                       • Risk: 7
> >                       • Impact: 7
> >                       • Difficulty / Effort: 8
> 
> We must be very careful here. Nothing can crash LLDB and adding something that will be known to crash in some cases can only be changed if there is a test that can guarantee that it won't crash. assert() calls in a shared library like LLDB are not OK and shouldn't fire. Even if they do, when asserts are off, then it shouldn't crash LLDB. We have made efforts to only use asserts in LLDB for things that really can't happen, but for things like constructing a StringRef with NULL is one example of why I don't want to use certain classes in LLVM. If we can remove the crash threat, then lets use them. But many people firmly believe that asserts belong in llvm and clang code and I don't agree. Also many asserts are in header files so even if you build llvm and clang with asserts off, but then build our project that uses LLVM with asserts on, you get LLVM and clang asserts when you don't want them.
> We can probably add something to llvm::Error to make it not assert but to do something else instead.  Something that would be up to the person raising the error, so we could say that all lldb errors wouldn't cause a problem.
> 
> As an aside, have you guys ever talked about the idea of using out of process isolation so it doesn't bring down all of xcode when it crashes?  Obviously it's a ton of work, but it would solve these kidns of problems.

Yep. Xcode 8 does this just for this reason. More details on that later.

> 
> >               • StringRef instead of const char *, len everywhere
> >                       • Can do most common string operations in a way that is guaranteed to be safe.
> >                       • Reduces string manipulation algorithm complexity by an order of magnitude.
> >                       • Can potentially eliminate tens of thousands of string copies across the codebase.
> >                       • Simplifies code.
> >                       • Risk: 3
> >                       • Impact: 8
> >                       • Difficulty / Effort: 7
> 
> I don't think StringRef needs to be used everywhere, but it many places it does make sense. For example our public API should not contain any LLVM classes in the API. "const char *" is a fine parameter for public functions. I really hate the fact that constructing llvm::StringRef with NULL causes an assertion. Many people don't know that and don't take that into account when making their changes. I would love to see the assertion taken out to tell you the truth. That would make me feel better about using StringRef in many more places within LLDB as we shouldn't ever crash due to this. I would expect most internal APIs are safe to move to llvm::StringRef, but we will need to make sure none of these require a null terminated string which would cause us to have to call "str.str().c_str()". So internally, yes we can cut over to more use of StringRef. But the public API can't be changed.
> Definitely no changing the public API, I agree with you there.  A lot of times where we currently "need" a null terminated string, that's only because the null terminated string is being passed to a C api which has a StringRef counterpart.  Not always, but often.  I think when you have StringRefs all the way down, the problem of constructing it with a null pointer largely goes away because you don't have that many pointers left to begin with.
>  
> 
> 
> >               • ArrayRef instead of const void *, len everywhere
> >                       • Same analysis as StringRef
> 
> Internally yes, external APIs no.
> >               • MutableArrayRef instead of void *, len everywhere
> >                       • Same analysis as StringRef
> Internally yes, external APIs no.
> >               • Delete ConstString, use a modified StringPool that is thread-safe.
> >                       • StringPool is a non thread-safe version of ConstString.
> >                       • Strings are internally refcounted so they can be cleaned up when they are no longer used.  ConstStrings are a large source of memory in LLDB, so ref-counting and removing stale strings has the potential to be a huge savings.
> >                       • Risk: 2
> >                       • Impact: 9
> >                       • Difficulty / Effort: 6
> 
> We can't currently rely on ref counting. We currently hand out "const char *" as return values from many public API calls and these rely on the strings living forever. We many many existing clients and we can't change the public API. So I vote to avoid this. We are already using LLVM string pools under the covers and we also associate the mangled/demangled names in the map as is saves us a ton of cycles when parsing stuff as we never demangle (very expensive) the same name twice. So I don't see the need to change this as it is already backed by LLVM technology under the covers.

> Just thinking out loud here, but one possibility is to have multiple pools.  One which is ref counted and one which isn't.  If you need something to live forever, vend it from the non-ref-counted pool.  Otherwise vend it from the ref counted pool.

Again, since we already are using LLVM under the hood here, I would hope to avoid changes if possible.



More information about the lldb-dev mailing list