[Lldb-commits] [lldb] r281273 - This is the main part of a change to add breakpoint save and restore to lldb.

Mon Sep 12 19:11:16 PDT 2016

The StringRef does in fact store a length.  So if you write this:

std::string S("This is a test");
StringRef R(S);

Then R internally contains
const char *Ptr = "This is a test"
size_t Len = 14

This makes repeated operations on StringRefs really fast (and efficient),
because there's no copying involved.  For example, you could say:

R.consume_front("This");
R.consume_back("test");
R = R.trim();

And you would end up with R = "is a".  In this case, The internal ptr would
refer to the original pointer + 5, and the length would now be 4.

If you have two StringRefs, R and R2, and you say "if (R == R2)", then
since both of them store their length, then it's a straight memcmp since
both lengths are already known in advance.

There are lots of useful operations that if you don't have a StringRef, you
won't ever think to use, but once you start using them, you'll never want
to go back.  That's why I say we should always prefer StringRef unless
there is a specific reason not to.  It doesn't prevent you from doing
anything that you can already do with a const char*, but it allows you to
do much more.  The converse is not true.  If you vend out const char*'s,
you will end up with a lot of excessively verbose code, because as soon as
someone sees a const char*, they will start looking for the appropriate C
method to use to solve their problem, which is probably going to turn into
a lot of difficult to read / get correct code, when there's a good chance
that there was already a method that did exactly what they wanted in
StringRef and would have saved a ton of code.

One example that I found recently was there was a string formatted as a ;
separated list.  Each semicolon was followed by some prefix indicating what
followed.  One of those fields was itself a nested comma separated list.
So it was something like this:

std::string s = Field1=...;Field2=...;Field3=value1,value2,value3;Field4=...

The code to parse this was more than one screen.  Converting it to
StringRef and using StringRef::split() it became about 10 lines.

I agree you don't always "need" it, but you never know how people are going
to use it, and I can't think of a reason *not* to use it.  And when you do
need it, it makes code drastically simpler.

On Mon, Sep 12, 2016 at 6:24 PM Jim Ingham <jingham at apple.com> wrote:

> I see the whole content, but I'll reply to this one so the reply doesn't
> get truncated on your end...
>
> > On Sep 12, 2016, at 6:03 PM, Zachary Turner <zturner at google.com> wrote:
> >
> >
> > Immediately, very little.  A small amount of performance, since
> comparing StringRefs is faster than comparing null terminated stings, since
> the length is stored with the string it can use memcmp instead of strcmp.
> >
> > From a big picture perspective, quite a lot IMO.  StringRef has numerous
> methods which are extremely useful and not easy to reproduce correctly.
> Also, If you do the comparison many times, you only pay for the length
> computation once.
> >
> > Also, this email is so long that it's truncating in my email reader.  So
> I'm not sure if the second part of my email went through.  It's possible
> your reader is better than mine at handling large emails, but I'm copying
> the answer to your second question below just in case you also didn't see
> it.
> >
> > Thanks for the comments.
> > >
> > >   I don't see the benefit of using StringRef's to return all the key
> names.  I'm generally only ever going to pass them to the StructuredData
> API's, which makes them into StringRef's transparently on the way in.  How
> would building StringRefs help here?
> >
> > >   You also suggested changing a bunch of BreakpointOption API to
> return StringRef's.  OTOH this CL just mechanically changed from m_options
> to m_options_up, so changing the API was not part of the patches intent.
> OTOH most of these options (condition, thread name, etc) can take and
> return NULL strings.  I didn't think StringRef's handled null strings well,
> was I wrong about that?  And again, what would I gain by making these
> StringRef's?  I'm just going to stash them away in some storage the object
> owns, and I'm not going to ever share the storage with the caller.  So
> least common denominator (const char *) seems a better choice here.  If the
> caller wants a StringRef they are welcome to make one.
> >
> > Right, but making the StringRef incurs a length computation.  That's not
> something you want to do over and over.  It's guaranteed someone somewhere
> is going to compute the length, so it's better to do it once upfront, and
> allow everyone else to never have to do it again.
>
> Maybe I don't understand how StringRef's work.  I thought they just
> wrapped some foreign storage - a string constant, char * or std::string?
> So for the length computation to be shared for an object handing out
> StringRef's, the object would have to keep both the string and it's
> associated StringRef.  If the functions just RETURN a StringRef that wraps
> a string constant, you'll calculate the length every time.  So IIUC:
>
> static const char *GetSerializationKey() { return "Breakpoint"; }
>
> becomes:
>
> static StringRef GetSerializationKey() { static const StringRef
> contents("Breakpoint"); return contents; }
>
> except now this has a non-trivial constructor, so I should really put a
> std::once around the initializer, right?  That just seems like way more
> trouble than it is worth to keep from computing a length a couple of times.
>
> >
> > On the other hand, using a StringRef gives you many advantages.  Unless
> you know every possible way in which these strings will ever be used, who
> knows what someone might want to do with it?  What if someone wants to take
> one of these strings, check if some other string starts with it, and chop
> it off if so?  You could either write:
> >
> > if (strncmp(GetName(), str.c_str(), strlen(GetName()) == 0)
> >     str2 = str.substr(strlen(GetName()));
> >
> > which there's a good chance will be either wrong or suboptimal, or you
> could write:
> >
> > str.consume_front(GetName());
> >
> > which is both easier to understand, obviously correct, and optimal from
> a performance standpoint.
> >
> > const char* should almost never be used for anything unless you
> absolutely must interface with a C-api for which there is no good
> equivalent on StringRef (very rare).
>
> Hum.  I would say "If you want to start parsing up a string, put it in a
> StringRef, you'll like it..."  But if you are handing out a string
> constant, "const char *" is fine, and the consumers can dress it up if they
> want.
>
> >
> > Since we currently use const char* in many places, this sometimes makies
> interfacing difficult, but as more common classes move to StringRef, this
> will go away and almost all code will become faster and easier to
> understand.
> >
> > You are right that StringRefs don't handle null strings, but they do
> handle empty strings, and it's not common that empty and null need to be
> treated as distinct values.
>
> The classes you reference treat nullptr as "not set".  I could go change
> that too, but I'd rather write some tests.
>
> Jim
>
>
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.llvm.org/pipermail/lldb-commits/attachments/20160913/c823925f/attachment.html>