<div dir="ltr">Here is the example I was thinking of:<div><br></div><div><a href="https://reviews.llvm.org/D24013">https://reviews.llvm.org/D24013</a><br></div><div><br></div><div>It didn't yet get submitted, so you can see the current code by looking at GDBRemoteCommunicationClient.cpp lines 381-420.  The shortened code is in the above patch.  But here they are for reference:</div><div><br></div><div>Before:</div><div><div>    // Look for a list of compressions in the features list e.g.</div><div>    // qXfer:features:read+;PacketSize=20000;qEcho+;SupportedCompressions=zlib-deflate,lzma</div></div><div>    const char *features_list = ::strstr(response_cstr, "qXfer:features:");<br></div><div><div>    if (features_list) {</div><div>      const char *compressions =</div><div>          ::strstr(features_list, "SupportedCompressions=");</div><div>      if (compressions) {</div><div>        std::vector<std::string> supported_compressions;</div><div>        compressions += sizeof("SupportedCompressions=") - 1;</div><div>        const char *end_of_compressions = strchr(compressions, ';');</div><div>        if (end_of_compressions == NULL) {</div><div>          end_of_compressions = strchr(compressions, '\0');</div><div>        }</div><div>        const char *current_compression = compressions;</div><div>        while (current_compression < end_of_compressions) {</div><div>          const char *next_compression_name = strchr(current_compression, ',');</div><div>          const char *end_of_this_word = next_compression_name;</div><div>          if (next_compression_name == NULL ||</div><div>              end_of_compressions < next_compression_name) {</div><div>            end_of_this_word = end_of_compressions;</div><div>          }</div><div><br></div><div>          if (end_of_this_word) {</div><div>            if (end_of_this_word == current_compression) {</div><div>              current_compression++;</div><div>            } else {</div><div>              std::string this_compression(</div><div>                  current_compression, end_of_this_word - current_compression);</div><div>              supported_compressions.push_back(this_compression);</div><div>              current_compression = end_of_this_word + 1;</div><div>            }</div><div>          } else {</div><div>            supported_compressions.push_back(current_compression);</div><div>            current_compression = end_of_compressions;</div><div>          }</div><div>        }</div><div><br></div><div>        if (supported_compressions.size() > 0) {</div><div>          MaybeEnableCompression(supported_compressions);</div><div>        }</div><div>      }</div><div>    }</div></div><div><br></div><div>After:</div><div><br></div><div><div>        std::vector<llvm::StringRef> supported_compressions;</div><div>        llvm::StringRef this_compression;</div><div>        llvm::StringRef compressions =</div><div>            response_str.split("qXfer:features:").second</div><div>                        .split("SupportedCompressions=").second</div><div>                        .split(";").first;</div><div>        while (!compressions.empty())<br></div><div>        {</div><div>            std::tie(this_compression, compressions) = compressions.split(',');<br></div><div>            supported_compressions.push_back(this_compression);<br></div><div>        }<br></div><div>        if (!supported_compressions.empty())<br></div><div>            MaybeEnableCompression(supported_compressions);<br></div></div><div><br></div><div>Hopefully we can agree the "After" version is easier to understand :)  Not to mention more efficient, since it makes 0 copies, and the original version makes a copy for every compression in the list.</div><div><br></div><div>In any case, I agree there are times where you don't need all this functionality.  I just don't see a reason NOT to use it since it makes better code fall out naturally without even having to try.</div></div><br><div class="gmail_quote"><div dir="ltr">On Mon, Sep 12, 2016 at 7:11 PM Zachary Turner <<a href="mailto:zturner@google.com">zturner@google.com</a>> wrote:<br></div><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"><div dir="ltr" class="gmail_msg">The StringRef does in fact store a length.  So if you write this:<div class="gmail_msg"><br class="gmail_msg"></div><div class="gmail_msg">std::string S("This is a test");</div><div class="gmail_msg">StringRef R(S);</div><div class="gmail_msg"><br class="gmail_msg"></div><div class="gmail_msg">Then R internally contains </div><div class="gmail_msg">const char *Ptr = "This is a test"</div><div class="gmail_msg">size_t Len = 14</div><div class="gmail_msg"><br class="gmail_msg"></div><div class="gmail_msg">This makes repeated operations on StringRefs really fast (and efficient), because there's no copying involved.  For example, you could say:</div><div class="gmail_msg"><br class="gmail_msg"></div><div class="gmail_msg">R.consume_front("This");</div><div class="gmail_msg">R.consume_back("test");</div><div class="gmail_msg">R = R.trim();<br class="gmail_msg"></div><div class="gmail_msg"><br class="gmail_msg"></div><div class="gmail_msg">And you would end up with R = "is a".  In this case, The internal ptr would refer to the original pointer + 5, and the length would now be 4.</div><div class="gmail_msg"><br class="gmail_msg"></div><div class="gmail_msg">If you have two StringRefs, R and R2, and you say "if (R == R2)", then since both of them store their length, then it's a straight memcmp since both lengths are already known in advance.</div><div class="gmail_msg"><br class="gmail_msg"></div><div class="gmail_msg">There are lots of useful operations that if you don't have a StringRef, you won't ever think to use, but once you start using them, you'll never want to go back.  That's why I say we should always prefer StringRef unless there is a specific reason not to.  It doesn't prevent you from doing anything that you can already do with a const char*, but it allows you to do much more.  The converse is not true.  If you vend out const char*'s, you will end up with a lot of excessively verbose code, because as soon as someone sees a const char*, they will start looking for the appropriate C method to use to solve their problem, which is probably going to turn into a lot of difficult to read / get correct code, when there's a good chance that there was already a method that did exactly what they wanted in StringRef and would have saved a ton of code.</div><div class="gmail_msg"><br class="gmail_msg"></div><div class="gmail_msg">One example that I found recently was there was a string formatted as a ; separated list.  Each semicolon was followed by some prefix indicating what followed.  One of those fields was itself a nested comma separated list.  So it was something like this:</div><div class="gmail_msg"><br class="gmail_msg"></div><div class="gmail_msg">std::string s = Field1=...;Field2=...;Field3=value1,value2,value3;Field4=...</div><div class="gmail_msg"><br class="gmail_msg"></div><div class="gmail_msg">The code to parse this was more than one screen.  Converting it to StringRef and using StringRef::split() it became about 10 lines.</div><div class="gmail_msg"><br class="gmail_msg"></div><div class="gmail_msg">I agree you don't always "need" it, but you never know how people are going to use it, and I can't think of a reason *not* to use it.  And when you do need it, it makes code drastically simpler.</div></div><br class="gmail_msg"><div class="gmail_quote gmail_msg"><div dir="ltr" class="gmail_msg">On Mon, Sep 12, 2016 at 6:24 PM Jim Ingham <<a href="mailto:jingham@apple.com" class="gmail_msg" target="_blank">jingham@apple.com</a>> wrote:<br class="gmail_msg"></div><blockquote class="gmail_quote gmail_msg" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">I see the whole content, but I'll reply to this one so the reply doesn't get truncated on your end...<br class="gmail_msg">

<br class="gmail_msg">

> On Sep 12, 2016, at 6:03 PM, Zachary Turner <<a href="mailto:zturner@google.com" class="gmail_msg" target="_blank">zturner@google.com</a>> wrote:<br class="gmail_msg">

><br class="gmail_msg">

><br class="gmail_msg">

> Immediately, very little.  A small amount of performance, since comparing StringRefs is faster than comparing null terminated stings, since the length is stored with the string it can use memcmp instead of strcmp.<br class="gmail_msg">

><br class="gmail_msg">

> From a big picture perspective, quite a lot IMO.  StringRef has numerous methods which are extremely useful and not easy to reproduce correctly.  Also, If you do the comparison many times, you only pay for the length computation once.<br class="gmail_msg">

><br class="gmail_msg">

> Also, this email is so long that it's truncating in my email reader.  So I'm not sure if the second part of my email went through.  It's possible your reader is better than mine at handling large emails, but I'm copying the answer to your second question below just in case you also didn't see it.<br class="gmail_msg">

><br class="gmail_msg">

> Thanks for the comments.<br class="gmail_msg">

> ><br class="gmail_msg">

> >   I don't see the benefit of using StringRef's to return all the key names.  I'm generally only ever going to pass them to the StructuredData API's, which makes them into StringRef's transparently on the way in.  How would building StringRefs help here?<br class="gmail_msg">

><br class="gmail_msg">

> >   You also suggested changing a bunch of BreakpointOption API to return StringRef's.  OTOH this CL just mechanically changed from m_options to m_options_up, so changing the API was not part of the patches intent.  OTOH most of these options (condition, thread name, etc) can take and return NULL strings.  I didn't think StringRef's handled null strings well, was I wrong about that?  And again, what would I gain by making these StringRef's?  I'm just going to stash them away in some storage the object owns, and I'm not going to ever share the storage with the caller.  So least common denominator (const char *) seems a better choice here.  If the caller wants a StringRef they are welcome to make one.<br class="gmail_msg">

><br class="gmail_msg">

> Right, but making the StringRef incurs a length computation.  That's not something you want to do over and over.  It's guaranteed someone somewhere is going to compute the length, so it's better to do it once upfront, and allow everyone else to never have to do it again.<br class="gmail_msg">

<br class="gmail_msg">

Maybe I don't understand how StringRef's work.  I thought they just wrapped some foreign storage - a string constant, char * or std::string?  So for the length computation to be shared for an object handing out StringRef's, the object would have to keep both the string and it's associated StringRef.  If the functions just RETURN a StringRef that wraps a string constant, you'll calculate the length every time.  So IIUC:<br class="gmail_msg">

<br class="gmail_msg">

static const char *GetSerializationKey() { return "Breakpoint"; }<br class="gmail_msg">

<br class="gmail_msg">

becomes:<br class="gmail_msg">

<br class="gmail_msg">

static StringRef GetSerializationKey() { static const StringRef contents("Breakpoint"); return contents; }<br class="gmail_msg">

<br class="gmail_msg">

except now this has a non-trivial constructor, so I should really put a std::once around the initializer, right?  That just seems like way more trouble than it is worth to keep from computing a length a couple of times.<br class="gmail_msg">

<br class="gmail_msg">

><br class="gmail_msg">

> On the other hand, using a StringRef gives you many advantages.  Unless you know every possible way in which these strings will ever be used, who knows what someone might want to do with it?  What if someone wants to take one of these strings, check if some other string starts with it, and chop it off if so?  You could either write:<br class="gmail_msg">

><br class="gmail_msg">

> if (strncmp(GetName(), str.c_str(), strlen(GetName()) == 0)<br class="gmail_msg">

>     str2 = str.substr(strlen(GetName()));<br class="gmail_msg">

><br class="gmail_msg">

> which there's a good chance will be either wrong or suboptimal, or you could write:<br class="gmail_msg">

><br class="gmail_msg">

> str.consume_front(GetName());<br class="gmail_msg">

><br class="gmail_msg">

> which is both easier to understand, obviously correct, and optimal from a performance standpoint.<br class="gmail_msg">

><br class="gmail_msg">

> const char* should almost never be used for anything unless you absolutely must interface with a C-api for which there is no good equivalent on StringRef (very rare).<br class="gmail_msg">

<br class="gmail_msg">

Hum.  I would say "If you want to start parsing up a string, put it in a StringRef, you'll like it..."  But if you are handing out a string constant, "const char *" is fine, and the consumers can dress it up if they want.<br class="gmail_msg">

<br class="gmail_msg">

><br class="gmail_msg">

> Since we currently use const char* in many places, this sometimes makies interfacing difficult, but as more common classes move to StringRef, this will go away and almost all code will become faster and easier to understand.<br class="gmail_msg">

><br class="gmail_msg">

> You are right that StringRefs don't handle null strings, but they do handle empty strings, and it's not common that empty and null need to be treated as distinct values.<br class="gmail_msg">

<br class="gmail_msg">

The classes you reference treat nullptr as "not set".  I could go change that too, but I'd rather write some tests.<br class="gmail_msg">

<br class="gmail_msg">

Jim<br class="gmail_msg">

<br class="gmail_msg">

<br class="gmail_msg">

</blockquote></div></blockquote></div>