<div dir="ltr">Looks like llvm's regex is better than LLDB's in this regard, since it supports explicitly setting the end pointer. I can see a couple options:<div><br></div><div>1) Check if it's null terminated by peeking one past the end, and copying if it's not. This is pretty hackish, not crazy about this idea.</div><div>2) Un-delete the const char * version of the function but leave the StringRef overload, find all places where I added the explicit conversion and remove them so they invoke the const char* overload.</div><div>3) Change lldb::RegularExpression to just delegate to llvm under the hood and set the end pointer.</div></div><br><div class="gmail_quote"><div dir="ltr">On Wed, Sep 21, 2016 at 4:44 PM Zachary Turner <<a href="mailto:zturner@google.com">zturner@google.com</a>> wrote:<br></div><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"><div dir="ltr" class="gmail_msg">Actually wait, it doesn't. It just explicitly sets the end pointer.</div><br class="gmail_msg"><div class="gmail_quote gmail_msg"><div dir="ltr" class="gmail_msg">On Wed, Sep 21, 2016 at 4:44 PM Zachary Turner <<a href="mailto:zturner@google.com" class="gmail_msg" target="_blank">zturner@google.com</a>> wrote:<br class="gmail_msg"></div><blockquote class="gmail_quote gmail_msg" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"><div dir="ltr" class="gmail_msg">Worth noting that llvm::Regex has this constructor:<div class="gmail_msg"><br class="gmail_msg"></div><div class="gmail_msg"><div class="gmail_msg"><br class="gmail_msg"></div><div class="gmail_msg">Regex::Regex(StringRef regex, unsigned Flags) {</div><div class="gmail_msg"> unsigned flags = 0;</div><div class="gmail_msg"> preg = new llvm_regex();</div><div class="gmail_msg"> preg->re_endp = regex.end();</div><div class="gmail_msg"> if (Flags & IgnoreCase) </div><div class="gmail_msg"> flags |= REG_ICASE;</div><div class="gmail_msg"> if (Flags & 
Newline)</div><div class="gmail_msg"> flags |= REG_NEWLINE;</div><div class="gmail_msg"> if (!(Flags & BasicRegex))</div><div class="gmail_msg"> flags |= REG_EXTENDED;</div><div class="gmail_msg"> error = llvm_regcomp(preg, regex.data(), flags|REG_PEND);</div><div class="gmail_msg">}</div></div><div class="gmail_msg"><br class="gmail_msg"></div><div class="gmail_msg">So it assumes null termination even though you have a StringRef.</div></div><br class="gmail_msg"><div class="gmail_quote gmail_msg"><div dir="ltr" class="gmail_msg">On Wed, Sep 21, 2016 at 4:43 PM Zachary Turner <<a href="mailto:zturner@google.com" class="gmail_msg" target="_blank">zturner@google.com</a>> wrote:<br class="gmail_msg"></div><blockquote class="gmail_quote gmail_msg" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"><div dir="ltr" class="gmail_msg">You need to duplicate something on the heap once when you execute the regex. And in turn you save tens or hundreds of copies on the way there because of inefficient string usage. <div class="gmail_msg"><br class="gmail_msg"></div><div class="gmail_msg">We could also just un-delete the overload that takes a const char*; then the duplication would only ever happen when you explicitly use a StringRef.</div><div class="gmail_msg"><br class="gmail_msg"></div><div class="gmail_msg">I don't agree this should be reverted. In the process of doing this conversion I eliminated numerous careless string copies.<br class="gmail_msg"></div></div><br class="gmail_msg"><div class="gmail_quote gmail_msg"><div dir="ltr" class="gmail_msg">On Wed, Sep 21, 2016 at 4:38 PM Greg Clayton <<a href="mailto:gclayton@apple.com" class="gmail_msg" target="_blank">gclayton@apple.com</a>> wrote:<br class="gmail_msg"></div><blockquote class="gmail_quote gmail_msg" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">This is the perfect example of why not to use a StringRef since the string needs to be null terminated. Why did we change this? 
Now even if you call this function:<br class="gmail_msg">
<br class="gmail_msg">
RegularExpression r(...);<br class="gmail_msg">
<br class="gmail_msg">
r.Execute(".......................", ...)<br class="gmail_msg">
<br class="gmail_msg">
You will need to duplicate the string on the heap just to execute this. Please revert this. Anything that requires null termination is not a candidate for converting to StringRef.<br class="gmail_msg">
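A minimal sketch of the trade-off argued in this thread, assuming POSIX &lt;regex.h&gt; in place of LLDB's regex backend and std::string_view as a stand-in for llvm::StringRef. `RegexSketch` and its `copies` counter are hypothetical names for illustration only. Keeping a `const char*` overload next to the view overload (the un-delete option) lets callers that already hold a terminated string call `regexec` directly, while only view callers pay for a terminated copy.

```cpp
#include <regex.h>
#include <string>
#include <string_view>

// Hypothetical stand-in for lldb_private::RegularExpression, reduced to the
// two Execute overloads under discussion. std::string_view plays the role of
// llvm::StringRef so the sketch compiles without LLVM headers.
class RegexSketch {
public:
  explicit RegexSketch(const char *pattern) {
    m_comp_err = regcomp(&m_preg, pattern, REG_EXTENDED | REG_NOSUB);
  }
  ~RegexSketch() {
    if (m_comp_err == 0)
      regfree(&m_preg);
  }
  RegexSketch(const RegexSketch &) = delete;
  RegexSketch &operator=(const RegexSketch &) = delete;

  // Caller guarantees null termination: pass the pointer straight through.
  bool Execute(const char *str) const {
    return m_comp_err == 0 && str != nullptr &&
           regexec(&m_preg, str, 0, nullptr, 0) == 0;
  }

  // A view may point into the middle of a larger buffer, so it must be
  // copied into a null-terminated std::string before calling regexec.
  bool Execute(std::string_view str) const {
    if (m_comp_err != 0)
      return false;
    ++copies; // count each copy into a temporary std::string
    std::string reg_str(str);
    return regexec(&m_preg, reg_str.c_str(), 0, nullptr, 0) == 0;
  }

  static inline int copies = 0; // hypothetical counter, illustration only

private:
  regex_t m_preg;
  int m_comp_err;
};
```

For a string literal, overload resolution prefers `Execute(const char*)` (array-to-pointer conversion beats the user-defined conversion to `std::string_view`), so `re.Execute("aaa")` makes no copy, while `re.Execute(view)` copies once.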
<br class="gmail_msg">
<br class="gmail_msg">
> On Sep 21, 2016, at 10:13 AM, Zachary Turner via lldb-commits <<a href="mailto:lldb-commits@lists.llvm.org" class="gmail_msg" target="_blank">lldb-commits@lists.llvm.org</a>> wrote:<br class="gmail_msg">
><br class="gmail_msg">
> Author: zturner<br class="gmail_msg">
> Date: Wed Sep 21 12:13:51 2016<br class="gmail_msg">
> New Revision: 282090<br class="gmail_msg">
><br class="gmail_msg">
> URL: <a href="http://llvm.org/viewvc/llvm-project?rev=282090&view=rev" rel="noreferrer" class="gmail_msg" target="_blank">http://llvm.org/viewvc/llvm-project?rev=282090&view=rev</a><br class="gmail_msg">
> Log:<br class="gmail_msg">
> Fix failing regex tests.<br class="gmail_msg">
><br class="gmail_msg">
> r282079 converted the regular expression interface to accept<br class="gmail_msg">
> and return StringRefs instead of char pointers. In one case<br class="gmail_msg">
> a null pointer check was converted to an empty string check,<br class="gmail_msg">
> but this was an incorrect conversion because an empty string<br class="gmail_msg">
> is a valid regular expression. Removing this check should<br class="gmail_msg">
> fix the test failures.<br class="gmail_msg">
><br class="gmail_msg">
> Modified:<br class="gmail_msg">
> lldb/trunk/source/Core/RegularExpression.cpp<br class="gmail_msg">
><br class="gmail_msg">
> Modified: lldb/trunk/source/Core/RegularExpression.cpp<br class="gmail_msg">
> URL: <a href="http://llvm.org/viewvc/llvm-project/lldb/trunk/source/Core/RegularExpression.cpp?rev=282090&r1=282089&r2=282090&view=diff" rel="noreferrer" class="gmail_msg" target="_blank">http://llvm.org/viewvc/llvm-project/lldb/trunk/source/Core/RegularExpression.cpp?rev=282090&r1=282089&r2=282090&view=diff</a><br class="gmail_msg">
> ==============================================================================<br class="gmail_msg">
> --- lldb/trunk/source/Core/RegularExpression.cpp (original)<br class="gmail_msg">
> +++ lldb/trunk/source/Core/RegularExpression.cpp Wed Sep 21 12:13:51 2016<br class="gmail_msg">
> @@ -102,7 +102,7 @@ bool RegularExpression::Compile(llvm::St<br class="gmail_msg">
> //---------------------------------------------------------------------<br class="gmail_msg">
> bool RegularExpression::Execute(llvm::StringRef str, Match *match) const {<br class="gmail_msg">
> int err = 1;<br class="gmail_msg">
> - if (!str.empty() && m_comp_err == 0) {<br class="gmail_msg">
> + if (m_comp_err == 0) {<br class="gmail_msg">
> // Argument to regexec must be null-terminated.<br class="gmail_msg">
> std::string reg_str = str;<br class="gmail_msg">
> if (match) {<br class="gmail_msg">
><br class="gmail_msg">
><br class="gmail_msg">
<br class="gmail_msg">
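The guard changed in the diff above tests the subject string `str` passed to `Execute`, and an empty subject is legitimate input: patterns such as `^$` or `a*` match it. A minimal sketch contrasting the two guards, assuming POSIX &lt;regex.h&gt; and std::string_view in place of llvm::StringRef; `run_regex`, `ExecuteWithEmptyCheck`, and `ExecuteFixed` are hypothetical names.

```cpp
#include <regex.h>
#include <string>
#include <string_view>

// Compile `pattern` as an ERE and run it against `subject`.
// Returns false if the pattern fails to compile.
static bool run_regex(const char *pattern, const std::string &subject) {
  regex_t re;
  if (regcomp(&re, pattern, REG_EXTENDED | REG_NOSUB) != 0)
    return false;
  bool matched = regexec(&re, subject.c_str(), 0, nullptr, 0) == 0;
  regfree(&re);
  return matched;
}

// Buggy translation: a `str != nullptr` test rewritten as `!str.empty()`
// also rejects "", which is a valid subject string.
bool ExecuteWithEmptyCheck(const char *pattern, std::string_view str) {
  if (str.empty())
    return false;
  return run_regex(pattern, std::string(str)); // copy for null termination
}

// Fixed version: no emptiness guard, so "" is matched like any other input.
bool ExecuteFixed(const char *pattern, std::string_view str) {
  return run_regex(pattern, std::string(str)); // copy for null termination
}
```

A view cannot distinguish "null pointer" from "empty string" the way a `const char*` can, which is why the emptiness check was not a faithful replacement for the old null check.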
</blockquote></div></blockquote></div></blockquote></div></blockquote></div>
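Option 3 amounts to bounding the subject explicitly instead of relying on a trailing NUL, much as the quoted `llvm::Regex` constructor bounds the pattern with `re_endp` and `REG_PEND`; `llvm::Regex::match` bounds the subject the same way with `REG_STARTEND`. A minimal sketch, assuming a `regexec` that supports the `REG_STARTEND` extension (glibc and the BSDs do; it is not part of POSIX), with std::string_view standing in for llvm::StringRef; `MatchView` is a hypothetical name. Where `REG_STARTEND` is not defined, it falls back to a terminated copy.

```cpp
#include <regex.h>
#include <string>
#include <string_view>

// Match `str` against an already-compiled regex without requiring that the
// character after str.end() is '\0'.
bool MatchView(const regex_t &re, std::string_view str) {
#ifdef REG_STARTEND
  // An empty view may have a null data(); give regexec a valid pointer.
  const char *begin = str.empty() ? "" : str.data();
  regmatch_t bounds[1];
  bounds[0].rm_so = 0;                                 // search from begin
  bounds[0].rm_eo = static_cast<regoff_t>(str.size()); // up to str.end()
  return regexec(&re, begin, 1, bounds, REG_STARTEND) == 0;
#else
  // No end-pointer support: make a null-terminated copy instead.
  std::string copy(str);
  return regexec(&re, copy.c_str(), 0, nullptr, 0) == 0;
#endif
}
```

Standard POSIX `regcomp` still needs a terminated pattern (`REG_PEND` is a BSD extension), but compilation happens once, so copying the pattern there is cheap compared with copying the subject on every `Execute`.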