[llvm-commits] PATCH: Adding an array constructor to StringRef

Chandler Carruth chandlerc at google.com
Mon Jun 18 09:54:12 PDT 2012


On Mon, Jun 18, 2012 at 9:48 AM, Marshall Clow <mclow.lists at gmail.com>wrote:

> On Jun 18, 2012, at 8:47 AM, David Blaikie wrote:
> > On Mon, Jun 18, 2012 at 8:36 AM, Marshall Clow <mclow.lists at gmail.com>
> wrote:
> >> Modeled on the one in ArrayRef
> >>
> >>    /// Construct an string ref from a C array.
> >>    template <size_t N>
> >>    /*implicit*/ StringRef(const char (&Arr)[N])
> >>      : Data(Arr), Length(N) {}
> >>
> >
> > One possible problem with this is that if you construct a StringRef
> > from a string literal, you'll end up including the '\0' inside it.
>
> Ok - I can see that.
> >
> > (also, when I tried this myself a few months ago, there's at least one
> > case where a StringRef is constructed from a large fixed-size buffer
> > on the stack - when using this ctor, it obviously gets the size quite
> > wrong)
>
> Interesting. Do you remember where that was?
>
> > That being said, I like the idea, I'm just not sure the right way to
> > implement it - when I was experimenting with this I implemented it
> > with a '\0' check for the last character (truncating the size by one
> > when present, and leaving it at the full size when it wasn't present)
>
> Ok.
>
> > - and added an assert for strlen(Data) == Length to catch the buffer
> > case (which means the ctor didn't provide performance improvements in
> > debug builds but only in release builds).
>
> I see three problems here:
> 1) Constructing from string literal gives one too large. This is easily
> fixed.
> 2) Your case above.
> 3) The general case, where you have an array that may contain a NULL
> somewhere in the middle.
>
> (These last two might turn out to be the same case.)
>
> The second case, I don't see as a problem, given that the ( data, len )
> constructor does not enforce the invariant that StringRef::Length () ==
> strlen ( StringRef.data()).
> I know that StringRef.data() doesn't have to be NULL-terminated, but I
> don't see any care being taken to deal with embedded NULL characters,
> either.
>
> Maybe the confusion here is due to the fact that this would create a
> second one-argument constructor, and it would not be clear which was being
> chosen:
>
>        StringRef ( array )     - gives a reference to the entire array
>        StringRef ( pointer )  - gives a reference to the (null-terminated)
> string
>
> Ok - I'm not sure I see a good way to disambiguate those (or rather, for
> the user to know which is being called, given that arrays decay to pointers
> if you look at them funny)
>
> Dang.  Any ideas?


Folks, why does this matter?

So, the concern about 'strlen' performance is, I suspect, a red herring.
Clang/LLVM is very good at removing 'strlen' calls to string literals. I
would also expect it to  be very good at inlining StringRef("foo"). In
fact, if it fails to inline that,  *ever*, it's  a flat out bug, file it
 against  me.

Once the inline happens, the performance problem vanishes. So maybe we can
just use strlen and call it a day?
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.llvm.org/pipermail/llvm-commits/attachments/20120618/5734f7c5/attachment.html>


More information about the llvm-commits mailing list