[PATCH] D25255: Add a c_str() method to StringRef, similar to data() but asserting if the string isn't null terminated (NFC)

Zachary Turner via llvm-commits llvm-commits at lists.llvm.org
Tue Oct 4 18:11:31 PDT 2016


One possible way to make this actually be *safe* is to privately inherit
ZeroStringRef from StringRef and bring in all the base class functions via
using declarations. EXCEPT those base class functions that CANNOT un null
terminate a null terminated string (eg drop_front() etc). For those
functions, reimplement them to return a ZeroStringRef.

Allow StringRef to be implicitly constructed from ZSR but not the other way
around

Anyway, I don't feel qualified to lgtm a change like this so +chandlerc who
is the owner of Support


On Tue, Oct 4, 2016 at 6:01 PM Duncan P. N. Exon Smith <dexonsmith at apple.com>
wrote:

> I disagree, although not strongly.
>
> Using data() and size() together is the expected usage model for
> StringRef.  This matches many array-wrappers, and it's common practice to
> have data() be a raw pointer with size() indicating how large it is (see
> std::vector, SmallVector, ArrayRef, std::string, maybe others?).
>
> The common practice for cstr() is that it (a) is always valid and (b)
> returns a valid (and the correct) null-terminated string.  StringRef::cstr
> has special semantics and different guarantees than the other cstr() I'm
> aware of.  As a newcomer to LLVM, if I saw a StringRef::cstr, I would
> blindly assume it was valid and use it.  The bugs would be fairly
> surprising.
>
> I'm also influenced by the point (I can't find it now) that ASan and the
> assert will not catch all the bugs with it, since the StringRef-ized char
> array could be in a struct (or a stream) that just happens to have a 0 next
> to it.
>
> Maybe there's a more neutral name than StringRef::unsafe_cstr (which
> sounds like it should never be called) or StringRef::cstr (which sounds
> like it magically does "the right thing").
>
> --
>
> Also, I just noticed Zachary is suggesting making this a local static
> function instead of StringRef API.  I don't really have an opinion on that
> part... depends whether we'd (eventually) want to use this in more than one
> translation unit.  If so, it should be StringRef API.
>
> > On 2016-Oct-04, at 16:57, Mehdi Amini <mehdi.amini at apple.com> wrote:
> >
> > Yet data() isn’t called “unsafe_data()” and is more unsafe than this one.
> >
> >> On Oct 4, 2016, at 4:53 PM, Zachary Turner <zturner at google.com> wrote:
> >>
> >> I'd be more comfortable if the name was like unsafe_cstr() or
> something. Thoughts?
> >> On Tue, Oct 4, 2016 at 4:51 PM Mehdi Amini <mehdi.amini at apple.com>
> wrote:
> >>> On Oct 4, 2016, at 4:46 PM, Zachary Turner <zturner at google.com> wrote:
> >>>
> >>> I mean "how many callsites are there”?
> >>
> >> I haven’t audited / counted, and I’m not done with all the conversions
> yet.
> >>
> >>
> >>>
> >>> On Tue, Oct 4, 2016 at 4:23 PM Mehdi Amini <mehdi.amini at apple.com>
> wrote:
> >>>> On Oct 4, 2016, at 4:17 PM, Zachary Turner <zturner at google.com>
> wrote:
> >>>>
> >>>> How frequently would it need to be done at each callsite?
> >>>
> >>> I don’t understand the question. Basically “always”?
> >>>
> >>>
> >>>> My concern is just that it sounds like we're adding a potentially
> dangerous function to a very generic and widely used API for the purposes
> of one client.
> >>>
> >>> Well the alternative to this API is .data(), the goal is to add a
> safer API than .data() :)
> >>>
> >>> I'd inspect closely any client that is using .data() instead of this
> one.
> >>>
> >>>
> >>>
> >>>>   If you make the function available, people are going to use it
> without understanding the implications.  Another possibility is that if
> this all takes place within a single common translation unit, you could
> write a private function inside there that looks like this:
> >>>>
> >>>> const char *Unsafe_cstr(StringRef S) {
> >>>>   assert(strlen(S.data()) == S.size());
> >>>>   return S.data();
> >>>> }
> >>>>
> >>>> That way the unsafe function is not exposed to general clients of
> StringRef?
> >>>>
> >>>> On Tue, Oct 4, 2016 at 4:10 PM Mehdi Amini <mehdi.amini at apple.com>
> wrote:
> >>>> The use case is that MachineInstruction have operands, but the size
> of their operands is limited. Right now you have the space for one-pointer,
> but not for a StringRef. So if you create a MachineInstruction that takes a
> symbol name as an operand, we can only store the “const chat *” from the
> StringRef.
> >>>>
> >>>> So what you’re suggesting to do is roughly what I’m trying to achieve
> here as well, but instead of replicating the logic at each call sites, I
> try to abstract it behind the API.
> >>>>
> >>>>
> >>>>> On Oct 4, 2016, at 3:56 PM, Zachary Turner <zturner at google.com>
> wrote:
> >>>>>
> >>>>> I'm not familiar with those cases, but could you do something like
> this:
> >>>>>
> >>>>> assert(strlen(s.data()) == s.size() && "Str is not null
> terminated!");
> >>>>>
> >>>>> at each call site?
> >>>>>
> >>>>> On Tue, Oct 4, 2016 at 3:48 PM Mehdi Amini <mehdi.amini at apple.com>
> wrote:
> >>>>> Neat, I didn’t know this one!
> >>>>>
> >>>>> That does not solve cases like the round-trip StringRef -> MO ->
> StringRef though.
> >>>>>
> >>>>>
> >>>>>> On Oct 4, 2016, at 3:38 PM, Zachary Turner <zturner at google.com>
> wrote:
> >>>>>>
> >>>>>> Can you use %*s format specifier in those cases?
> >>>>>>
> >>>>>> printf("%*s", s.size(), s.data());
> >>>>>> On Tue, Oct 4, 2016 at 3:34 PM Mehdi AMINI <mehdi.amini at apple.com>
> wrote:
> >>>>>> mehdi_amini added a comment.
> >>>>>>
> >>>>>> In https://reviews.llvm.org/D25255#561566, @zturner wrote:
> >>>>>>
> >>>>>> > Not sure how I feel about this.  It's convenient, but it has
> potential for abuse.  Where did you run into issues porting code to
> `StringRef` that this solves?
> >>>>>>
> >>>>>>
> >>>>>> I ran into this with "printf" like API at some point, or with
> places that can't be converted to StringRef because of space constraint,
> like MachineOperand. So you're out of the IR and you have to use .data() to
> initialize the MO. Later you may construct a StringRef from this "const
> char *" again.
> >>>>>>
> >>>>>> >   I've done a lot of `StringRef` porting in LLDB by now, and I've
> always managed to find a solution to this.  Usually it involves trickling
> the `StringRef` changes down further
> >>>>>>
> >>>>>> Usually that's what I do, yes. The problem is when you reach some
> cases like above.
> >>>>>>
> >>>>>>
> >>>>>> https://reviews.llvm.org/D25255
> >>>>>>
> >>>>>>
> >>>>>>
> >>>>>
> >>>>
> >
>
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.llvm.org/pipermail/llvm-commits/attachments/20161005/2f58105f/attachment.html>


More information about the llvm-commits mailing list