[PATCH] D25255: Add a c_str() method to StringRef, similar to data() but asserting if the string isn't null terminated (NFC)

Mehdi Amini via llvm-commits llvm-commits at lists.llvm.org
Tue Oct 4 16:35:49 PDT 2016


So, that’s why we have ASAN (note how the conditional compilation addresses this). Also, this is not enabled in production builds (we don’t do this “out-of-bound” access in non-release mode). And finally, consider the alternative: clients today may just round-trip to “const char *” using .data(), and strlen() will be implicitly called, leading to the problem you’re describing.

A client that calls this API *expects* to have a null terminated string. If this is dangerous to do that, it is an issue with the client. I don’t see any other way to deal with that. 

Ideally, I'd rather have a “safe” StringRef with the guarantee that it is *always* null terminated. Right now StringRef is more like an ArrayRef and I find this quite unfortunate.


> On Oct 4, 2016, at 4:31 PM, Craig Topper <craig.topper at gmail.com> wrote:
> 
> What if someone accidentally calls on something that isn't intended to be null terminated but the next object in memory happens to be 0? Looking outside of the defined bounds of the StringRef seems dangerous.
> 
> ~Craig
> 
> On Tue, Oct 4, 2016 at 4:23 PM, Mehdi Amini via llvm-commits <llvm-commits at lists.llvm.org <mailto:llvm-commits at lists.llvm.org>> wrote:
> 
>> On Oct 4, 2016, at 4:17 PM, Zachary Turner <zturner at google.com <mailto:zturner at google.com>> wrote:
>> 
>> How frequently would it need to be done at each callsite? 
> 
> I don’t understand the question. Basically “always”?
> 
>> My concern is just that it sounds like we're adding a potentially dangerous function to a very generic and widely used API for the purposes of one client.
> 
> Well the alternative to this API is .data(), the goal is to add a safer API than .data() :)
> 
> I'd inspect closely any client that is using .data() instead of this one.
> 
> 
> 
>>   If you make the function available, people are going to use it without understanding the implications.  Another possibility is that if this all takes place within a single common translation unit, you could write a private function inside there that looks like this:
>> 
>> const char *Unsafe_cstr(StringRef S) {
>>   assert(strlen(S.data()) == S.size());
>>   return S.data();
>> }
>> 
>> That way the unsafe function is not exposed to general clients of StringRef?
>> 
>> On Tue, Oct 4, 2016 at 4:10 PM Mehdi Amini <mehdi.amini at apple.com <mailto:mehdi.amini at apple.com>> wrote:
>> The use case is that MachineInstruction have operands, but the size of their operands is limited. Right now you have the space for one-pointer, but not for a StringRef. So if you create a MachineInstruction that takes a symbol name as an operand, we can only store the “const chat *” from the StringRef.
>> 
>> So what you’re suggesting to do is roughly what I’m trying to achieve here as well, but instead of replicating the logic at each call sites, I try to abstract it behind the API.
>> 
>> 
>>> On Oct 4, 2016, at 3:56 PM, Zachary Turner <zturner at google.com <mailto:zturner at google.com>> wrote:
>>> 
>>> I'm not familiar with those cases, but could you do something like this:
>>> 
>>> assert(strlen(s.data()) == s.size() && "Str is not null terminated!");
>>> 
>>> at each call site?
>>> 
>>> On Tue, Oct 4, 2016 at 3:48 PM Mehdi Amini <mehdi.amini at apple.com <mailto:mehdi.amini at apple.com>> wrote:
>>> Neat, I didn’t know this one!
>>> 
>>> That does not solve cases like the round-trip StringRef -> MO -> StringRef though.
>>> 
>>> 
>>>> On Oct 4, 2016, at 3:38 PM, Zachary Turner <zturner at google.com <mailto:zturner at google.com>> wrote:
>>>> 
>>>> Can you use %*s format specifier in those cases?
>>>> 
>>>> printf("%*s", s.size(), s.data());
>>>> On Tue, Oct 4, 2016 at 3:34 PM Mehdi AMINI <mehdi.amini at apple.com <mailto:mehdi.amini at apple.com>> wrote:
>>>> mehdi_amini added a comment.
>>>> 
>>>> In https://reviews.llvm.org/D25255#561566 <https://reviews.llvm.org/D25255#561566>, @zturner wrote:
>>>> 
>>>> > Not sure how I feel about this.  It's convenient, but it has potential for abuse.  Where did you run into issues porting code to `StringRef` that this solves?
>>>> 
>>>> 
>>>> I ran into this with "printf" like API at some point, or with places that can't be converted to StringRef because of space constraint, like MachineOperand. So you're out of the IR and you have to use .data() to initialize the MO. Later you may construct a StringRef from this "const char *" again.
>>>> 
>>>> >   I've done a lot of `StringRef` porting in LLDB by now, and I've always managed to find a solution to this.  Usually it involves trickling the `StringRef` changes down further
>>>> 
>>>> Usually that's what I do, yes. The problem is when you reach some cases like above.
>>>> 
>>>> 
>>>> https://reviews.llvm.org/D25255 <https://reviews.llvm.org/D25255>
>>>> 
>>>> 
>>>> 
>>> 
>> 
> 
> 
> _______________________________________________
> llvm-commits mailing list
> llvm-commits at lists.llvm.org <mailto:llvm-commits at lists.llvm.org>
> http://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-commits <http://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-commits>
> 
> 

-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.llvm.org/pipermail/llvm-commits/attachments/20161004/239ac0b7/attachment.html>


More information about the llvm-commits mailing list