[cfe-commits] cfe-commits Digest, Vol 46, Issue 200

Matthieu Monrocq matthieu.monrocq at gmail.com
Sat Apr 30 03:55:48 PDT 2011


From: Matthieu Monrocq <matthieu.monrocq at gmail.com>
> Subject: [cfe-commits] Undefined Behavior when invoking StringRef
>        constructor on a null pointer
> To: cfe-commits at cs.uiuc.edu
> Message-ID: <BANLkTimhEP5e+Rw3RsDZNYDqRncqDFotcw at mail.gmail.com>
> Content-Type: text/plain; charset="iso-8859-1"
>
> The  llvm::StringRef  class has a constructor taking a  const char*
> parameter.
>
> This constructor is extremely simple, and I am afraid too simple. It
> directly invokes  ::strlen  on the parameter, without checking whether or
> not the pointer is null or not.
>
> Unfortunately as many C functions, strlen is not required by the standard
> to
> check its input, and indeed popular implementations assume that the input
> is
> not null, which results in undefined behavior.
>
> As far as I see it, there are two ways to deal with this:
> - using an assert, to check that the input is non-null. It does not
> slow-down the program built with asserts disabled, but does not allow us to
> invoke StringRef on null pointers
> - using a simple inlined test (ternary operator ?:) to either invoke strlen
> or set the length to 0. Makes migrating from  const char*  to
> llvm::StringRef  easier.
>
> I've used the second approach in the patch enclosed (which gmail thoroughly
> refused to. I have not measured the performance impact though.
>
> Please review.
> Matthieu.
>

Hum, looking further it seems to be only the top of the iceberg.

1. The default constructor makes  Data  null, so having another constructor
doing so does not violate any invariant
2. However I am afraid that not all functions guard agaisnt null properly

Even though  llvm::StringRef::str  has a special case for  Data  being null,
many others (comparisons at least) use  memcmp  without this validation,
and  memcmp  requires its arguments to be non-null (and we hit that bug on
SLES 10 with memcpy recently at work, even with a null length the arguments
should be non-null, otherwise undefined behavior is invoked).

Normally the C Standard precise that all functions taking pointers should be
provided with non-null pointers, unless otherwise precised in their specific
description.

You can see the reference for memcpy on StackOverflow:
http://stackoverflow.com/questions/5243012/is-it-guaranteed-to-be-safe-to-perform-memcpy0-0-0/5243068#5243068

And this mail thread (I should really find a copy of the C final draft...)
suggests it also applies to memcmp:
http://www.mail-archive.com/bug-gnulib@gnu.org/msg10149.html


It seems to me that there are two roads:
- Removing this possibily of being null: simple enough, we just check it at
construction and substitute an empty c-string instead
- Allowing it, but catching all the bugs associated

I am not confident in my ability to catch all uses of  Data  right now and
to remember this in the future when maintaining the class.

The first road thus seems saner and I have implemented it in a second
iteration of the patch.


However it seems that a number of utilities (among which
llvm/lib/Support/CommandLine.cpp) differentiate between an empty c-string
and a null pointer.

I don't mind trying to fix all those who relied on the fact that  Data
could be null, but there is also a chance that I'll miss some uses.

Thoughts ?

Matthieu.
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.llvm.org/pipermail/cfe-commits/attachments/20110430/ce57179f/attachment.html>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: llvm_stringref_ub.diff
Type: application/octet-stream
Size: 1457 bytes
Desc: not available
URL: <http://lists.llvm.org/pipermail/cfe-commits/attachments/20110430/ce57179f/attachment.obj>


More information about the cfe-commits mailing list