[cfe-dev] Patch for review: enhancement to __builtin_strlen
Richard Smith
richard at metafoo.co.uk
Thu Jun 21 15:13:18 PDT 2012
Hi,
On Thu, Jun 21, 2012 at 9:58 AM, Andy Gibbs <andyg1001 at hotmail.co.uk> wrote:
> I would like to offer a patch for review that extends the use of the
> compile-time evaluation of __builtin_strlen to include immutable
> null-terminated const char arrays, i.e. beyond just standard string
> literals. (It will still fall back to runtime use of library strlen, if
> compile-time evaluation is not possible/required -- this side of the
> functionality is not changed).
>
> This is not a bug fix but a feature enhancement, which I believe may have
> broader appeal to those developing with C++11 and, in particular, with
> user-defined literals.
This seems like a completely reasonable enhancement. As we inevitably
shift towards making more library functions usable in constant
expressions, this seems very much like a feature we will want.
> The patch is intended to be very conservative in terms of what it will
> accept, and so the extension only allows constant sized arrays of const char
> (__builtin_strlen limits this to char, although wchar_t, etc. would work
> otherwise) and where the compiler can evaluate the expression as a constant
> array (i.e. not a pointer or an lvalue of an array that can be modified at
> runtime, for example). I believe there are adequate checks to stop
> incorrect compilation, for example, incorrectly doing a compile-time
> evaluation of __builtin_strlen on a variable which is not immutable, but
> please can this be particularly verified by someone.
I think the patch is actually too conservative. For instance, I would
like for the following to work:
  constexpr const char *p = "foobar";
  constexpr int n = __builtin_strlen(p);

More generally, the builtin should behave as if it were defined as:

  constexpr size_t __builtin_strlen(const char *p) {
    return *p ? 1 + __builtin_strlen(p+1) : 0;
  }

... except that it should be faster (and not limited by the constexpr
recursion depth limit).
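
As an illustration of the semantics described above, here is a minimal standalone sketch of the same recursive definition. The name `ref_strlen` is hypothetical (chosen so the example does not depend on how a given compiler treats `__builtin_strlen`), and the checks assume only standard C++11 constexpr rules.

```cpp
#include <cstddef>

// Hypothetical reference model; `ref_strlen` is not a Clang or library name.
// A single return statement keeps it valid under C++11 constexpr rules.
constexpr std::size_t ref_strlen(const char *p) {
  return *p ? 1 + ref_strlen(p + 1) : 0;
}

constexpr const char *p = "foobar";
constexpr std::size_t n = ref_strlen(p);      // pointer to the start of the literal
constexpr std::size_t m = ref_strlen(p + 3);  // pointer into the middle of it

static_assert(n == 6, "length of \"foobar\"");
static_assert(m == 3, "length of \"bar\"");
```

Because this model recurses once per character, a long enough string would hit the implementation's constexpr recursion limit, which is the limitation the builtin is meant to avoid.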
To this end, I suggest you always evaluate the pointer argument,
rather than only doing so if it has a constant array type. Use
EvaluatePointer rather than Expr::EvaluateAsRValue in order to pick up
values of constexpr function parameters and the like. Finally, to
extract characters from an LValue, you can repeatedly call
HandleLValueToRValueConversion and LValue::adjustIndex until you hit a
NUL. HandleLValueToRValueConversion will return false if you leave the
string, so you don't need to explicitly check for that case.
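
The shape of that loop can be sketched outside of Clang. The following is a toy model, not Clang code: `CharCursor`, `read_char`, `walk_strlen` and `checked_strlen` are all hypothetical names, with `read_char` standing in for a checked load that fails when the position has left the object, and `++c.index` standing in for adjusting the LValue's index by one. It assumes C++14 or later for loops in constexpr functions.

```cpp
#include <cstddef>

// Hypothetical model of an evaluated pointer: a known object plus an index.
struct CharCursor {
  const char *base;    // start of the object being read
  std::size_t extent;  // number of chars in the object
  std::size_t index;   // current position; may be one past the end
};

// Stand-in for a checked load: fails instead of reading outside the object.
constexpr bool read_char(const CharCursor &c, char &out) {
  if (c.index >= c.extent)
    return false;
  out = c.base[c.index];
  return true;
}

// Walks one element at a time. It has no bounds check of its own and
// relies entirely on read_char failing once the cursor leaves the object.
constexpr bool walk_strlen(CharCursor c, std::size_t &len) {
  std::size_t n = 0;
  char ch = 0;
  for (;;) {
    if (!read_char(c, ch))
      return false;  // left the object before finding a NUL
    if (ch == '\0')
      break;
    ++c.index;
    ++n;
  }
  len = n;
  return true;
}

// Small wrapper so results can be checked at compile time; -1 means failure.
constexpr long checked_strlen(const char *base, std::size_t extent,
                              std::size_t index) {
  std::size_t len = 0;
  return walk_strlen(CharCursor{base, extent, index}, len)
             ? static_cast<long>(len)
             : -1;
}

constexpr char terminated[] = "abc";
constexpr char unterminated[3] = {'a', 'b', 'c'};

static_assert(checked_strlen(terminated, sizeof terminated, 0) == 3, "");
static_assert(checked_strlen(terminated, sizeof terminated, 1) == 2, "");
static_assert(checked_strlen(unterminated, sizeof unterminated, 0) == -1, "");
```

The unterminated array shows the failure case: the walk ends because the checked load fails, not because the loop compared the index against a bound.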
For efficiency, you should check whether the LValue's base is a
StringLiteral, and if so, use a fast-path (be sure to take account
of the offset, in case the pointer doesn't point to the start of the
literal).
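
To show why the offset matters, here is a small sketch of such a fast path over a literal's bytes. `LiteralPointer` and `literal_strlen` are hypothetical names for illustration and do not correspond to Clang types; the code assumes C++14 or later for the loop in a constexpr function.

```cpp
#include <cstddef>

// Hypothetical stand-in for an evaluated pointer whose base is a string
// literal: the literal's bytes plus the pointer's offset into them.
struct LiteralPointer {
  const char *bytes;   // start of the literal's storage
  std::size_t size;    // number of chars, counting the trailing NUL
  std::size_t offset;  // where the evaluated pointer actually points
};

// Scans the literal directly, starting at the offset rather than at 0.
// Returns -1 if no NUL is found from the offset onwards.
constexpr long literal_strlen(LiteralPointer lp) {
  for (std::size_t i = lp.offset; i < lp.size; ++i)
    if (lp.bytes[i] == '\0')
      return static_cast<long>(i - lp.offset);
  return -1;
}

constexpr char lit[] = "foo\0bar";  // 8 chars, including both NULs

static_assert(literal_strlen({lit, sizeof lit, 0}) == 3, "\"foo\"");
static_assert(literal_strlen({lit, sizeof lit, 4}) == 3, "\"bar\"");
static_assert(literal_strlen({lit, sizeof lit, 8}) == -1, "one past the end");
```

The embedded NUL is deliberate: a fast path that ignored the offset, or that returned the literal's length minus the offset, would give the wrong answer for both the offset 0 and the offset 4 cases.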
The above is still inefficient in the not-a-string-literal case, due
to the repeated work in HandleLValueToRValueConversion. To avoid that,
you could instead (after evaluating the pointer):
* Remove the last element from the LValue, if there is one (if not,
use the element-by-element method above -- this might be
"__builtin_strlen(&c)", where we might have "const char c = 0;").
* Use findMostDerivedSubobject to find the resulting type. If it's
not an array, use the method above -- this might be
"__builtin_strlen(&x.c)", where we might have "struct X { char c = 0;
} constexpr x;".
* Use HandleLValueToRValueConversion on the resulting LValue to get
an APValue. For efficiency, we don't build APValue::Array objects from
string literals, so you will either get an APValue::Array
representation or you'll get an APValue::LValue which refers to a
string literal expression.
* Use the approach in your patch to deal with the array or string
literal, starting from the array index you removed in the first step.