[PATCH] D105759: [WIP] Implement P2361 Unevaluated string literals

Corentin Jabot via Phabricator via cfe-commits cfe-commits at lists.llvm.org
Wed Aug 25 15:46:58 PDT 2021


cor3ntin added inline comments.


================
Comment at: clang/lib/Parse/ParseDecl.cpp:422-423
 
-        ExprResult ArgExpr(
-            Actions.CorrectDelayedTyposInExpr(ParseAssignmentExpression()));
+        ExprResult ArgExpr(Actions.CorrectDelayedTyposInExpr(
+            ParseAttributeArgAsUnevaluatedLiteralOrExpression(AttrKind)));
         if (ArgExpr.isInvalid()) {
----------------
aaron.ballman wrote:
> cor3ntin wrote:
> > aaron.ballman wrote:
> > > Hmmm, I'm not certain about these changes.
> > > 
> > > For some attributes, the standard currently requires accepting any kind of string literal (like `[[deprecated]]` https://eel.is/c++draft/dcl.attr.deprecated#1). P2361 is proposing to change that, but it's not yet accepted by WG21 (let alone WG14). So giving errors in those cases is a bit of a hard sell -- I think warnings would be a bit more reasonable.
> > > 
> > > But for other attributes (like `annotate`), it's a bit less clear whether we should *prevent* literal prefixes because the attribute can be used to have runtime impacts (for example, I can imagine someone using `annotate` to emit the string literal bytes into the resulting binary). In some cases, I think it's very reasonable (e.g., `diagnose_if` should behave the same as `deprecated` and `nodiscard` because those are purely about generating diagnostics at compile time).
> > > 
> > > I kind of wonder whether we're going to want to tablegenerate whether the argument needs to be parsed as unevaluated or not on an attribute-by-attribute basis.
> > Yep, I would not expect this to get merge before P2361 but I think the implementation experience is useful and raised a bunch of good questions.
> > I don't think it ever makes sense to have `L` outside of literals - but people *might* do it currently, in which case there is a concern about whether it breaks code (I have found no evidence of that though).
> > 
> > If we wanted to inject these strings in the binary - in some form, then we might have to transcode them at that point.
> > I don't think the user would know if the string would be injected as wide or narrow (or something else) by the compiler.
> > 
> > `L` is really is want to convert that string _at that point_. in an attribute, strings might have multiple usages so it's better to delay any transcoding.
> > Does that make sense?
> > 
> > But I agree that a survey of what each attribute expects is in order.
> > 
> > 
> > 
> > Yep, I would not expect this to get merge before P2361 but I think the implementation experience is useful and raised a bunch of good questions.
> 
> Absolutely agreed, this is worthwhile effort!
> 
> > If we wanted to inject these strings in the binary - in some form, then we might have to transcode them at that point.
> > I don't think the user would know if the string would be injected as wide or narrow (or something else) by the compiler.
> 
> My intuition is that a user who writes `L"foo"` will expect a wide `"foo"` to appear in the binary in the cases where the string ends up making it that far.
> 
> > L is really is want to convert that string _at that point_. in an attribute, strings might have multiple usages so it's better to delay any transcoding.
> > Does that make sense?
> 
> Not yet, but I might get there eventually. :-D My concern is that vendor attributes can basically do anything, so there's no reason to assume that any given string literal usage should or should not transcode. I think we have to decide on a case by case basis by letting the attributes specify what they intend in their argument lists. However, my intuition is that *most* attributes will expect unevaluated string literals because the string argument doesn't get passed to LLVM.
The status quo is that everything transcodes.

But not transcoding, we do not destroy any information as to what is in the source.

If an attribute then wants to use the string later in such a way that it needs to transcode to a literal encoding (or something else, for example, one might imagine a fun scenario where literal are ASCII encoded and debug information are EBCDIC encoded), then that can be done, because the string still exists.

Whereas for literal specifically, we assume they will be evaluated by the abstract machine as per phase 5 so we transcode them immediately. which is destructive. we get away with it because the original spelling is in the source if we need it, and currently, literals are also assumed to be (potentially invalid because of `\x` escape sequences) UTF-8.

There is an alternative design where string literals are not transcoded until lazily evaluated but I'm not sure there is a big motivation for that.

So this PR is exactly trying not to force a specific behavior on attributes that I assume can be displayed, put into some form in the binary, or converted to literal which might represent 3 distinct encodings. The parser leaving them as Unicode is the least opinionated thing the parser can possibly do.
And then each attribute can decide for itself if it needs to transcode, and how to handle any errors if they occur.
An attribute might decide to keep both a Unicode and non-Unicode spelling around if the string has a dual purpose, etc


Question though, Is there a scenario in which `\x`/`\0` would actually be useful in the context of attributes? Because if so, then we might need to do something to allow that.


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D105759/new/

https://reviews.llvm.org/D105759



More information about the cfe-commits mailing list