[PATCH] D105759: Implement P2361 Unevaluated string literals

Aaron Ballman via Phabricator via cfe-commits cfe-commits at lists.llvm.org
Wed Jun 28 12:18:48 PDT 2023


aaron.ballman added a comment.

In D105759#4456864 <https://reviews.llvm.org/D105759#4456864>, @aaron.ballman wrote:

> I don't think it's correct to assume that all string arguments to attributes are unevaluated, but it is hard to tell where to draw the line sometimes. Backing up a step, as I understand P2361 <https://reviews.llvm.org/P2361>, an unevaluated string is one which is not converted into the execution character set (effectively). Is that correct? If so, then as an example, `[[clang::annotate()]]` should almost certainly be using an evaluated string because the argument is passed down to LLVM IR and is used in ways we cannot predict. What's more, an unevaluated string cannot have some kinds of escape characters (numeric and conditional escape sequences) and those are currently allowed by `clang::annotate` and could potentially be used by a backend plugin.
>
> I think other attributes may have similar issues. For example, the `alias` attribute is a bit of a question mark for me -- that takes a string literal representing an external identifier that is looked up. I'm not certain whether that should be in the execution character set or not, but we do support escape sequences for it: https://godbolt.org/z/v65Yd7a68

I took a quick pass over our existing attributes, and here's my intuition on them regarding encoding of the literal:

Unevaluated Strings are Fine:
AbiTag
TLSModel
Availability
Deprecated
EnableIf/DiagnoseIf
ObjCRuntimeName
PragmaClangBSSSection/PragmaClangDataSection/PragmaClangRodataSection/PragmaClangRelroSection/PragmaClangTextSection (only created implicitly)
Suppress
Target/TargetVersion/TargetClones
Unavailable
Uuid
WarnUnusedResult
NoSanitize
Capability
Assumption
NoBuiltin (it names a builtin name, so this is probably fine to leave unevaluated?)
AcquireHandle/UseHandle/ReleaseHandle
Error
HLSLResourceBinding

Unevaluated String are Potentially Bad:
Annotate
AnnotateType

Unevaluated String Needs More Thinking (common thread is that they survive to LLVM IR):
Alias
AsmLabel
IFunc
BTFDeclTag/BTFTypeTag (is emitted to DWARF with -g so probably evaluated?)
WebAssemblyExportName/WebAssemblyImportModule/WebAssemblyImportModule
ExternalSourceSymbol
SwiftAsyncName/SwiftAttr/SwiftBridge/SwiftName
Section/CodeSeg/InitSeg
WeakRef
EnforceTCB/EnforceTCBLeaf

There's also the escape sequences issue where use of an escape sequence will go from accepted to rejected in these contexts. I did some hunting to see if I could find uses of numeric escape sequences in asm labels or alias attributes, to see if there's some signs this is done in practice:

Testing we can find numeric escape sequences at all:
https://sourcegraph.com/search?q=context:global+%5C%28%5C%22%5B%5B:alpha:%5D%5D*%28%5C%5C%5B%5B:digit:%5D%5D%2B%29%2B%5B%5B:alpha:%5D%5D*%5C%22%5C%29+lang:C+lang:C%2B%2B&patternType=regexp&case=yes&sm=1&groupBy=repo

Testing we can find asm labels at all:
https://sourcegraph.com/search?q=context:global+asm%5C%28%5C%22%5B%5B:alpha:%5D%5D*%5B%5B:alpha:%5D%5D*%5C%22%5C%29+lang:C+lang:C%2B%2B&patternType=regexp&case=yes&sm=1&groupBy=repo

Testing we can find asm labels with numeric escapes:
https://sourcegraph.com/search?q=context:global+asm%5C%28%5C%22%5B%5B:alpha:%5D%5D*%28%5C%5C%5B%5B:digit:%5D%5D%2B%29%2B%5B%5B:alpha:%5D%5D*%5C%22%5C%29+lang:C+lang:C%2B%2B&patternType=regexp&case=yes&sm=1&groupBy=repo

Testing we can find alias attributes at all:
https://sourcegraph.com/search?q=context:global+alias%5C%28%5C%22%5B%5B:alpha:%5D%5D*%5B%5B:alpha:%5D%5D*%5C%22%5C%29+lang:C+lang:C%2B%2B&patternType=regexp&case=yes&sm=1&groupBy=repo

Testing we can find alias attributes with numeric escapes:
https://sourcegraph.com/search?q=context:global+alias%5C%28%5C%22%5B%5B:alpha:%5D%5D*%28%5C%5C%5B%5B:digit:%5D%5D%2B%29%2B%5B%5B:alpha:%5D%5D*%5C%22%5C%29+lang:C+lang:C%2B%2B&patternType=regexp&case=yes&sm=1&groupBy=repo

I think this leaves me with three open questions:

- Do we know of any uses of the `annotate` attribute that rely on the string literal being in the execution character set? I do not know of any but I know this is used by plugins quite often.
- Do we know of any attributes in the "needs more thinking" list that should have the string literal encoded in the execution character set? I think most of these are for referring to identifiers in source and I expect those would want source character set and not execution character set strings.
- Do we know of any significant body of code using numeric escape sequences in these string literals that could not be relatively easily modified to compile again? I would be surprised, but I think someone should probably run more of the attributes on the "needs more thinking" list through similar searches on source graph and we can use that as an approximation.

If all these answers come back "no" as best we can figure, then I think we can punt on argument-level handling of this until we add an attribute that really does need an execution-encoded (or numeric escape sequence-using) string literal. I think we've got enough time before the Clang 17 branch to hear if the changes cause problems after we've done this due diligence. WDYT?


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D105759/new/

https://reviews.llvm.org/D105759



More information about the cfe-commits mailing list