[PATCH] D135366: [clang][Interp] Implement String- and CharacterLiterals

Tue Oct 11 08:49:29 PDT 2022

tahonermann added inline comments.

================
Comment at: clang/test/AST/Interp/arrays.cpp:143
+
+};
----------------
cor3ntin wrote:
> tbaeder wrote:
> > cor3ntin wrote:
> > > tahonermann wrote:
> > > > As others already noted, additional testing of multicharacter literals and UCNs (including named universal characters like `\N{LATIN_CAPITAL_LETTER_E}` would be beneficial. Some tests of character escapes like `\t` wouldn't hurt either.
> > > > 
> > > > Clang does not yet support use of `-fexec-charset` to set the literal encoding (execution character set) to anything other than UTF-8 though work on that has been done (see D93031). If such work was completed, it would be useful to run some tests against a non-UTF-8 encoding. Maybe next year.
> > > Yes, wide **multicharacter** literals, that's was important information missing, thanks for spotting that.
> > > 
> > > I have mixed feeling about adding tests for escape sequences.  Their replacement doesn't happen during constant evaluation.
> > > We shouldn't replicate the lexing tests here.
> > > 
> > > but we should compare string literal with byte values. Testing a string literal against another one doesn't ensure the code units are correct if both are equally miss evaluated.
> > > 
> > > Also we could add explicit tests for null termination here as they are added as part of evaluation in theory - but then again that's also something clang does earlier.
> > > 
> > > If we want we could consider enabling the byte code interpreter on the existing lexing test files, i actually think that's the better way to deal with the escape sequences tests.
> > I changed the first test that inspects all characters of a string to comparing with integers instead. Do you have a suggestion for what lexing tests to enable the constant interpreter in?
> I think good candidates are
> 
> Lexer/char-escapes.c
> Lexer/char-escapes-delimited.c
> Lexer/char-literal.cpp
Of those, only `Lexer/char-escapes.c` does much validation of literal values. I prefer the approach Timm has already taken relative to those tests.

It looks like we don't have an existing set of Sema tests for character and string literals. How about we move this test under `clang/test/Sema`? That would be the appropriate place to exercise values relative to `-fexec-charset` support for non-UTF-8 encodings in the future. If that sounds amenable, then I'd like the test split to exercise character and string literals separately.

The character literal tests don't really belong in a test named `arrays.cpp` as is.

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D135366/new/

https://reviews.llvm.org/D135366