[PATCH] D135366: [clang][Interp] Implement String- and CharacterLiterals

Wed Oct 12 00:10:29 PDT 2022

tbaeder added inline comments.

================
Comment at: clang/test/AST/Interp/arrays.cpp:143
+
+};
----------------
cor3ntin wrote:
> tahonermann wrote:
> > cor3ntin wrote:
> > > tbaeder wrote:
> > > > cor3ntin wrote:
> > > > > tahonermann wrote:
> > > > > > As others already noted, additional testing of multicharacter literals and UCNs (including named universal characters like `\N{LATIN_CAPITAL_LETTER_E}` would be beneficial. Some tests of character escapes like `\t` wouldn't hurt either.
> > > > > > 
> > > > > > Clang does not yet support use of `-fexec-charset` to set the literal encoding (execution character set) to anything other than UTF-8 though work on that has been done (see D93031). If such work was completed, it would be useful to run some tests against a non-UTF-8 encoding. Maybe next year.
> > > > > Yes, wide **multicharacter** literals, that's was important information missing, thanks for spotting that.
> > > > > 
> > > > > I have mixed feeling about adding tests for escape sequences.  Their replacement doesn't happen during constant evaluation.
> > > > > We shouldn't replicate the lexing tests here.
> > > > > 
> > > > > but we should compare string literal with byte values. Testing a string literal against another one doesn't ensure the code units are correct if both are equally miss evaluated.
> > > > > 
> > > > > Also we could add explicit tests for null termination here as they are added as part of evaluation in theory - but then again that's also something clang does earlier.
> > > > > 
> > > > > If we want we could consider enabling the byte code interpreter on the existing lexing test files, i actually think that's the better way to deal with the escape sequences tests.
> > > > I changed the first test that inspects all characters of a string to comparing with integers instead. Do you have a suggestion for what lexing tests to enable the constant interpreter in?
> > > I think good candidates are
> > > 
> > > Lexer/char-escapes.c
> > > Lexer/char-escapes-delimited.c
> > > Lexer/char-literal.cpp
> > Of those, only `Lexer/char-escapes.c` does much validation of literal values. I prefer the approach Timm has already taken relative to those tests.
> > 
> > It looks like we don't have an existing set of Sema tests for character and string literals. How about we move this test under `clang/test/Sema`? That would be the appropriate place to exercise values relative to `-fexec-charset` support for non-UTF-8 encodings in the future. If that sounds amenable, then I'd like the test split to exercise character and string literals separately.
> > 
> > The character literal tests don't really belong in a test named `arrays.cpp` as is.
> > Of those, only Lexer/char-escapes.c does much validation of literal values. I prefer the approach Timm has already taken relative to those tests.
> 
> We can do both, I was not arguing against the test we have here, I'm happy with those :)
> I'm opposed to duplicate tests for escape sequences here.  using the new interpreter on tests that already exist (in addition of the existing run commands) is pretty easy and cheap to do.
> 
> I don't have opinions how the new interpreter tests are organized.
> Ideally we would have a complete set of test that is equally suitable for both constexpr engines, but maybe that's something that does not need to be done as part of this PR 
I've added a new run line to `test/Lexer/char-escapes.c`, which works fine. To summarize, the plan is to add a new test to `clang/test/Sema`? Or split the char tests out from `arrays.cpp` and add some for multicharacter char sequences?

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D135366/new/

https://reviews.llvm.org/D135366