[PATCH] D135366: [clang][Interp] Implement String- and CharacterLiterals

Wed Oct 12 11:48:32 PDT 2022

aaron.ballman added inline comments.

================
Comment at: clang/test/AST/Interp/arrays.cpp:143
+
+};
----------------
tahonermann wrote:
> tbaeder wrote:
> > cor3ntin wrote:
> > > tahonermann wrote:
> > > > cor3ntin wrote:
> > > > > tbaeder wrote:
> > > > > > cor3ntin wrote:
> > > > > > > tahonermann wrote:
> > > > > > > > As others already noted, additional testing of multicharacter literals and UCNs (including named universal characters like `\N{LATIN_CAPITAL_LETTER_E}` would be beneficial. Some tests of character escapes like `\t` wouldn't hurt either.
> > > > > > > > 
> > > > > > > > Clang does not yet support use of `-fexec-charset` to set the literal encoding (execution character set) to anything other than UTF-8 though work on that has been done (see D93031). If such work was completed, it would be useful to run some tests against a non-UTF-8 encoding. Maybe next year.
> > > > > > > Yes, wide **multicharacter** literals, that's was important information missing, thanks for spotting that.
> > > > > > > 
> > > > > > > I have mixed feeling about adding tests for escape sequences.  Their replacement doesn't happen during constant evaluation.
> > > > > > > We shouldn't replicate the lexing tests here.
> > > > > > > 
> > > > > > > but we should compare string literal with byte values. Testing a string literal against another one doesn't ensure the code units are correct if both are equally miss evaluated.
> > > > > > > 
> > > > > > > Also we could add explicit tests for null termination here as they are added as part of evaluation in theory - but then again that's also something clang does earlier.
> > > > > > > 
> > > > > > > If we want we could consider enabling the byte code interpreter on the existing lexing test files, i actually think that's the better way to deal with the escape sequences tests.
> > > > > > I changed the first test that inspects all characters of a string to comparing with integers instead. Do you have a suggestion for what lexing tests to enable the constant interpreter in?
> > > > > I think good candidates are
> > > > > 
> > > > > Lexer/char-escapes.c
> > > > > Lexer/char-escapes-delimited.c
> > > > > Lexer/char-literal.cpp
> > > > Of those, only `Lexer/char-escapes.c` does much validation of literal values. I prefer the approach Timm has already taken relative to those tests.
> > > > 
> > > > It looks like we don't have an existing set of Sema tests for character and string literals. How about we move this test under `clang/test/Sema`? That would be the appropriate place to exercise values relative to `-fexec-charset` support for non-UTF-8 encodings in the future. If that sounds amenable, then I'd like the test split to exercise character and string literals separately.
> > > > 
> > > > The character literal tests don't really belong in a test named `arrays.cpp` as is.
> > > > Of those, only Lexer/char-escapes.c does much validation of literal values. I prefer the approach Timm has already taken relative to those tests.
> > > 
> > > We can do both, I was not arguing against the test we have here, I'm happy with those :)
> > > I'm opposed to duplicate tests for escape sequences here.  using the new interpreter on tests that already exist (in addition of the existing run commands) is pretty easy and cheap to do.
> > > 
> > > I don't have opinions how the new interpreter tests are organized.
> > > Ideally we would have a complete set of test that is equally suitable for both constexpr engines, but maybe that's something that does not need to be done as part of this PR 
> > I've added a new run line to `test/Lexer/char-escapes.c`, which works fine. To summarize, the plan is to add a new test to `clang/test/Sema`? Or split the char tests out from `arrays.cpp` and add some for multicharacter char sequences?
> I'm a bit uncertain regarding the purpose of tests in the `Interp` directory as opposed to tests under the `Sema*` directories. Basically, I'm unsure why we wouldn't want just one set of tests that exercise the semantics of the language, at least with regard to constant evaluation. If the interpreter is also intended to (eventually) interpret full programs (e.g., evaluating `main()`), I think such tests would be appropriate for an `Interp` directory. But I say that having not closely followed the development and previous reviews of the new interpreter.
> 
> At a minimum, I would like to see the character and string literal testing pulled out of `arrays.cpp`. For the rest, I'll defer to others that have been more involved in these reviews.
> I'm a bit uncertain regarding the purpose of tests in the Interp directory as opposed to tests under the Sema* directories. Basically, I'm unsure why we wouldn't want just one set of tests that exercise the semantics of the language, at least with regard to constant evaluation. 

The line is a bit blurry; there's certainly semantic tests happening for the constant expression interpreter. I think where I draw the line is on what the goal of the test is. If the goal is to test new interpreter behavior, it goes in Interp. If the goal is to test semantic behavior (and happens to test the new interpreter as well), then Sema seems more reasonable.

Ultimately, the really important bit is getting the tests *at all*, so thank you both for your advice on this!

> At a minimum, I would like to see the character and string literal testing pulled out of arrays.cpp.

I agree, it'd be nice to put them into a file more focused on literals. It's fine to have some string literal and character literal testing in arrays (a string is an array of characters, after all), but it shouldn't be the primary place where that coverage lives.

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D135366/new/

https://reviews.llvm.org/D135366