[PATCH] D93031: Enable fexec-charset option

Tue Dec 29 11:39:35 PST 2020

abhina.sreeskantharajan marked 8 inline comments as done.
abhina.sreeskantharajan added inline comments.

================
Comment at: clang/include/clang/Lex/LiteralSupport.h:244
   bool Pascal;
+  ConversionState TranslationState;
+
----------------
tahonermann wrote:
> abhina.sreeskantharajan wrote:
> > tahonermann wrote:
> > > Same concern here with respect to persisting the conversion state as a data member.
> > If this member is removed in StringLiteralParser, we will need to pass the State to multiple functions in StringLiteralParser like init(). Would this solution be preferable to keeping a data member?
> I think so, yes.  Data members should be used to reflect the state of the object, not as a convenient mechanism to avoid passing arguments.
Thanks, I've removed this member.

================
Comment at: clang/lib/Lex/LiteralSupport.cpp:1322-1323
+  TranslationState = translationState;
+  if (Kind == tok::wide_string_literal)
+    TranslationState = TranslateToSystemCharset;
+  else if (isUTFLiteral(Kind))
----------------
tahonermann wrote:
> abhina.sreeskantharajan wrote:
> > tahonermann wrote:
> > > Converting wide character literals to the system encoding doesn't seem right to me.  For z/OS, this should presumably convert to the wide EBCDIC encoding, but for all other supported platforms, the wide execution character set is either UTF-16 or UTF-32 depending on the size of `wchar_t` (which may be influenced by the `-fshort-wchar` option).
> > Since we don't implement -fwide-exec-charset yet, what do you think should be the default behaviour for the interim?
> Perhaps an Internal compiler error to indicate that appropriate support is not yet in place?
Thanks for the suggestion. I've added assertions for wide character translation before we do any translation.

================
Comment at: clang/lib/Lex/LiteralSupport.cpp:1594-1595
+  ConversionState State = TranslationState;
+  if (Kind == tok::wide_string_literal)
+    State = TranslateToSystemCharset;
+  else if (isUTFLiteral(Kind))
----------------
tahonermann wrote:
> Converting wide string literals to the system encoding doesn't seem right to me.  For z/OS, this should presumably convert to the wide EBCDIC encoding, but for all other supported platforms, the wide execution character set is either UTF-16 or UTF-32 depending on the size of `wchar_t` (which may be influenced by the `-fshort-wchar` option).
I've now added an assertion when translating wide characters.

================
Comment at: clang/test/CodeGen/systemz-charset.cpp:1-25
+// RUN: %clang %s -std=c++17 -emit-llvm -S -target s390x-ibm-zos -o - | FileCheck %s
+
+const char *RawString = R"(Hello\n)";
+//CHECK: c"\C8\85\93\93\96\E0\95\00"
+
+char UnicodeChar8 = u8'1';
+//CHECK: i8 49
----------------
tahonermann wrote:
> This is good.  I suggest adding escape sequences and UCNs to validate that they are not converted to IBM-1047.
Good idea, I added those testcases as per your suggestion.

Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D93031/new/

https://reviews.llvm.org/D93031