[PATCH] D93031: Enable fexec-charset option

Tom Honermann via Phabricator via cfe-commits cfe-commits at lists.llvm.org
Sun Feb 28 20:50:52 PST 2021


tahonermann added a comment.

Hi, Abhina.  Sorry for the delay getting back to you.  I added some more comments.



================
Comment at: clang/include/clang/Lex/LiteralSupport.h:191
+                    Preprocessor &PP, tok::TokenKind kind,
+                    ConversionState TranslationState = TranslateToExecCharset);
 
----------------
Is the default argument for `TranslationState` actually used anywhere?  I'm skeptical that a default argument provides a benefit here.

Actually, this diff doesn't include any changes to construct a `CharLiteralParser` with an explicit argument.  It seems this argument isn't actually needed.

The only places I see objects of `CharLiteralParser` type constructed are in `EvaluateValue()` in `clang/lib/Lex/PPExpressions.cpp` and `Sema::ActOnCharacterConstant()` in `clang/lib/Sema/SemaExpr.cpp`.


================
Comment at: clang/include/clang/Lex/LiteralSupport.h:238-239
+        ResultPtr(ResultBuf.data()), hadError(false), Pascal(false) {
+    LT = new LiteralTranslator();
+    LT->setTranslationTables(Features, Target, *Diags);
+    init(StringToks, TranslationState);
----------------
I don't think a `LiteralTranslator` object is actually needed in this case.  The only use of this constructor that I see is in `ModuleMapParser::consumeToken()` in `clang/lib/Lex/ModuleMap.cpp` and, in that case, I don't think any translation is necessary.

This suggests that `TranslationState` is not needed for this constructor either; `NoTranslation` can be passed to `init()`.


================
Comment at: clang/include/clang/Lex/LiteralSupport.h:260-262
+  unsigned getOffsetOfStringByte(
+      const Token &TheTok, unsigned ByteNo,
+      ConversionState TranslationState = TranslateToExecCharset) const;
----------------
I don't think `getOffsetOfStringByte()` should require a `ConversionState` parameter.  If I understand it correctly, this function should be operating on the string in the internal encoding, never in a converted encoding.


================
Comment at: clang/include/clang/Lex/LiteralTranslator.h:19-23
+enum ConversionState {
+  NoTranslation,
+  TranslateToSystemCharset,
+  TranslateToExecCharset
+};
----------------
Some naming suggestions...  The enumeration is not used to record a state, but rather to indicate an action to take.  Also, use of both "conversion" and "translation" could be confusing, so I suggest sticking with one.  Perhaps:
  enum class LiteralConversion {
    None,
    ToSystemCharset,
    ToExecCharset
  };


================
Comment at: clang/include/clang/Lex/LiteralTranslator.h:30-31
+
+class LiteralTranslator {
+public:
+  llvm::StringRef InternalCharset;
----------------
I don't know the LLVM style guides well, but I suspect a class with all public members should be defined using `struct` and not include access specifiers.


================
Comment at: clang/include/clang/Lex/LiteralTranslator.h:35
+  llvm::StringRef ExecCharset;
+  llvm::StringMap<llvm::CharSetConverter> ExecCharsetTables;
+
----------------
Given the converter setters and accessors below, `ExecCharsetTables` should be a private member.


================
Comment at: clang/include/clang/Lex/LiteralTranslator.h:37
+
+  llvm::CharSetConverter *getConversionTable(const char *Codepage);
+  CharsetTableStatusCode findOrCreateExecCharsetTable(const char *To);
----------------
`getConversionTable()` is logically `const`.  Perhaps `ExecCharsetTables` should be `mutable`.

>From a terminology stand point, this function is misnamed.  It doesn't return a table, it returns a converter for an encoding.  I suggest:
  llvm::CharSetConverter *getCharSetConverter(const char *Encoding) const;


================
Comment at: clang/include/clang/Lex/LiteralTranslator.h:38
+  llvm::CharSetConverter *getConversionTable(const char *Codepage);
+  CharsetTableStatusCode findOrCreateExecCharsetTable(const char *To);
+  llvm::CharSetConverter *
----------------
`findOrCreateExecCharsetTable()` seems oddly named since it doesn't return whatever it finds or creates.  It seems like this function would be more useful if it returned a `llvm::CharSetConverter` pointer with `nullptr` indicating lookup/creation failed.

This function seems like it should be an implementation detail of the class, not a public interface.


================
Comment at: clang/include/clang/Lex/LiteralTranslator.h:41-43
+  void setTranslationTables(const clang::LangOptions &Opts,
+                            const clang::TargetInfo &TInfo,
+                            clang::DiagnosticsEngine &Diags);
----------------
`setTranslationTables()` is awkward.  It is effectively operating as a constructor for the class, but isn't called at object construction and it does work that goes beyond initialization.


================
Comment at: clang/include/clang/Lex/LiteralTranslator.h:44
+                            clang::DiagnosticsEngine &Diags);
+};
+
----------------
I suggest trying a design more like this:
  class LiteralTranslator {
    std::string SystemEncoding;
    std::string ExecutionEncoding;
  public:
    LiteralTranslator(llvm::StringRef SystemEncoding,
                      llvm::StringRef ExecutionEncoding);

    // Retrieve the name for the system encoding.
    llvm::StringRef getSystemEncoding() const;
    // Retrieve the name for the execution encoding.
    llvm::StringRef getExecutionEncoding() const;

    // Retrieve a converter for converting from the internal encoding (UTF-8)
    // to the system encoding.
    llvm::CharSetConverter* getSystemEncodingConverter() const;
    // Retrieve a converter for converting from the internal encoding (UTF-8)
    // to the execution encoding.
    llvm::CharSetConverter* getExecutionEncodingConverter() const;
  };
  
  LiteralTranslator createLiteralTranslatorFromOptions(const clang::LangOptions &Opts,
                                                       const clang::TargetInfo &TInfo,
                                                       clang::DiagnosticsEngine &Diags);


================
Comment at: clang/include/clang/Lex/Preprocessor.h:145
   ModuleLoader      &TheModuleLoader;
+  LiteralTranslator *LT = nullptr;
 
----------------
I don't see a reason for `LT` to be a pointer.  Can it be made a reference or, better, a non-reference, non-pointer data member?


================
Comment at: clang/lib/Lex/LiteralSupport.cpp:1322-1324
+  ConversionState State = isUTFLiteral(Kind) ? NoTranslation : TranslationState;
+  llvm::CharSetConverter *Converter =
+      LT ? LT->getCharConversionTable(State) : nullptr;
----------------
Per the comment associated with the constructor declaration, I don't think the new constructor parameter is needed; translation to execution character set is always desired for non-UTF character literals.  I think this can be something like:
  llvm::CharSetConverter *Converter = nullptr;
  if (! isUTFLiteral(Kind)) {
    assert(LT);
    Converter = LT->getCharConversionTable(TranslateToExecCharset);
  }


================
Comment at: clang/lib/Lex/LiteralSupport.cpp:1593-1597
+  ConversionState State = TranslationState;
+  if (Kind == tok::wide_string_literal)
+    State = TranslateToSystemCharset;
+  else if (isUTFLiteral(Kind))
+    State = NoTranslation;
----------------
abhina.sreeskantharajan wrote:
> tahonermann wrote:
> > The stored `TranslationState` should not be completely ignored for wide and UTF string literals.  The standard permits things like the following.
> >   #pragma rigoot L"bozit"
> >   #pragma rigoot u"bozit"
> >   _Pragma(L"rigoot bozit")
> >   _Pragma(u8"rigoot bozit")
> > For at least the `_Pragma(L"...")` case, the C++ standard [[ http://eel.is/c++draft/cpp.pragma.op | states ]] the `L` is ignored, but it doesn't say anything about other encoding prefixes.
> Please correct me if I'm wrong, these Pragma strings are not parsed through StringLiteralParser, they are parsed in **clang/lib/Lex/Pragma.cpp** in this function.
> **void Preprocessor::Handle_Pragma(Token &Tok)**
> So if they require translation, it would need to be done in that function.
> 
Ah, ok, good.  There are other cases where a string literal is not used to produce a string literal object.  See https://wg21.link/p2314 for a table.  You may want to audit for those cases.


================
Comment at: clang/lib/Lex/Preprocessor.cpp:88
       ScratchBuf(new ScratchBuffer(SourceMgr)), HeaderInfo(Headers),
-      TheModuleLoader(TheModuleLoader), ExternalSource(nullptr),
+      TheModuleLoader(TheModuleLoader), LT(new LiteralTranslator()),
+      ExternalSource(nullptr),
----------------
Per comments elsewhere, please try to make `LT` a non-pointer non-reference data member.


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D93031/new/

https://reviews.llvm.org/D93031



More information about the cfe-commits mailing list