[PATCH] D93031: Enable fexec-charset option
Tom Honermann via Phabricator via cfe-commits
cfe-commits at lists.llvm.org
Sun Feb 28 20:50:52 PST 2021
tahonermann added a comment.
Hi, Abhina. Sorry for the delay getting back to you. I added some more comments.
================
Comment at: clang/include/clang/Lex/LiteralSupport.h:191
+ Preprocessor &PP, tok::TokenKind kind,
+ ConversionState TranslationState = TranslateToExecCharset);
----------------
Is the default argument for `TranslationState` actually used anywhere? I'm skeptical that a default argument provides a benefit here.
Actually, this diff doesn't include any changes to construct a `CharLiteralParser` with an explicit argument. It seems this argument isn't actually needed.
The only places I see objects of `CharLiteralParser` type constructed are in `EvaluateValue()` in `clang/lib/Lex/PPExpressions.cpp` and `Sema::ActOnCharacterConstant()` in `clang/lib/Sema/SemaExpr.cpp`.
================
Comment at: clang/include/clang/Lex/LiteralSupport.h:238-239
+ ResultPtr(ResultBuf.data()), hadError(false), Pascal(false) {
+ LT = new LiteralTranslator();
+ LT->setTranslationTables(Features, Target, *Diags);
+ init(StringToks, TranslationState);
----------------
I don't think a `LiteralTranslator` object is actually needed in this case. The only use of this constructor that I see is in `ModuleMapParser::consumeToken()` in `clang/lib/Lex/ModuleMap.cpp` and, in that case, I don't think any translation is necessary.
This suggests that `TranslationState` is not needed for this constructor either; `NoTranslation` can be passed to `init()`.
================
Comment at: clang/include/clang/Lex/LiteralSupport.h:260-262
+ unsigned getOffsetOfStringByte(
+ const Token &TheTok, unsigned ByteNo,
+ ConversionState TranslationState = TranslateToExecCharset) const;
----------------
I don't think `getOffsetOfStringByte()` should require a `ConversionState` parameter. If I understand it correctly, this function should be operating on the string in the internal encoding, never in a converted encoding.
================
Comment at: clang/include/clang/Lex/LiteralTranslator.h:19-23
+enum ConversionState {
+ NoTranslation,
+ TranslateToSystemCharset,
+ TranslateToExecCharset
+};
----------------
Some naming suggestions... The enumeration is not used to record a state, but rather to indicate an action to take. Also, use of both "conversion" and "translation" could be confusing, so I suggest sticking with one. Perhaps:
enum class LiteralConversion {
None,
ToSystemCharset,
ToExecCharset
};
================
Comment at: clang/include/clang/Lex/LiteralTranslator.h:30-31
+
+class LiteralTranslator {
+public:
+ llvm::StringRef InternalCharset;
----------------
I don't know the LLVM style guides well, but I suspect a class with all public members should be defined using `struct` and not include access specifiers.
================
Comment at: clang/include/clang/Lex/LiteralTranslator.h:35
+ llvm::StringRef ExecCharset;
+ llvm::StringMap<llvm::CharSetConverter> ExecCharsetTables;
+
----------------
Given the converter setters and accessors below, `ExecCharsetTables` should be a private member.
================
Comment at: clang/include/clang/Lex/LiteralTranslator.h:37
+
+ llvm::CharSetConverter *getConversionTable(const char *Codepage);
+ CharsetTableStatusCode findOrCreateExecCharsetTable(const char *To);
----------------
`getConversionTable()` is logically `const`. Perhaps `ExecCharsetTables` should be `mutable`.
>From a terminology stand point, this function is misnamed. It doesn't return a table, it returns a converter for an encoding. I suggest:
llvm::CharSetConverter *getCharSetConverter(const char *Encoding) const;
================
Comment at: clang/include/clang/Lex/LiteralTranslator.h:38
+ llvm::CharSetConverter *getConversionTable(const char *Codepage);
+ CharsetTableStatusCode findOrCreateExecCharsetTable(const char *To);
+ llvm::CharSetConverter *
----------------
`findOrCreateExecCharsetTable()` seems oddly named since it doesn't return whatever it finds or creates. It seems like this function would be more useful if it returned a `llvm::CharSetConverter` pointer with `nullptr` indicating lookup/creation failed.
This function seems like it should be an implementation detail of the class, not a public interface.
================
Comment at: clang/include/clang/Lex/LiteralTranslator.h:41-43
+ void setTranslationTables(const clang::LangOptions &Opts,
+ const clang::TargetInfo &TInfo,
+ clang::DiagnosticsEngine &Diags);
----------------
`setTranslationTables()` is awkward. It is effectively operating as a constructor for the class, but isn't called at object construction and it does work that goes beyond initialization.
================
Comment at: clang/include/clang/Lex/LiteralTranslator.h:44
+ clang::DiagnosticsEngine &Diags);
+};
+
----------------
I suggest trying a design more like this:
class LiteralTranslator {
std::string SystemEncoding;
std::string ExecutionEncoding;
public:
LiteralTranslator(llvm::StringRef SystemEncoding,
llvm::StringRef ExecutionEncoding);
// Retrieve the name for the system encoding.
llvm::StringRef getSystemEncoding() const;
// Retrieve the name for the execution encoding.
llvm::StringRef getExecutionEncoding() const;
// Retrieve a converter for converting from the internal encoding (UTF-8)
// to the system encoding.
llvm::CharSetConverter* getSystemEncodingConverter() const;
// Retrieve a converter for converting from the internal encoding (UTF-8)
// to the execution encoding.
llvm::CharSetConverter* getExecutionEncodingConverter() const;
};
LiteralTranslator createLiteralTranslatorFromOptions(const clang::LangOptions &Opts,
const clang::TargetInfo &TInfo,
clang::DiagnosticsEngine &Diags);
================
Comment at: clang/include/clang/Lex/Preprocessor.h:145
ModuleLoader &TheModuleLoader;
+ LiteralTranslator *LT = nullptr;
----------------
I don't see a reason for `LT` to be a pointer. Can it be made a reference or, better, a non-reference, non-pointer data member?
================
Comment at: clang/lib/Lex/LiteralSupport.cpp:1322-1324
+ ConversionState State = isUTFLiteral(Kind) ? NoTranslation : TranslationState;
+ llvm::CharSetConverter *Converter =
+ LT ? LT->getCharConversionTable(State) : nullptr;
----------------
Per the comment associated with the constructor declaration, I don't think the new constructor parameter is needed; translation to execution character set is always desired for non-UTF character literals. I think this can be something like:
llvm::CharSetConverter *Converter = nullptr;
if (! isUTFLiteral(Kind)) {
assert(LT);
Converter = LT->getCharConversionTable(TranslateToExecCharset);
}
================
Comment at: clang/lib/Lex/LiteralSupport.cpp:1593-1597
+ ConversionState State = TranslationState;
+ if (Kind == tok::wide_string_literal)
+ State = TranslateToSystemCharset;
+ else if (isUTFLiteral(Kind))
+ State = NoTranslation;
----------------
abhina.sreeskantharajan wrote:
> tahonermann wrote:
> > The stored `TranslationState` should not be completely ignored for wide and UTF string literals. The standard permits things like the following.
> > #pragma rigoot L"bozit"
> > #pragma rigoot u"bozit"
> > _Pragma(L"rigoot bozit")
> > _Pragma(u8"rigoot bozit")
> > For at least the `_Pragma(L"...")` case, the C++ standard [[ http://eel.is/c++draft/cpp.pragma.op | states ]] the `L` is ignored, but it doesn't say anything about other encoding prefixes.
> Please correct me if I'm wrong, these Pragma strings are not parsed through StringLiteralParser, they are parsed in **clang/lib/Lex/Pragma.cpp** in this function.
> **void Preprocessor::Handle_Pragma(Token &Tok)**
> So if they require translation, it would need to be done in that function.
>
Ah, ok, good. There are other cases where a string literal is not used to produce a string literal object. See https://wg21.link/p2314 for a table. You may want to audit for those cases.
================
Comment at: clang/lib/Lex/Preprocessor.cpp:88
ScratchBuf(new ScratchBuffer(SourceMgr)), HeaderInfo(Headers),
- TheModuleLoader(TheModuleLoader), ExternalSource(nullptr),
+ TheModuleLoader(TheModuleLoader), LT(new LiteralTranslator()),
+ ExternalSource(nullptr),
----------------
Per comments elsewhere, please try to make `LT` a non-pointer non-reference data member.
Repository:
rG LLVM Github Monorepo
CHANGES SINCE LAST ACTION
https://reviews.llvm.org/D93031/new/
https://reviews.llvm.org/D93031
More information about the cfe-commits
mailing list