[PATCH] D93031: Enable fexec-charset option
Abhina Sree via Phabricator via llvm-commits
llvm-commits at lists.llvm.org
Wed Dec 30 06:52:03 PST 2020
abhina.sreeskantharajan marked 2 inline comments as done.
abhina.sreeskantharajan added inline comments.
================
Comment at: clang/lib/Lex/LiteralSupport.cpp:235
+ Converter->convert(std::string(1, ByteChar), ResultCharConv);
+ memcpy((void *)&ResultChar, ResultCharConv.data(), sizeof(unsigned));
+ }
----------------
tahonermann wrote:
> As Richard previously noted, this `memcpy()` needs to be addressed. The intended behavior here is not clear. Are there valid scenarios in which the conversion will produce a sequence of more than one code units? I believe the input is limited to ASCII characters and invalid code units (e.g., a lead byte of a UTF-8 sequence) and in the latter case, an error and/or substitution of a `?` (in the execution encoding) seem like acceptable behaviors to me.
I replaced memcpy with an assignment. Please let me know if there is a better solution.
================
Comment at: clang/lib/Lex/LiteralSupport.cpp:238
+ void *Pointer = &ResultChar;
+ memcpy(Pointer, ResultCharConv.data(), sizeof(unsigned));
+ }
----------------
abhina.sreeskantharajan wrote:
> rsmith wrote:
> > What should happen if the result doesn't fit into an `unsigned`? This also appears to be making problematic assumptions about the endianness of the host. If we really want to pack multiple bytes of encoded output into a single `unsigned` result value (which itself seems dubious), we should do so with an endianness that doesn't depend on the host.
> This may be a problem we need to revisit since ResultChar is expecting a char.
I added an assertion for this case where the size of the character increases after translation. I've also removed the memcpy to avoid endianness issues.
Repository:
rG LLVM Github Monorepo
CHANGES SINCE LAST ACTION
https://reviews.llvm.org/D93031/new/
https://reviews.llvm.org/D93031
More information about the llvm-commits
mailing list