[PATCH] D93031: Enable fexec-charset option

Wed Dec 30 06:52:03 PST 2020

abhina.sreeskantharajan marked 2 inline comments as done.
abhina.sreeskantharajan added inline comments.

================
Comment at: clang/lib/Lex/LiteralSupport.cpp:235
+    Converter->convert(std::string(1, ByteChar), ResultCharConv);
+    memcpy((void *)&ResultChar, ResultCharConv.data(), sizeof(unsigned));
+  }
----------------
tahonermann wrote:
> As Richard previously noted, this `memcpy()` needs to be addressed.  The intended behavior here is not clear.  Are there valid scenarios in which the conversion will produce a sequence of more than one code units?  I believe the input is limited to ASCII characters and invalid code units (e.g., a lead byte of a UTF-8 sequence) and in the latter case, an error and/or substitution of a `?` (in the execution encoding) seem like acceptable behaviors to me.
I replaced memcpy with an assignment. Please let me know if there is a better solution. 

================
Comment at: clang/lib/Lex/LiteralSupport.cpp:238
+      void *Pointer = &ResultChar;
+      memcpy(Pointer, ResultCharConv.data(), sizeof(unsigned));
+    }
----------------
abhina.sreeskantharajan wrote:
> rsmith wrote:
> > What should happen if the result doesn't fit into an `unsigned`? This also appears to be making problematic assumptions about the endianness of the host. If we really want to pack multiple bytes of encoded output into a single `unsigned` result value (which itself seems dubious), we should do so with an endianness that doesn't depend on the host.
> This may be a problem we need to revisit since ResultChar is expecting a char.
I added an assertion for this case where the size of the character increases after translation. I've also removed the memcpy to avoid endianness issues.

Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D93031/new/

https://reviews.llvm.org/D93031