<html><head><meta http-equiv="Content-Type" content="text/html; charset=utf-8"></head><body style="word-wrap: break-word; -webkit-nbsp-mode: space; line-break: after-white-space;" class=""><br class=""><div><br class=""><blockquote type="cite" class=""><div class="">On Feb 4, 2021, at 9:17 AM, MLJ1991 via cfe-dev &lt;<a href="mailto:cfe-dev@lists.llvm.org" class="">cfe-dev@lists.llvm.org</a>&gt; wrote:</div><br class="Apple-interchange-newline"><div class=""><div style="word-wrap: break-word; -webkit-nbsp-mode: space; -webkit-line-break: after-white-space;" class=""><div style="margin: 0px; font-stretch: normal; line-height: normal; font-family: &quot;Helvetica Neue&quot;;" class="">Hey guys,</div><div style="margin: 0px; font-stretch: normal; line-height: normal; font-family: &quot;Helvetica Neue&quot;; min-height: 14px;" class=""><br class=""></div><div style="margin: 0px; font-stretch: normal; line-height: normal; font-family: &quot;Helvetica Neue&quot;;" class="">I'm working on a proposal to ISO WG14 for C2x to add char16_t and char32_t string and character conversion specifiers to the printf and scanf families of functions.<br class="">
<br class="">
While working on this proposal for WG14, I came across an issue in Clang.<br class="">
<br class="">
The problem is that C11 defines char16_t and char32_t as typedefs in &lt;uchar.h&gt;, while C++11 defines them as new builtin types.<br class="">
<br class="">
The real problem with this difference is that the printf/scanf format-string checking code in Sema is built on the implicit assumption that any string argument passed to a printf- or scanf-family function has a builtin character type.<br class="">
<br class="">
For example, isAnyCharacterType() in the class clang::Type only checks builtin types.<br class="">
<br class="">
So when C code is compiled, char16_t and char32_t strings are not recognized as valid string types, and Clang errors out.<br class="">
<br class="">
So to fix this mess, I've been implementing a new function, isTypedefCharacterType(ASTContext &amp;AST), in clang::Type. It uses getLangOpts() in ASTContext to check the language mode and, when compiling in C mode, desugars the type all the way down until it finds a typedef for char16_t or char32_t.<br class="">
<br class="">
But I still have a few questions for the community.<br class="">
<br class="">
1: How should wchar_t be treated when it's been disabled as a builtin type?<br class="">
<br class="">
2: The name isAnyCharacterType() does not reflect what the function actually does: it only checks builtin types. I've renamed it to isBuiltinCharacterType() in my fork. There are 17 occurrences of "isAnyCharacterType" throughout the LLVM codebase; is it OK to change this name, and if so, is isBuiltinCharacterType an acceptable choice?</div></div></div></blockquote><div><br class=""></div><div>IMO you should keep it as a single function, but change the signature to `isAnyCharacterType(LangOptions &amp;LangOpts)`. I would imagine most callers of `isAnyCharacterType` have the same expectations you did about the meaning of that function, and since you have recognized that the answer is language dependent, I imagine you would be doing people a favor (and fixing bugs) by forcing them to specify the language.</div><br class=""><blockquote type="cite" class=""><div class=""><div style="word-wrap: break-word; -webkit-nbsp-mode: space; -webkit-line-break: after-white-space;" class=""><div style="margin: 0px; font-stretch: normal; line-height: normal; font-family: &quot;Helvetica Neue&quot;; min-height: 14px;" class=""><br class=""></div><div style="margin: 0px; font-stretch: normal; line-height: normal; font-family: &quot;Helvetica Neue&quot;;" class="">3: Any other comments or concerns?</div></div>_______________________________________________<br class="">cfe-dev mailing list<br class=""><a href="mailto:cfe-dev@lists.llvm.org" class="">cfe-dev@lists.llvm.org</a><br class="">https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-dev<br class=""></div></blockquote></div><br class=""></body></html>