[LLVMdev] LLVM supports Unicode?
bagel
bagel99 at gmail.com
Sun Aug 28 18:00:15 PDT 2011
I think a very related question is "Does LLVM support UTF-8?" The answer has
two parts:
1. As strings (arrays of bytes) - yes
2. As identifiers - no
The fix for the second part depends partly on the object file formats. But to
at least accept UTF-8 in identifiers, the following patch helps. (I know that
it does not discriminate between valid and invalid UTF-8.)
--- lib/AsmParser/LLLexer.cpp (revision 138730)
+++ lib/AsmParser/LLLexer.cpp (working copy)
@@ -348,10 +348,10 @@
 bool LLLexer::ReadVarName() {
   const char *NameStart = CurPtr;
   if (isalpha(CurPtr[0]) || CurPtr[0] == '-' || CurPtr[0] == '$' ||
-      CurPtr[0] == '.' || CurPtr[0] == '_') {
+      CurPtr[0] == '.' || CurPtr[0] == '_' || (CurPtr[0]&0x80) != 0) {
     ++CurPtr;
     while (isalnum(CurPtr[0]) || CurPtr[0] == '-' || CurPtr[0] == '$' ||
-           CurPtr[0] == '.' || CurPtr[0] == '_')
+           CurPtr[0] == '.' || CurPtr[0] == '_' || (CurPtr[0]&0x80) != 0)
       ++CurPtr;
     StrVal.assign(NameStart, CurPtr);
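
For anyone who wants to try the idea outside the LLVM tree, here is a minimal standalone sketch of the same test. The names isNameByte and scanName are hypothetical helpers made up for illustration and are not part of LLVM; the sketch only mirrors the patched condition, so like the patch it treats every byte with the high bit set as a name character and does not check that the bytes form valid UTF-8.

```cpp
#include <cctype>
#include <cstddef>
#include <string>

// Hypothetical helper (not an LLVM function): true if byte `c` may appear in
// a name. `allowDigit` is false for the first byte (isalpha in the patch)
// and true for later bytes (isalnum in the patch).
static bool isNameByte(unsigned char c, bool allowDigit) {
  if (c & 0x80)
    return true; // lead or continuation byte of a multi-byte UTF-8 sequence
  if (allowDigit ? std::isalnum(c) : std::isalpha(c))
    return true;
  return c == '-' || c == '$' || c == '.' || c == '_';
}

// Hypothetical helper: length in bytes of the name at the start of `s`,
// or 0 if `s` does not start with a name.
static std::size_t scanName(const std::string &s) {
  if (s.empty() || !isNameByte(static_cast<unsigned char>(s[0]), false))
    return 0;
  std::size_t i = 1;
  while (i < s.size() && isNameByte(static_cast<unsigned char>(s[i]), true))
    ++i;
  return i;
}
```

The sketch casts each byte to unsigned char before calling isalpha/isalnum, because passing a negative char value to those functions is undefined behavior. Note that the length returned is in bytes, not in characters.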